oceanmodeling / ufs-weather-model

This repo is forked from ufs-weather-model, and contains the model code and external links needed to build the UFS coastal model executable and model components, including the ROMS, FVCOM, ADCIRC and SCHISM plus WaveWatch III model components.
https://github.com/oceanmodeling/ufs-coastal-app

Automated reg-testing #42

Open janahaddad opened 9 months ago

janahaddad commented 9 months ago

Description

Regression testing is currently done manually.

Solution

janahaddad commented 8 months ago

There's a meeting scheduled with EPIC on March 20 to hear their thoughts on this. Let's discuss it afterward as a team the week of Monday, March 25.

janahaddad commented 7 months ago

Update from Friday 4/5/2024:

janahaddad commented 7 months ago

Where this stands right now:

janahaddad commented 7 months ago

Moving this to the future Milestones backlog for now. We may get to test out a Jenkins sandbox through EPIC, but the effort to add CI/CD pipelines will come in later Milestones.

janahaddad commented 5 months ago

Friday June 7 Email exchange with Keven @ EPIC :

National Oceanic and Atmospheric Administration Mail - Jenkins sandbox-style testing for UFS Coastal.pdf

janahaddad commented 5 months ago

Notes from EPIC during DevOps training:

janahaddad commented 2 months ago

Update on this topic:

uturuncoglu commented 2 months ago

Content of mail to EPIC: @ankimball Hi All,

I am trying to update one of the Jenkins pipelines for UFS Coastal, and the workflow fails when it tries to check out UFS Coastal, as follows:

git-lfs filter-process line 1: git-lfs: command not found

From my search, the checkout step needs to include [$class: 'GitLFSPull'], but even with it I still get the error. There is some discussion about installing LFS support in the global configuration, etc., but I am not sure whether that is the right approach. Do you have any suggestions? I think this fails because one of the components, SCHISM, enabled the Git LFS feature. Is there an example in other pipelines? I could not see their configuration to get some insight.

Best,

—ufuk
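Editorial sketch: a "command not found" error from the LFS filter usually means the Git LFS binary is not on the PATH of the process doing the checkout. The Python snippet below is only a diagnostic that could be run on the build node under the same account. It is not a Jenkins configuration, and the function name `git_lfs_status` is hypothetical.

```python
import shutil
import subprocess


def git_lfs_status(executable="git-lfs"):
    """Describe whether the Git LFS binary is visible on PATH.

    `executable` defaults to the standard binary name; pass another
    name only for testing.
    """
    path = shutil.which(executable)
    if path is None:
        return f"{executable}: not found on PATH"
    # `git-lfs version` prints the installed version string.
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=False
    )
    details = result.stdout.strip() or result.stderr.strip()
    return f"{path}: {details}"


print(git_lfs_status())
```

If this reports "not found" for the account that runs the Jenkins agent but works in an interactive login, the difference is likely in how the agent's environment (modules, PATH) is set up rather than in the pipeline's checkout options.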

janahaddad commented 2 months ago

Update as of 9/19:

janahaddad commented 1 month ago

Update as of 9/20:

uturuncoglu commented 1 month ago

@janahaddad We need some level of support from the EPIC team. They did not respond to my last couple of emails about the issues I ran into (such as the disk quota error). If they can provide information on converting the existing workflow (WF) to the new account, that would be great. Then I could try the workflow and integrate it with the PR process. BTW, we need to clean up unused Jenkins WFs from the NOS folder.

janahaddad commented 1 month ago

@ankimball @kbooker79 can you assist with solving some remaining issues? Here's the list I have of open items from @uturuncoglu :

kbooker79 commented 1 month ago

Hi Jana,

Once we set up your new runner with the role-nosofs account attached that should resolve the disk quota issue.

Kristopher Booker, EPIC Team Lead, Tomorrow.io

On Wed, Sep 25, 2024 at 1:44 PM Jana Haddad wrote:

@ankimball @kbooker79 can you assist with solving some remaining issues? Here's the list I have of open items from @uturuncoglu :

  • Disk quota error: is this solved by setting export ACCNR=role-nosofs ?

hercules-login-1.hpc.msstate.edu
Machine: hercules
Account: epic
rt.sh: Setting up hercules...
Linking /work2/noaa/epic/stmp/role-epic/stmp/role-epic/FV3_RT/rt_1489206 to /work/noaa/epic/role-epic/jenkins/workspace/hercules/coastal/ufs-weather-model/nightly-test-build-coastal-wm/tests/run_dir
Run regression test in: /work2/noaa/epic/stmp/role-epic/stmp/role-epic/FV3_RT/rt_1489206
rt.sh: Checking & Updating test configuration...
No update needed to rt.conf
cat: write error: Disk quota exceeded
rt.sh finished
rt.sh: Cleaning up...
rt.sh: Exiting.

janahaddad commented 1 month ago

@kbooker79 thanks, do you have a timeline for setting up the runner on role-nosofs ?

kbooker79 commented 1 month ago

Hi Jana,

If the role-nosofs account is ready, all I need now is to gain shell access to this account and set up the runner. I suppose I'll need to open a ticket with the MSU help desk to do this.

janahaddad commented 1 month ago

@uturuncoglu I just created the /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/ directory for the dprefix path.

uturuncoglu commented 1 month ago

@janahaddad Okay, let me update it. The current run is still under /work/noaa/nosofs/role-nosofs/stmp/role-nosofs/FV3_RT/rt_99402/, but the workflow is running now and I just want to check the results. Once that is done, I'll let you know and we can remove the /work/noaa/nosofs/role-nosofs/stmp one. I don't have permission, but maybe you do.

uturuncoglu commented 1 month ago

@janahaddad I get a mkdir: cannot create directory ‘/work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/stmp’: Permission denied error. Could you fix the permissions? I think that directory needs to belong to the role account. You could do that by issuing chown role-nosofs:nosofs /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT.

janahaddad commented 1 month ago

chown: changing ownership of '/work2/noaa/nosofs/UFS-Coastal-Jenkins-RT': Operation not permitted ... probably because I don't have access to role-nosofs myself, only you and Kris do. I just requested access from Ed.

I issued chmod 770 & chmod g+s , perhaps that works?

Another option is to have the jenkins script create UFS-Coastal-Jenkins-RT ?
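Editorial sketch: chmod 770 followed by chmod g+s is equivalent to mode 2770, which gives the owner and group full access and makes new entries inherit the directory's group. It does not change the owner, so it only helps if the role account is a member of the directory's group (an assumption here). The Python snippet below shows the same mode applied to a throwaway temporary directory; `make_group_shared` is a hypothetical helper name.

```python
import os
import stat
import tempfile

# rwx for owner and group, nothing for others, plus the setgid bit.
SHARED_DIR_MODE = stat.S_ISGID | 0o770  # same as 0o2770


def make_group_shared(path):
    """Apply mode 2770 to `path` and return the mode actually in effect."""
    os.chmod(path, SHARED_DIR_MODE)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a temporary directory rather than a real project path.
with tempfile.TemporaryDirectory() as demo_dir:
    print(oct(make_group_shared(demo_dir)))
```

Reading the mode back after the change is deliberate: some systems silently drop the setgid bit when the caller is not a member of the directory's group, so the returned value is a quick check that the bit really took effect.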

uturuncoglu commented 1 month ago

@janahaddad Okay. I ran it again and it seems to be working fine now; at least it got past the permission issue. BTW, the initial run (the one that ran under the role account) failed after running all the Intel tests and switching to the GNU ones, with the following error:

./rt.sh: line 536: /work/noaa/nosofs/role-nosofs/jenkins/workspace/hercules/coastal/ufs-weather-model/nightly-test-build-coastal-wm/tests/lock/PID: No such file or directory

I am not sure what is wrong with it; it does not make sense to me. Maybe we need to run the Intel and GNU tests separately. I'll try to run them outside of Jenkins to see whether they work.

uturuncoglu commented 1 month ago

@janahaddad @saeed-moghimi-noaa I tried to run RT individually using Jenkins workflow. Here is the current status,

I think we need to get some support from EPIC to debug those issues.

janahaddad commented 1 month ago

@kbooker79 or @ankimball do you have any thoughts on what may be happening here? In particular note that the atm2roms RTs are passing without issue when Ufuk runs them manually on Hercules. Have you seen this happen for other code bases?

janahaddad commented 1 month ago

Adding a few more details so that @kbooker79 or @ankimball can hopefully take a look.

  • Intel

    • atm2adc - COMPLETES W/O ISSUE but it is marked as failed (not sure what the issue is)

atm2adc is build 217. This seems like an issue on the Jenkins side mis-interpreting the output. I noticed the Jenkins workflow isn't showing the REGRESSION TEST RESULT: & ******Regression Testing Script Completed****** line in console output as it does for the other builds.

    • atm2fvc - FAILED as expected since it is not b2b, but the workflow reports it as PASSED

atm2fvc is build 222 and 224. Also seems like Jenkins is mis-interpreting output. The console output is showing REGRESSION TEST RESULT: FAILURE but the dashboard is showing a success.

Can't find this build number for atm2roms intel run, @uturuncoglu can you point to this build # ?

  • GNU
    • atm2roms - FAILED with the same error

Also can't find atm2roms gnu. @uturuncoglu I see Build 221 but it looks like you aborted that one.

It's great though that all five of the ww3 and schism RTs seem to be running fine & showing correct result!

uturuncoglu commented 1 month ago

@janahaddad I am not sure whether you can still access that one, but I am launching a new one for ROMS. So the latest build (226) will be the ROMS one.

uturuncoglu commented 1 month ago

@janahaddad Actually, that case is hanging, but if you go and check the ESMF logs you will see errors on the ROMS PETs. I think it would be good to run the same case with debug options to understand the actual issue; that is the next thing for me. The urgent thing, I think, is to solve the misinterpretation of the runs (PASSED but marked as FAILED, or FAILED but marked as PASSED).

uturuncoglu commented 1 month ago

@janahaddad Build 226 has finished and failed (it is shown as PASSED). The run directory is here: /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/stmp/role-nosofs/FV3_RT/rt_2163913/coastal_irene_atm2roms_intel/.

janahaddad commented 1 month ago

@uturuncoglu yep I see that. So for this case also Jenkins is mis-interpreting the result. @kbooker79 is there a place where we can tell Jenkins to look for a specific string in the output to interpret PASS or FAIL?
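Editorial sketch: one way to approach the question above, independent of how Jenkins itself is configured, is to have the final pipeline step derive its exit status from the result line in the console output. The Python snippet below assumes the line has the form "REGRESSION TEST RESULT: FAILURE" as quoted earlier in the thread, and assumes "SUCCESS" is the passing value. The function name and the exit codes are hypothetical.

```python
import re

# Matches e.g. "REGRESSION TEST RESULT: FAILURE" and captures the verdict.
RESULT_PATTERN = re.compile(r"REGRESSION TEST RESULT:\s*(\w+)")


def exit_code_from_log(log_text):
    """Map the regression test result line(s) in a log to an exit code.

    0 = every result line says SUCCESS (assumed passing value)
    1 = at least one result line says something else
    2 = no result line at all (e.g. the script stopped early)
    """
    verdicts = RESULT_PATTERN.findall(log_text)
    if not verdicts:
        return 2
    return 0 if all(v == "SUCCESS" for v in verdicts) else 1


sample = "rt.sh finished\nREGRESSION TEST RESULT: FAILURE\n"
print(exit_code_from_log(sample))
```

A wrapper script could pass this value to `sys.exit` so that the build status follows the reported result. Treating a missing result line as its own code keeps a case like build 217, where the line never appears, distinct from a real test failure.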

uturuncoglu commented 2 days ago

@janahaddad I think we need help with Jenkins. Here is my observation: I tried to run the ROMS RT under Jenkins and it fails with the following error:

20241120 143108.709 ERROR            PET23 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1980 Object Set or SetDefault method not called  - Passing error in return code
20241120 143108.709 ERROR            PET23 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:489 Object Set or SetDefault method not called  - Passing error in return code
20241120 143108.709 ERROR            PET23 UFS.F90:397 Object Set or SetDefault method not called  - Aborting UFS

This looks weird, since I can run the same test manually without any issue and it passes. I then copied the run directory from the role account (cp -r /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/stmp/role-nosofs/FV3_RT/rt_3113600/coastal_irene_atm2roms_intel) to mine and ran it under my account without changing anything (including the executable), and it worked.

So at this point I think there may be some user-level restriction on our role account that prevents the case from running. It could be the stack size, etc.; I am not sure, but something definitely needs to be fixed in the role account. Please also tag anyone on the EPIC side who could help, so we can make progress with Jenkins and start using it regularly.
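Editorial sketch: to test the stack-size hypothesis, the per-process resource limits could be captured under both accounts and compared. The Python snippet below uses the standard `resource` module (Unix only). The choice of limits and the helper names `limits_report` and `diff_reports` are assumptions for illustration, and limits seen in an interactive shell can differ from those inside a batch job or a Jenkins agent.

```python
import resource

# Limits that commonly differ between accounts or launch environments.
LIMITS = {
    "stack": resource.RLIMIT_STACK,
    "address_space": resource.RLIMIT_AS,
    "open_files": resource.RLIMIT_NOFILE,
    "core": resource.RLIMIT_CORE,
}


def _fmt(value):
    """Render a limit value, using 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def limits_report():
    """Return {name: (soft, hard)} for the current process."""
    report = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[name] = (_fmt(soft), _fmt(hard))
    return report


def diff_reports(mine, other):
    """Return only the entries whose limits differ between two reports."""
    return {k: (mine[k], other.get(k)) for k in mine if mine[k] != other.get(k)}


for name, (soft, hard) in limits_report().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Running it once from the personal account and once from within the Jenkins job under the role account, then passing both reports to `diff_reports`, would show whether a limit such as the stack size actually differs.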

janahaddad commented 2 days ago

@uturuncoglu thanks, I'll try and elevate this issue to the right folks at EPIC. @kbooker79, @ankimball, or @jkbk2004 not sure if you all have the capacity to help us solve these Jenkins issues, or can point us in the right direction ?

kbooker79 commented 2 days ago

Hi Ufuk,

I'm not very familiar with the error you are seeing. Is there some special shell configuration in your personal account where this works correctly? Perhaps you could share your .bashrc or .profile?

uturuncoglu commented 2 days ago

@kbooker79 Here is my bashrc on Hercules. These RTs also work for others on Hercules, so I don't think I have anything special in there, but let me know if you want me to try something.

bashrc.txt

kbooker79 commented 1 day ago

Hi Ufuk,

Yes, looking at it, other than loading a few modules, I don't see anything that stands out as special. I don't believe this is really a Jenkins issue per se so much as a shell environment configuration issue on Hercules. I'm not really sure where to go from here as I'm not super familiar with your testing framework. I'm happy to assist, though, with any troubleshooting you might need. It might be worth opening a ticket with Hercules RDHPCS support as they are more familiar with that platform. Let me know if you need anything additional from me or my team.

Regards,

Kris

uturuncoglu commented 1 day ago

@kbooker79 Okay, I'll try to find the source of the issue. BTW, is it possible to see the actual configuration used by the regular UFS WM testing (through PRs, etc.)? I could not see its details from my account; I can only see and modify the ones for UFS Coastal. Maybe there is some hidden data that I am missing. Thanks.