Open janahaddad opened 9 months ago
There's a meeting scheduled with EPIC on March 20 to hear their thoughts on this. Let's discuss it as a team afterward, the week of Monday, March 25.
Update from Friday 4/5/2024:
Where this stands right now:
Moving this to the future Milestones backlog for now. We may get to test out a Jenkins sandbox through EPIC, but the effort to add CI/CD pipelines will come in later Milestones.
Notes from EPIC during DevOps training:
Update on this topic:
export ACCNR=nos-surge
Content of mail to EPIC: @ankimball Hi All,
I am trying to update one of the Jenkins pipelines for UFS Coastal, and the workflow is failing when it tries to check out UFS Coastal, with the following error:
git-lfs filter-process line 1: git-lfs: command not found
From my search, the checkout step needs to have [$class: 'GitLFSPull'], but even with it I am still getting the error. There are some discussions about installing LFS support in the global configuration etc., but I am not sure whether that is the right approach. Do you have any suggestions? I think this is failing because one of the components, SCHISM, enabled the Git LFS feature. Is there an example in other pipelines? I could not see their configuration to get some insight.
Best,
—ufuk
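A "command not found" error from the LFS filter usually means the git-lfs binary is not on the PATH of the shell that runs the checkout. The following is a minimal sketch of a pre-checkout guard, assuming the pipeline can run a shell step inside the cloned repository. The helper names require_cmd and fetch_lfs_objects are hypothetical, and how git-lfs gets onto the PATH on the build node (for example through a module) is not shown because it is site specific.

```shell
# Sketch only: fail early with a clear message when a required tool is missing.
# require_cmd is a hypothetical helper name, not part of rt.sh or Jenkins.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "ERROR: '$1' not found on PATH" >&2
    return 1
  fi
}

# fetch_lfs_objects is also hypothetical. It assumes it is called from inside
# an already cloned repository whose submodules may use Git LFS.
fetch_lfs_objects() {
  require_cmd git || return 1
  require_cmd git-lfs || return 1
  # Register the LFS filters in the user's global git config (no repo hooks).
  git lfs install --skip-repo || return 1
  # Download LFS objects for the top-level checkout and for every submodule.
  git lfs pull || return 1
  git submodule foreach --recursive 'git lfs pull'
}
```

Calling fetch_lfs_objects right after the clone would turn a missing git-lfs into an explicit error message rather than a failure in the middle of the checkout.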
Update as of 9/19:
Update as of 9/20:
role-nosofs
and added @uturuncoglu and Kris Booker as members
@janahaddad We need to get some level of support from the EPIC team. They did not respond to my last couple of emails about the issues I faced (like the disk quota error). So it would be great if they could provide information on converting the existing workflow to the new account. Then I could try the workflow and integrate it with the PR process. BTW, we need to clean up the unused Jenkins workflows in the NOS folder.
@ankimball @kbooker79 can you assist with solving some remaining issues? Here's the list I have of open items from @uturuncoglu:
- Disk quota error: is this solved by setting export ACCNR=role-nosofs ?
hercules-login-1.hpc.msstate.edu
Machine: hercules
Account: epic
rt.sh: Setting up hercules...
Linking /work2/noaa/epic/stmp/role-epic/stmp/role-epic/FV3_RT/rt_1489206 to /work/noaa/epic/role-epic/jenkins/workspace/hercules/coastal/ufs-weather-model/nightly-test-build-coastal-wm/tests/run_dir
Run regression test in: /work2/noaa/epic/stmp/role-epic/stmp/role-epic/FV3_RT/rt_1489206
rt.sh: Checking & Updating test configuration...
No update needed to rt.conf
cat: write error: Disk quota exceeded
rt.sh finished
rt.sh: Cleaning up...
rt.sh: Exiting.
Hi Jana,
Once we set up your new runner with the role-nosofs account attached that should resolve the disk quota issue.
On Wed, Sep 25, 2024 at 1:44 PM Jana Haddad wrote:
@ankimball @kbooker79 can you assist with solving some remaining issues? Here's the list I have of open items from @uturuncoglu:
- Disk quota error: is this solved by setting export ACCNR=role-nosofs ?
- Push test artifacts to cloud or GitHub. @uturuncoglu this is a question/request for adding the coastal artifacts to https://noaa-epic-dashboard.s3.amazonaws.com/index.html ?
- Issue with Status view on Jenkins showing incorrect results, e.g. the pipeline shows green even when some RTs fail.
- Guidance on how to set this up to test PRs. What is needed beyond adding labels to PRs?
- Cleaning up https://jenkins.epic.oarcloud.noaa.gov/job/OAR/ . @uturuncoglu we only need the "Coastal-Test-WM-under-development" pipeline, right? I disabled most of the others but I am not able to delete them. @ankimball perhaps you can help us clean up this folder?
@kbooker79 thanks, do you have a timeline for setting up the runner on role-nosofs ?
Hi Jana,
If the role-nosofs account is ready, all I need now is to gain shell access to this account and set up the runner. I suppose I'll need to open a ticket with the MSU help desk to do this.
@uturuncoglu I just created the /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/ directory for the dprefix path.
@janahaddad Okay. Let me update it. My current run is still under /work/noaa/nosofs/role-nosofs/stmp/role-nosofs/FV3_RT/rt_99402/, but the workflow is running now and I just want to check the results. Once I am done, I'll let you know and we can remove the /work/noaa/nosofs/role-nosofs/stmp one. I don't have permission, but maybe you do.
@janahaddad I am getting the following error:
mkdir: cannot create directory ‘/work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/stmp’: Permission denied
Could you fix the permissions? I think that directory needs to belong to the role account. You could do it by issuing chown role-nosofs:nosofs /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT.
chown: changing ownership of '/work2/noaa/nosofs/UFS-Coastal-Jenkins-RT': Operation not permitted
...
probably because I don't have access to role-nosofs myself, only you and Kris do. I just requested access from Ed.
I issued chmod 770 & chmod g+s, perhaps that works?
Another option is to have the Jenkins script create UFS-Coastal-Jenkins-RT ?
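If the Jenkins script creates the directory itself, the two chmod calls above can be folded into one step. This is a minimal sketch, assuming the script runs as an account that is allowed to create the target path. The helper name make_shared_dir is hypothetical, and the optional group argument only works when the calling user is a member of that group.

```shell
# make_shared_dir is a hypothetical helper: create a group-shared directory.
# Usage: make_shared_dir <path> [group]
make_shared_dir() {
  dir=$1
  mkdir -p "$dir" || return 1
  # Optionally hand the directory to a shared group (caller must belong to it).
  if [ -n "${2:-}" ]; then
    chgrp "$2" "$dir" || return 1
  fi
  # 2770 = rwx for owner and group, nothing for others, plus the setgid bit,
  # so new files and subdirectories inherit the directory's group.
  # This is equivalent to running chmod 770 followed by chmod g+s.
  chmod 2770 "$dir"
}
```

For example, make_shared_dir /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT nosofs would reproduce the permissions discussed above, provided the caller has the rights to do so.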
@janahaddad Okay. I ran it again and it seems to be working fine now. At least it got past the permission issue. BTW, the initial run (the one that ran under the role account) failed after running all the Intel tests and switching to GNU, with the following error:
./rt.sh: line 536: /work/noaa/nosofs/role-nosofs/jenkins/workspace/hercules/coastal/ufs-weather-model/nightly-test-build-coastal-wm/tests/lock/PID: No such file or directory
I am not sure what is wrong with it. It does not make sense to me. Maybe we need to run the Intel and GNU tests separately. I'll try to run them outside of Jenkins to see whether they run fine.
@janahaddad @saeed-moghimi-noaa I tried running the RTs individually using the Jenkins workflow. Here is the current status:
I think we need to get some support from EPIC to debug those issues.
@kbooker79 or @ankimball do you have any thoughts on what may be happening here? In particular note that the atm2roms RTs are passing without issue when Ufuk runs them manually on Hercules. Have you seen this happen for other code bases?
Adding a few more details so that @kbooker79 or @ankimball can hopefully take a look.
Intel
- atm2adc - COMPLETES W/O ISSUE but it is marked as failed (not sure what the issue is)
atm2adc is build 217. This seems like an issue on the Jenkins side misinterpreting the output. I noticed the Jenkins workflow isn't showing the REGRESSION TEST RESULT: & ******Regression Testing Script Completed****** lines in the console output as it does for the other builds.
- atm2fvc - FAILED as expected since it is not b2b, but the workflow thinks that it PASSED
atm2fvc is builds 222 and 224. This also seems like Jenkins misinterpreting the output. The console output is showing REGRESSION TEST RESULT: FAILURE but the dashboard is showing a success.
- atm2roms - gives ERROR (this passes without any issue if I run it manually on the same platform); see "coastal_irene_atm2roms_intel test is failing under Jenkins workflow" (roms#7)
Can't find the build number for the atm2roms intel run. @uturuncoglu can you point to this build #?
GNU
- atm2roms - FAILED with same error
Also can't find atm2roms gnu. @uturuncoglu I see Build 221 but it looks like you aborted that one.
It's great though that all five of the ww3 and schism RTs seem to be running fine & showing correct result!
@janahaddad I am not sure you can still access that, but I am launching a new one for ROMS. So the latest (226) would be the ROMS one.
@janahaddad Actually that case is hanging, but if you go and check the ESMF logs you will see errors on the ROMS PETs. I think it would be good to run the same case with debug options to understand the actual issue. That is the next thing for me. I think the urgent thing is to solve the misinterpretation of the runs (PASSED but marked as FAILED, or FAILED but marked as PASSED).
@janahaddad 226 has finished and failed (it is shown as PASSED). The run directory is here: /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/stmp/role-nosofs/FV3_RT/rt_2163913/coastal_irene_atm2roms_intel/
@uturuncoglu yep I see that. So for this case also Jenkins is misinterpreting the result. @kbooker79 is there a place where we can tell Jenkins to look for a specific string in the output to interpret PASS or FAIL?
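One possible way to do this without touching the Jenkins configuration is to scan the regression test log in a shell step and exit nonzero, since a Jenkins sh step fails by default when its script returns a nonzero status. The following is a minimal sketch, assuming the rt.sh output has been saved to a log file whose path is passed in. The helper name check_rt_result is hypothetical. The only strings it relies on are the REGRESSION TEST RESULT lines quoted earlier in this thread.

```shell
# check_rt_result is a hypothetical helper: derive pass/fail from the RT log.
# Returns 0 on pass, 1 on reported or inferred failure, 2 if the log is unreadable.
check_rt_result() {
  log=$1
  if [ ! -r "$log" ]; then
    echo "ERROR: cannot read log '$log'" >&2
    return 2
  fi
  # An explicit FAILURE line always fails the step.
  if grep -q 'REGRESSION TEST RESULT: FAILURE' "$log"; then
    echo "Regression tests reported FAILURE" >&2
    return 1
  fi
  # A missing result line (as seen for atm2adc) is also treated as a failure.
  if ! grep -q 'REGRESSION TEST RESULT:' "$log"; then
    echo "No REGRESSION TEST RESULT line found; treating as failure" >&2
    return 1
  fi
  return 0
}
```

Ending the shell step with something like check_rt_result rt_console.log || exit 1 (the log file name is an assumption) would make the stage status follow the string in the output and not just the exit code of rt.sh.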
@janahaddad I think we need help with Jenkins. Here is my observation: I tried to run the ROMS RT under Jenkins and it is failing with the following error:
20241120 143108.709 ERROR PET23 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1980 Object Set or SetDefault method not called - Passing error in return code
20241120 143108.709 ERROR PET23 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:489 Object Set or SetDefault method not called - Passing error in return code
20241120 143108.709 ERROR PET23 UFS.F90:397 Object Set or SetDefault method not called - Aborting UFS
This looks weird, since I can run the same test manually without any issue and it passes. Then I copied the run directory from the role account (cp -r /work2/noaa/nosofs/UFS-Coastal-Jenkins-RT/stmp/role-nosofs/FV3_RT/rt_3113600/coastal_irene_atm2roms_intel) to mine and ran it under my account without changing anything (incl. the executable), and it worked.
So at this point I am thinking that maybe there is some user-level restriction on our role account that prevents the case from running. It could be stack size etc. I am not sure, but something definitely needs to be fixed in the role account. Please also tag anyone on the EPIC side who could help, so we can make progress with Jenkins and start using it regularly.
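One way to narrow this down is to capture the shell limits and environment under both accounts and compare them. This is a minimal sketch, assuming the same function can be run once from the Jenkins job (role account) and once from a personal login on the same node type. The helper name snapshot_env and the output file names are hypothetical.

```shell
# snapshot_env is a hypothetical helper: write limits and environment to a file.
# Usage: snapshot_env <output-file>
snapshot_env() {
  out=$1
  {
    echo "user=$(id -un)"
    echo "shell=${SHELL:-unknown}"
    # Stack size is reported separately because it is a common suspect.
    echo "stack_size_kb=$(ulimit -s)"
    echo "--- ulimit -a ---"
    ulimit -a
    echo "--- environment ---"
    env | sort
  } > "$out"
}
```

Running snapshot_env role.txt inside the Jenkins job and snapshot_env personal.txt in a personal session, then diff role.txt personal.txt, would show whether limits such as stack size or any environment variables differ between the two accounts.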
@uturuncoglu thanks, I'll try and elevate this issue to the right folks at EPIC. @kbooker79, @ankimball, or @jkbk2004 not sure if you all have the capacity to help us solve these Jenkins issues, or can point us in the right direction ?
Hi Ufuk,
I'm not very familiar with the error you are seeing. Is there some special shell configuration in your personal account where this works correctly? Perhaps you could share your .bashrc or .profile?
@kbooker79 Here is my bashrc on Hercules: bashrc.txt (https://github.com/user-attachments/files/17839134/bashrc.txt). These RTs also work for others on Hercules, so I don't think I have anything special in there, but let me know if you want me to try something.
Hi Ufuk,
Yes, looking at it, other than loading a few modules, I don't see anything that stands out as special. I don't believe this is really a Jenkins issue per se as more of a shell environment configuration issue on Hercules. I'm not really sure where to go from here as I'm not super familiar with your testing framework. I'm happy to assist though with any troubleshooting assistance you might need. It might be worth opening a ticket with Hercules RDHPCS support as they are more familiar with that platform. Let me know if you need anything additional from me or my team.
Regards,
Kris
@kbooker79 Okay. I'll try to find the source of the issue. BTW, is it possible to see the actual configuration used by the regular UFS WM testing (through PRs etc.)? I could not see the details from my account; I can only see and modify the ones for UFS Coastal. Maybe there is some hidden data that I am missing. Thanks
Description
Regression testing is currently done manually.
Solution