Closed AnningCheng-NOAA closed 2 weeks ago
@AnningCheng-NOAA I'm running UFS RTs on Hera right now and will upload the test_changes.list file when finished. I'm not sure what happened to the PR template when you edited it, but it looks different than other PRs.
@grantfirl, are you referring to https://github.com/NOAA-EMC/fv3atm/pull/816?
On Thu, Apr 25, 2024 at 1:55 PM Grant Firl wrote:
@AnningCheng-NOAA I'm running UFS RTs on Hera right now and will upload the test_changes.list file when finished. I'm not sure what happened to the PR template when you edited it, but it looks different than other PRs.
No, this PR: https://github.com/ufs-community/ufs-weather-model/pull/2221
The description format looks like it got messed up somehow. Also, I fixed .gitmodules in your ufs-weather-model and fv3atm branches: the branches were listed as innl when they should have been mr2_innl. In addition, the fv3atm PR was pointing to old commits of upp and atmos_cubed_sphere. I think it should all be fixed now.
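To illustrate the kind of branch mismatch described above, here is a minimal sketch that lists the branch recorded for each submodule in a .gitmodules file and flags any that differ from the expected name. Only the branch names innl and mr2_innl come from this thread; the sample file contents, the placeholder URL, and the helper name `submodule_branches` are assumptions made for illustration.

```python
import re

# Matches a section header such as: [submodule "FV3"]
SECTION_RE = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def submodule_branches(text):
    """Map each submodule name to its 'branch' value (None if unset)."""
    branches = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION_RE.match(line)
        if match:
            current = match.group("name")
            branches[current] = None
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "branch":
                branches[current] = value
    return branches


# Hypothetical .gitmodules contents (not copied from the actual PR branch).
SAMPLE = '''[submodule "FV3"]
\tpath = FV3
\turl = https://example.invalid/fv3atm
\tbranch = innl
'''

if __name__ == "__main__":
    expected = "mr2_innl"  # branch name mentioned in the comment above
    for name, branch in submodule_branches(SAMPLE).items():
        status = "ok" if branch == expected else f"expected {expected}"
        print(f"{name}: branch={branch} ({status})")
```

Running a check like this before pushing could catch a stale branch name, although it does not verify that the submodule commit hashes themselves are current.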
@grantfirl, I have just updated the PR description. It should read a little better now.
@grantfirl @AnningCheng-NOAA can you please sync up the branch for the PR?
@grantfirl @AnningCheng-NOAA It was a bit hard to follow which ccpp-framework PR was going into this, but I think I figured it out and linked the correct one in the PR description. Please double check that. What I have not done yet is make sure that the branch for the ccpp-framework PR matches the one in the fv3atm PR. Could you please double check that as well?
Also make sure you keep your test_changes.list when bumping the branch/fixing conflicts.
@BrianCurtis-NOAA @AnningCheng-NOAA I can get to this around 4ET today. Please stand by.
Sorry about the CCPP Framework PR issue. Since this is Anning's PR, I can't edit the description, but you figured out the correct one: https://github.com/NCAR/ccpp-framework/pull/555
Orion is hanging for me, and I have been unable to run any jobs since last night. I've reached out to MSU to see if there is a broader issue.
There seems to be an ecflow connection issue on Jet. Some tests make it through before connection errors begin to occur. I'm messaging Jet admins to get a support ticket opened and looked into. The RT log will have some of the tests that made it through but it will end without the rest of the tests or the test summary.
I have not seen this on WCOSS2/Acorn, so I would lean towards a Jet-specific issue. We've made some small ecflow changes that were necessary for Acorn/WCOSS2 and that will be brought in when those RTs complete. If you want to see whether those changes help or hinder things (I don't think they will be impactful, though), you can git pull from https://github.com/BrianCurtis-NOAA/ufs-weather-model/tree/ecflow_fixes
I think this is likely a Jet-specific issue as well; I didn't run into these connection errors on Hera or Gaea. Typically I only see these errors at the beginning, when the script needs to start up ecflow, but here they are showing up in the middle of testing:
ECFLOW Tasks Remaining: 159/192
[08:08:04 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
ECFLOW Tasks Remaining: 159/192
ECFLOW Tasks Remaining: 159/192
And on the final occurrence, the remaining tasks seem to just be cancelled:
ECFLOW Tasks Remaining: 15/192
[10:20:10 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
[10:21:20 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
[10:21:20 1.5.2024] Request( --get_state=/regtest_2304630 ), Failed to connect to ecflow1:22548. After 2 attempts. Is the server running ?
ClientEnvironment:
[10:21:20 1.5.2024] Ecflow version(5.11.4) boost(1.83.0) compiler(gcc 13.2.0) protocol(JSON cereal 1.3.0) openssl(enabled) Compiled on Feb 21 2024 15:55:17
ECF_HOST/ECF_PORT : host_vec_index_ = 0 host_vec_.size() = 1
ecflow1:22548
ECF_NAME =
ECF_PASS =
ECF_RID =
ECF_TRYNO = 1
ECF_HOSTFILE = /apps/ecflow/5.11.4/share/ecflow/etc/hostfile
ECF_TIMEOUT = 86400
ECF_ZOMBIE_TIMEOUT = 43200
ECF_CONNECT_TIMEOUT = 0
ECF_DENIED = 0
NO_ECF = 0
ECF_DEBUG_CLIENT = 0
ECFLOW Tasks Remaining: 0/192
rt.sh: Generating Regression Testing Log...
It looks like the sudden log cutoff happens because rt.sh cannot find the results for the leftover tasks:
grep: /mnt/lfs4/HFIP/h-nems/Fernando.Andrade-maldonado/regression-testing/wm/2221/ufs-weather-model/tests/logs/log_jet/rt_rap_control_intel.log: No such file or directory
+ GETMEMFROMLOG=
+ echo 'rt.sh finished'
rt.sh finished
+ cleanup
+ echo 'rt.sh: Cleaning up...'
rt.sh: Cleaning up...
I'll retry with rocoto while I wait for a reply from the Jet admins.
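For triaging logs like the excerpts above, here is a minimal sketch that counts ecflow connection errors and records how many tasks were still outstanding when the first one appeared. It assumes the log contains the two line shapes quoted in this comment ("ECFLOW Tasks Remaining: N/M" and "ClientInvoker: Connection error"). The function name `summarize_ecflow_log` and the shortened sample lines are hypothetical and are not part of rt.sh.

```python
import re

REMAINING_RE = re.compile(r"ECFLOW Tasks Remaining: (\d+)/(\d+)")
ERROR_MARKER = "ClientInvoker: Connection error"


def summarize_ecflow_log(lines):
    """Count connection errors and track the 'tasks remaining' counter."""
    errors = 0
    last_remaining = None
    remaining_at_first_error = None
    for line in lines:
        match = REMAINING_RE.search(line)
        if match:
            last_remaining = (int(match.group(1)), int(match.group(2)))
        elif ERROR_MARKER in line:
            errors += 1
            if remaining_at_first_error is None:
                remaining_at_first_error = last_remaining
    return {
        "connection_errors": errors,
        "remaining_at_first_error": remaining_at_first_error,
        "last_remaining": last_remaining,
    }


# Shortened sample lines modeled on the excerpts quoted in this comment.
SAMPLE_LINES = [
    "ECFLOW Tasks Remaining: 159/192",
    "[08:08:04 1.5.2024] ClientInvoker: Connection error: (timed out)",
    "ECFLOW Tasks Remaining: 159/192",
    "ECFLOW Tasks Remaining: 15/192",
    "[10:20:10 1.5.2024] ClientInvoker: Connection error: (timed out)",
    "ECFLOW Tasks Remaining: 0/192",
]

if __name__ == "__main__":
    print(summarize_ecflow_log(SAMPLE_LINES))
```

A final counter of 0 together with a nonzero error count would be consistent with the remaining tasks having been cancelled rather than completed, but confirming that still requires checking the per-test logs.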
Looks like Orion is available again; I was finally able to submit jobs.
Testing complete. We can move to merge the ccpp-physics and ccpp-framework sub-PRs.
@jkbk2004 @zach1221 FV3 submodule updated and .gitmodules reverted.
Commit Queue Requirements:
[X] Commit 'test_changes.list' from previous step
Description:
NWFA-induced evaporation is turned off, but evaporation not related to aerosol is turned on, to prevent excessive evaporation when the inner loop is active and mraerosol=T.
Commit Message:
NWFA-induced evaporation is turned off, but evaporation not related to aerosol is turned on, to prevent excessive evaporation when the inner loop is active and mraerosol=T.
Priority:
Git Tracking
Sub component Pull Requests:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
RegressionTests_hera.log
Testing Log: