ufs-community / ufs-weather-model

UFS Weather Model

fixed excessive evaporation when both innerloop and mraerosol=T #2221

Closed. AnningCheng-NOAA closed this PR 2 weeks ago.

AnningCheng-NOAA commented 1 month ago

Commit Queue Requirements:

NWFA-induced evaporation is turned off, but evaporation not related to aerosols is turned on, to prevent excessive evaporation when the inner loop is used and mraerosol=T.

Commit Message:

NWFA-induced evaporation is turned off, but evaporation not related to aerosols is turned on, to prevent excessive evaporation when the inner loop is used and mraerosol=T.

Priority:

Git Tracking

Sub component Pull Requests:


Changes

Regression Test Changes (Please commit test_changes.list):

Input data Changes:

Library Changes/Upgrades:


RegressionTests_hera.log

Testing Log:

grantfirl commented 3 weeks ago

@AnningCheng-NOAA I'm running UFS RTs on Hera right now and will upload the test_changes.list file when finished. I'm not sure what happened to the PR template when you edited it, but it looks different from other PRs.

AnningCheng-NOAA commented 3 weeks ago

@grantfirl, are you referring to https://github.com/NOAA-EMC/fv3atm/pull/816?

(Replying by email to Grant Firl's comment of Thu, Apr 25, 2024 at 1:55 PM.)

grantfirl commented 3 weeks ago

No, this PR: https://github.com/ufs-community/ufs-weather-model/pull/2221

The description format looks like it is messed up somehow. Also, I fixed .gitmodules in your ufs-weather-model and fv3atm branches (the branches were listed as innl when they should have been mr2_innl), and the fv3atm PR was pointing to old commits of upp and atmos_cubed_sphere. I think it should be fixed now.
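A branch-name mismatch like innl vs. mr2_innl can be caught mechanically before running RTs. The Python sketch below is only an illustration: it assumes the usual `[submodule "NAME"]` layout of .gitmodules, and the helper names and the expected-branch mapping are hypothetical, not part of the ufs-weather-model tooling.

```python
import configparser


def submodule_branches(gitmodules_text):
    """Return {submodule_name: branch} for entries that set a branch key."""
    parser = configparser.ConfigParser()
    parser.read_string(gitmodules_text)
    branches = {}
    for section in parser.sections():
        # .gitmodules sections look like: [submodule "FV3"]
        if section.startswith('submodule "') and section.endswith('"'):
            name = section[len('submodule "'):-1]
            branch = parser.get(section, "branch", fallback=None)
            if branch is not None:
                branches[name] = branch
    return branches


def wrong_branches(gitmodules_text, expected):
    """List submodules whose branch differs from the expected one.

    Submodules missing from `expected` are not checked (hypothetical policy).
    """
    return sorted(
        name
        for name, branch in submodule_branches(gitmodules_text).items()
        if branch != expected.get(name, branch)
    )
```

Called as `wrong_branches(text, {"FV3": "mr2_innl"})`, it would flag FV3 if its entry still said `branch = innl`.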

AnningCheng-NOAA commented 3 weeks ago

@grantfirl, I have just updated the PR description; it should be a little better now.

zach1221 commented 2 weeks ago

@grantfirl @AnningCheng-NOAA can you please sync up the branch for the PR?

BrianCurtis-NOAA commented 2 weeks ago

@grantfirl @AnningCheng-NOAA It was a bit hard to follow which ccpp-framework PR was going into this, but I think I figured it out and linked the correct one in the PR description. Please double-check that. What I have not done yet is make sure that the branch for the ccpp-framework PR matches the one in the fv3atm PR. Could you please double-check that as well?

Also make sure you keep your test_changes.list when bumping the branch/fixing conflicts.

grantfirl commented 2 weeks ago

@BrianCurtis-NOAA @AnningCheng-NOAA I can get to this around 4 ET today. Please stand by.

grantfirl commented 2 weeks ago

Sorry about the CCPP Framework PR issue. Since this is Anning's PR, I can't edit the description, but you figured out the correct one: https://github.com/NCAR/ccpp-framework/pull/555

zach1221 commented 2 weeks ago

Orion is hanging for me, and I've been unable to run any jobs since last night. I've reached out to MSU to see if there is a broader issue.

FernandoAndrade-NOAA commented 2 weeks ago

There seems to be an ecflow connection issue on Jet. Some tests make it through before connection errors begin to occur. I'm messaging Jet admins to get a support ticket opened and looked into. The RT log will have some of the tests that made it through but it will end without the rest of the tests or the test summary.

BrianCurtis-NOAA commented 2 weeks ago

I have not seen this on WCOSS2/Acorn, so it points toward a Jet-specific issue. We've made some small ecflow changes needed on Acorn/WCOSS2 that will be brought in when those RTs complete. If you want to see whether those changes help or hinder things (I don't think they will have much impact), you can git pull the ecflow_fixes branch: https://github.com/BrianCurtis-NOAA/ufs-weather-model/tree/ecflow_fixes

FernandoAndrade-NOAA commented 2 weeks ago

I think this is likely a Jet-specific issue as well; I didn't run into these connection errors on Hera or Gaea. Typically I only see these errors at the beginning, when the script needs to start up ecflow, but here they are showing up in the middle of testing:

   ECFLOW Tasks Remaining: 159/192
   [08:08:04 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
   ECFLOW Tasks Remaining: 159/192
   ECFLOW Tasks Remaining: 159/192

And on the final occurrence, the remaining tasks seem to just be cancelled:

   ECFLOW Tasks Remaining: 15/192
   [10:20:10 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
   [10:21:20 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
   [10:21:20 1.5.2024] Request( --get_state=/regtest_2304630 ), Failed to connect to ecflow1:22548. After 2 attempts. Is the server running ?
   ClientEnvironment:
   [10:21:20 1.5.2024] Ecflow version(5.11.4) boost(1.83.0) compiler(gcc 13.2.0) protocol(JSON cereal 1.3.0) openssl(enabled) Compiled on Feb 21 2024 15:55:17
      ECF_HOST/ECF_PORT : host_vec_index_ = 0 host_vec_.size() = 1
      ecflow1:22548
      ECF_NAME =
      ECF_PASS =
      ECF_RID =
      ECF_TRYNO = 1
      ECF_HOSTFILE = /apps/ecflow/5.11.4/share/ecflow/etc/hostfile
      ECF_TIMEOUT = 86400
      ECF_ZOMBIE_TIMEOUT = 43200
      ECF_CONNECT_TIMEOUT = 0
      ECF_DENIED = 0
      NO_ECF = 0
      ECF_DEBUG_CLIENT = 0
   ECFLOW Tasks Remaining: 0/192
   rt.sh: Generating Regression Testing Log...
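One way to confirm that these failures happen mid-run rather than at ecflow start-up is to pair each ClientInvoker connection error with the most recent "ECFLOW Tasks Remaining" count. The Python sketch below does that over rt.sh output; the regular expressions are assumptions based only on the log lines quoted here, not on any documented ecflow or rt.sh output format, and the function names are hypothetical.

```python
import re

# Patterns inferred from the rt.sh/ecflow output quoted above (assumption).
REMAINING = re.compile(r"ECFLOW Tasks Remaining: (\d+)/(\d+)")
CONN_ERR = re.compile(r"\[(\d{2}:\d{2}:\d{2}) [^\]]*\] ClientInvoker: Connection error")


def connection_errors(lines):
    """Pair each ecflow connection error with the last progress count seen.

    Returns a list of (time, tasks_remaining, total_tasks). The counts are
    None if the error appears before any progress line (i.e. at start-up).
    """
    remaining = total = None
    events = []
    for raw in lines:
        line = raw.strip()
        progress = REMAINING.search(line)
        if progress:
            remaining, total = int(progress.group(1)), int(progress.group(2))
            continue
        error = CONN_ERR.match(line)
        if error:
            events.append((error.group(1), remaining, total))
    return events


def is_mid_run(event):
    """True if some tasks had already finished when the error occurred."""
    _, remaining, total = event
    return remaining is not None and remaining < total
```

On the excerpts above, both the 08:08:04 error (159/192 remaining) and the 10:20:10 error (15/192 remaining) would be classified as mid-run.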

It looks like the sudden log cutoff comes from rt.sh being unable to find the results for the leftover tasks:

   grep: /mnt/lfs4/HFIP/h-nems/Fernando.Andrade-maldonado/regression-testing/wm/2221/ufs-weather-model/tests/logs/log_jet/rt_rap_control_intel.log: No such file or directory
   + GETMEMFROMLOG=
   + echo 'rt.sh finished'
   rt.sh finished
   + cleanup
   + echo 'rt.sh: Cleaning up...'
   rt.sh: Cleaning up...
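Because the missing per-test log only surfaces as a failed grep, a pre-check that lists which expected logs are absent would make this failure mode explicit. The Python sketch below is hypothetical: the `rt_<test>_<compiler>.log` naming is inferred solely from the rt_rap_control_intel.log path in the trace, and the helper names do not come from rt.sh.

```python
from pathlib import Path


def expected_log_name(test, compiler="intel"):
    # Hypothetical naming scheme, inferred only from the
    # rt_rap_control_intel.log path in the trace above.
    return f"rt_{test}_{compiler}.log"


def missing_logs(present_names, tests, compiler="intel"):
    """Return the tests whose expected log is absent from present_names."""
    present = set(present_names)
    return [t for t in tests if expected_log_name(t, compiler) not in present]


def check_log_dir(log_dir, tests, compiler="intel"):
    """Report missing per-test logs instead of letting a later grep fail silently."""
    log_dir = Path(log_dir)
    present = [p.name for p in log_dir.iterdir()] if log_dir.is_dir() else []
    return missing_logs(present, tests, compiler)
```

A report like this could then list the cancelled tasks as missing instead of ending the log without a summary.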

FernandoAndrade-NOAA commented 2 weeks ago

I'll retry with rocoto while I wait for a reply from the Jet admins.

zach1221 commented 2 weeks ago

It looks like Orion is available again; I was finally able to submit jobs.

zach1221 commented 2 weeks ago

Testing is complete. We can move forward with merging the ccpp-physics and ccpp-framework sub-PRs.

grantfirl commented 2 weeks ago

@jkbk2004 @zach1221 FV3 submodule updated and .gitmodules reverted.