ufs-community / ufs-weather-model

UFS Weather Model

Regression tests using WW3=Y are not compiling on Hera #189

Closed grantfirl closed 3 years ago

grantfirl commented 4 years ago

Description

Upon cloning the ufs-weather-model develop branch recursively and running the standard regression tests (rt.conf) on Hera, the model is failing to compile with WW3=Y. All other tests seem to work fine.

To Reproduce:

Machine: Hera/Intel setup
Code: straight develop branch (and associated submodule hashes) of ufs-weather-model (commit e96bc9)

junwang-noaa commented 4 years ago

I checked out the ufs-weather-model develop branch at revision e96bc9cb, and the full RT ran successfully on Hera with the Intel compiler.

/scratch1/NCEPDEV/nems/Jun.Wang/nems/vlab/20200820/ufs-weather-model/tests

grantfirl commented 4 years ago

That's strange. At least 2 of us in the DTC have come across the same error. I've attached the compilation log showing an error while making ww3_grib.

My commands were:

git clone --recursive -b develop https://github.com/ufs-community/ufs-weather-model ufs-weather-model-test
cd ufs-weather-model-test/tests
vi rt.sh #(edited PTMP and STMP on hera to point to where I have write privileges)
export ACCNR=gmtb #(charge the account where I have access)
./rt.sh -n fv3_ccpp_gfdlmprad 2>&1 | tee rt_ww3_test.log #(try to run a single test that uses WW3=Y in rt.conf)

@junwang-noaa You said that you checked out the develop branch and the e96bc9 commit of ufs-weather-model. Was it a fresh clone of the branch? What was the commit hash of WW3? Mine is pointing to 19f312, which is 4 months old and well behind WW3's develop branch.

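Attachments:
compile_1.log https://github.com/ufs-community/ufs-weather-model/files/5110131/compile_1.log
rt_ww3_test.log https://github.com/ufs-community/ufs-weather-model/files/5110132/rt_ww3_test.log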
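For anyone comparing checkouts as in the exchange above, here is a minimal sketch of listing the commit each submodule points to. It assumes the input is the text printed by `git submodule status`; the helper name `short_submodule_hashes` is made up for this illustration and is not part of the repository.

```shell
#!/usr/bin/env bash
# Print "<path> <6-char hash>" for each submodule listed on stdin.
# Assumed input format (as printed by `git submodule status`):
#   [ -+U]<sha1> <path> (<describe output>)
short_submodule_hashes() {
  awk '{
    sha = $1
    sub(/^[-+U]/, "", sha)       # drop the status prefix, if present
    print $2, substr(sha, 1, 6)  # path, then abbreviated hash
  }'
}
```

Run from the top of the clone, this could be used as `git submodule status --recursive | short_submodule_hashes`, which makes it easy to confirm that two people are building the same WW3 commit.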

junwang-noaa commented 4 years ago

Yes, it is a fresh clone from the develop branch of the ufs-weather-model repo. WW3 is at 19f312. I'm not sure about the -n option in ufs-weather-model. Can you remove all tests except the WW3 test from rt.conf and run rt.sh -l rt.conf to see if the test runs through?
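One way to produce such a reduced test list without hand-editing is sketched below. It assumes rt.conf is made of pipe-separated `COMPILE | <options> | ...` lines, each followed by the `RUN | <test name> | ...` lines that use that build. That layout and the function name `filter_ww3_tests` are assumptions for illustration and should be checked against the actual file.

```shell
#!/usr/bin/env bash
# Keep only the COMPILE entries whose options include WW3=Y, together with
# the RUN lines that follow each of them. The pipe-separated layout of
# rt.conf is an assumption made for this sketch.
filter_ww3_tests() {
  awk -F'|' '
    /^[[:space:]]*#/ { next }                   # skip commented-out entries
    $1 ~ /COMPILE/   { keep = ($2 ~ /WW3=Y/) }  # a new build block starts
    keep && NF > 1   { print }                  # print lines of kept blocks
  ' "$1"
}
```

A possible use would be `filter_ww3_tests rt.conf > rt_ww3.conf` followed by `./rt.sh -l rt_ww3.conf`, which leaves the original rt.conf untouched.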

grantfirl commented 4 years ago

Interestingly, using ecflow, it will compile:

./rt.sh -e -n fv3_ccpp_gfdlmprad 2>&1 | tee rt_ww3_test.log

compile_1_ecflow.log

If all regression tests are supposed to go through ecflow, then I guess there's no problem. Should one always use rt.sh with the -e flag?

grantfirl commented 4 years ago

Yes, @junwang-noaa all tests run through fine if the WW3=Y tests are commented out in rt.conf. I tried that a while ago.

junwang-noaa commented 4 years ago

I think rt.sh -n works in s2s, but I haven't tested it in ufs-weather for a while.

grantfirl commented 4 years ago

The -n option seems to work fine in ufs-weather-model for other cases, so I don't think it is the issue, especially since rt.sh -l rt.conf with the WW3=Y tests commented out also works fine. The fact that it compiles WITH ecflow but not WITHOUT ecflow might be a clue. I don't know why that should matter, but it apparently does.

junwang-noaa commented 4 years ago

Sorry, I may not have been clear. Can you run rt.sh -l rt.conf with rt.conf containing only the WW3 tests?

llpcarson commented 4 years ago

Jun - I just tried this test (running rt.conf with just that one test in it), and it also fails. The difference, I think, is that the ECFlow (and rocoto) versions are using compile_cmake.sh and rt.sh (command-line) is using compile.sh. So, there must be something broken in the compile.sh script when WW3 is compiled. I haven't had a chance to look further at this, but hopefully that clue will help.
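The difference described above can be pictured with a small sketch. This is a hypothetical illustration only: the function name and the mode strings are invented here and are not taken from rt.sh. The only information from the thread is that ecflow and rocoto runs use compile_cmake.sh while plain command-line runs use compile.sh.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: the names below are not taken from rt.sh.
# It mirrors the observation that workflow-manager runs and plain
# command-line runs end up invoking different build scripts.
select_compile_script() {
  case "$1" in
    ecflow|rocoto) echo "compile_cmake.sh" ;;  # workflow-manager runs
    *)             echo "compile.sh" ;;        # plain command-line runs
  esac
}
```

If the real script branches in a similar way, a failure that appears only without -e would point at the compile.sh path rather than at the WW3 sources themselves.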

junwang-noaa commented 4 years ago

I see. @Dusan Jovic, is this the GNU compile issue we discussed when updating NCEPLIBS? The fix could be a one-line change in NEMS/src/incmake/component_WW3.mk. @Laurie Carson, @Grant Firl, please let me know if this does not work for you. I am planning to add it in the next commit.

$ git diff .
diff --git a/src/incmake/component_WW3.mk b/src/incmake/component_WW3.mk
index 0394a5b..6aec064 100644
--- a/src/incmake/component_WW3.mk
+++ b/src/incmake/component_WW3.mk
@@ -37,7 +37,7 @@ WW3_ALL_OPTS= \
$(ww3_mk): configure
+$(MODULE_LOGIC) ; set -x ; cd $(WW3_SRCDIR)/esmf ; \
export $(WW3_ALL_OPTS) ; \

grantfirl commented 4 years ago

OK @junwang-noaa, I'm testing your fix now.

grantfirl commented 4 years ago

@junwang-noaa Your fix seems to have worked. I was able to compile using WW3=Y without relying on ecflow/compile_cmake.sh. However, it sounds like running regression tests through ecflow/rocoto using compile_cmake.sh will be the preferred approach in the future? That is, will compile.sh eventually be abandoned in favor of the cmake version?

Thanks for your help!

junwang-noaa commented 4 years ago

Yes, the plan is to use cmake in the future. But I will commit the fix for ufs-weather for now.

climbfuji commented 4 years ago

> Yes, the plan is to use cmake in the future. But I will commit the fix for ufs-weather for now.

Yet another powerful demonstration that carrying around legacy systems causes unnecessary extra work ...

junwang-noaa commented 3 years ago

I will close the ticket, please let me know if there is still any issue.