Open DeniseWorthen opened 1 year ago
I had 100% of tests pass about a week or so ago on C5. So the debug test runs to completion here, but you are seeing these errors anyway?
Yes, the test runs, so it is not fatal. I don't know what it means actually. It must be in MAPL?
@DeniseWorthen @ulmononian - does spack-stack need the esmf-debug module to run cpld_control_nowave_noaero_p8 and cpld_bmark_p8 in debug mode?
No, we no longer use esmf built w/ debug.
Hi, @DeniseWorthen. Below are my experiment directories for cpld_control_nowave_noaero_p8_intel and cpld_bmark_p8_intel on Gaea C5, if you want to take a look at the PET logs. If you don't have access to view them, let me know.
/lustre/f2/scratch/Zachary.Shrader/FV3_RT/rt_76751/cpld_control_nowave_noaero_p8_intel
/lustre/f2/scratch/Zachary.Shrader/FV3_RT/rt_76751/cpld_bmark_p8_intel
@zach1221 The same behaviour appears to be present. No PET error messages in the cpld_control_nowave_noaero_p8_intel test, which does not include aerosols. In the cpld_bmark_p8_intel test, the only PET log which does not contain the error message is PET0767 (the last atm PET).
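For anyone who wants to repeat this check on another run directory, here is a minimal sketch of how the PET logs could be split into those that contain the error and those that do not. The file pattern `PET*.ESMF_LogFile` and the plain `ERROR` substring match are assumptions on my part, and `split_pet_logs` is a hypothetical helper name; adjust both to match what is actually in the run directory.

```python
from pathlib import Path


def split_pet_logs(run_dir, pattern="PET*.ESMF_LogFile", marker="ERROR"):
    """Return (with_marker, without_marker) lists of PET log file names.

    Assumptions: logs sit directly in run_dir, match `pattern`, and an
    error shows up as the literal substring `marker` somewhere in the file.
    """
    with_marker, without_marker = [], []
    for log in sorted(Path(run_dir).glob(pattern)):
        # Replace undecodable bytes so an odd character does not abort the scan.
        text = log.read_text(errors="replace")
        (with_marker if marker in text else without_marker).append(log.name)
    return with_marker, without_marker


if __name__ == "__main__":
    import sys

    # Run directory is taken from the command line, defaulting to the cwd.
    flagged, clean = split_pet_logs(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(flagged)} logs contain ERROR, {len(clean)} do not")
    print("logs without ERROR:", ", ".join(clean))
```

Pointing the script at a test's run directory should make it easy to see whether the last ATM PET is again the only log without the message.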
@DeniseWorthen @zach1221
@zach1221 @jkbk2004 The behavior w/rt these two cases is the same on Hercules. That is, there is no ERROR on the test w/o aerosols, i.e., cpld_control_noaero_p8_intel. There is a PET log error on all but the last ATM PE for the cpld_bmark_p8 test. I also checked the low-resolution case, cpld_control_p8_intel, and it shows the same thing: ERROR on all but the last ATM PE. Note that aerosols run on the same PEs as the ATM. I believe this is most likely a MAPL issue and not ESMF.
@DeniseWorthen - A working directory on Gaea C5 with the cpld_bmark_p8 test is /lustre/f2/scratch/ncep/Natalie.Perlin/FV3_RT/rt_221558/cpld_bmark_p8_intel, in case you want to take a look and see whether this is the same behavior.
@natalie-perlin I had checked earlier w/ @zach1221's run directories and confirmed that the same error message is present as in the original issue. I also later confirmed that the same message is present on a Hercules run, so it isn't specific to C5. I believe it is most likely a MAPL issue.
I think GSFC people might be able to access Hercules. It would be a great starting point if we could set up a MAPL debugging installation and experiment. @mathomp4 Is Jiang still available?
@jkbk2004 Yes. @weiyuan-jiang should be able to help. That said, I'm not sure he (or any of us) have access to Hercules. We do have access to Orion.
I think I have access to Hercules. Judging from the logging error, it looks more like a problem from ESMF (due to the build?). But I will take a look at it.
@weiyuan-jiang Thanks for checking. The reason I suspect MAPL is that we do not see the error in cases w/o the aerosol component.
What version of gocart and MAPL are you using? @DeniseWorthen
I'm not sure how to tell which version of gocart we are using, but for MAPL we are using 2.35.2-esmf-8.4.2
Looks like 2.1.1 for gocart
I have talked to Ben. He thought there might be a small chance that an attribute is not there in the MAPL_GetHorzIJIndex call. (It is confusing, though, because that should not produce an error.) Anyway, I have replaced that call in the branch. Would you please try this new MAPL?
Since this error is not specific to C5 (it also shows up on Hercules), I've edited the issue title.
@DeniseWorthen I see this error in GEFS RT intel debug mode
Description
I compiled and ran both the cpld_control_nowave_noaero_p8 and cpld_bmark_p8 tests in debug mode using @ulmononian's feature/add_c5 branch. Both tests run, but the bmark test (which has aerosols) produces a PET log error message on PEs 000 through 0766. There is no error message in the 0767 log. The aerosol component runs on PEs 0:767. I don't know why the last aerosol PE does not produce the error. The error message is:
To Reproduce:
Compile and run the feature/add_c5 branch using DEBUG for the bmark P8 test.
Additional context
Output