oceanmodeling / schism-esmf

Earth System Modeling Framework cap for SCHISM

coastal_ike_shinnecock_atm2sch2ww3 hangs with GNU compiler #3

Open uturuncoglu opened 5 months ago

uturuncoglu commented 5 months ago

The coastal_ike_shinnecock_atm2sch2ww3 configuration hangs with the GNU compiler on Hercules. The coastal_ike_shinnecock_atm2sch test case runs without any issue. An example run directory is in /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_2495275/coastal_ike_shinnecock_atm2sch2ww3_intel. I also tried to attach gdb to the hung processes, but that hangs as well. It could be a system issue, but it needs to be investigated further.

uturuncoglu commented 4 months ago

@josephzhang8 I also debugged this one. It hangs without any error in the error log, and attaching gdb to the processes gave no clue either. I only see the following messages in standard output:

67:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
67:  Please set JGS_USE_JACOBI .eqv. .true.
68:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
68:  Please set JGS_USE_JACOBI .eqv. .true.

Do you have any idea? Thanks.
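
For triaging rank-prefixed output like the lines above, a small Python sketch can group identical messages and list the ranks that printed them. It assumes every line has the form `<rank>: <message>`, as in the excerpt. The name `group_by_message` is hypothetical and not part of SCHISM, WWM, WW3, or ESMF.

```python
import re
from collections import defaultdict

# Assumed line format: "<rank>:  <message>", as in the stdout excerpt above.
LINE_RE = re.compile(r"^\s*(\d+):\s+(.*\S)\s*$")


def group_by_message(text):
    """Map each distinct message to the sorted list of ranks that printed it."""
    ranks = defaultdict(set)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            ranks[match.group(2)].add(int(match.group(1)))
    return {message: sorted(who) for message, who in ranks.items()}


# Sample copied from the standard output quoted in this issue.
SAMPLE = """\
67:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
67:  Please set JGS_USE_JACOBI .eqv. .true.
68:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
68:  Please set JGS_USE_JACOBI .eqv. .true.
"""

if __name__ == "__main__":
    for message, who in group_by_message(SAMPLE).items():
        print(f"ranks {who}: {message}")
```

Run against the full standard output file, this would show whether the warning comes from only a few ranks or from every rank of the wave component.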

josephzhang8 commented 4 months ago

This is a message from the wave module (WWM). Can you follow the instruction there in wwminput.nml?

josephzhang8 commented 4 months ago

@uturuncoglu: for WWM, it's also best to enable the zero-initialization flag: -finit-local-zero (for GNU)

uturuncoglu commented 4 months ago

@josephzhang8 Okay. Let me look at it more carefully. We are using WW3 from the UFS Weather Model and its build. So, I don't think there is an issue with the build and flags, since it is already used by various applications without any issue. I'll update you if I find something new. Thanks for your help.

uturuncoglu commented 4 months ago

@josephzhang8 Okay. GNU is working fine, but I still have an issue with the GNU+DEBUG combination. It seems to be getting stuck outside of SCHISM. So, maybe Hercules is not the right platform to test this combination. Or maybe it is just too slow to see the progress. I also attached gdb to the processes, but there is not much information. Some processes are stuck in the broadcast from ESMF. Anyway, I'll keep this issue open at this point, and maybe I can try another platform that supports GNU, like NCAR's Derecho.

#9  0x0000000000abea8f in ESMCI::VMK::broadcast(void*, int, int) ()
#10 0x0000000000a09dde in ESMCI::broadcastInfo(ESMCI::Info*, int, ESMCI::VM const&) ()
#11 0x0000000000af48da in ESMC_InfoBaseSyncDo ()
#12 0x0000000000af60d4 in ESMC_InfoBaseSync ()
#13 0x0000000000821584 in __esmf_infosyncmod_MOD_esmf_infosyncgridcomp ()
#14 0x0000000000423296 in __esmf_attributemod_MOD_esmf_attributeupdategridcomp ()
#15 0x0000000000973748 in __nuopc_driver_MOD_consistentcomponentattributes ()
#16 0x0000000000973c29 in __nuopc_driver_MOD_loopmodelcompsattributeupdate ()
#17 0x0000000000975cd2 in __nuopc_driver_MOD_initializeipdv02p3 ()
#18 0x00000000009aa154 in __nuopc_driver_MOD_initializegeneric ()
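
To compare backtraces like this across many ranks, a short Python sketch can reduce each gdb frame to its number and function name. It assumes frames without debug symbols, in the `#N 0xADDR in FUNCTION ()` form shown above, and relies on gfortran's `__<module>_MOD_<procedure>` naming for module procedures. The helper names `parse_frames` and `readable`, and the `module::procedure` display form, are illustrative choices and not part of gdb or ESMF.

```python
import re

# Frame line as printed by gdb without debug symbols: "#N  0xADDR in FUNCTION ()".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*?) \(")
# gfortran names module procedures "__<module>_MOD_<procedure>".
GFORTRAN_RE = re.compile(r"^__(\w+?)_MOD_(\w+)$")


def readable(symbol):
    """Rewrite a gfortran module symbol as module::procedure; leave others as-is."""
    match = GFORTRAN_RE.match(symbol)
    return f"{match.group(1)}::{match.group(2)}" if match else symbol


def parse_frames(text):
    """Return (frame_number, function_name) pairs for each gdb frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), readable(match.group(2))))
    return frames


# A few frames copied from the backtrace quoted in this issue.
SAMPLE = """\
#9  0x0000000000abea8f in ESMCI::VMK::broadcast(void*, int, int) ()
#12 0x0000000000af60d4 in ESMC_InfoBaseSync ()
#13 0x0000000000821584 in __esmf_infosyncmod_MOD_esmf_infosyncgridcomp ()
#18 0x00000000009aa154 in __nuopc_driver_MOD_initializegeneric ()
"""

if __name__ == "__main__":
    for number, name in parse_frames(SAMPLE):
        print(f"#{number:<3} {name}")
```

Comparing the innermost parsed frame from each rank would show which ranks are waiting in the broadcast and which have not reached it yet.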

uturuncoglu commented 4 months ago

Current status of the issue: The configuration runs with the GNU compiler on Hercules, but the GNU+DEBUG combination seems to hang. So, this might be a platform issue. I will try another platform to see whether I can reproduce it there.

janahaddad commented 3 months ago

@pvelissariou1 any update on the hotfix testing you did here: https://github.com/schism-dev/schism-esmf/pull/31

uturuncoglu commented 3 weeks ago

@pvelissariou1 @yunfangsun If you don't mind, could you test this configuration (coastal_ike_shinnecock_atm2sch2ww3) on Hera? (I have no access to that machine.) It would be nice to run with both Intel and GNU to see what happens. I tried on Derecho, but I think the GNU installation there has some issue, and the UFS Weather Model uses only Hera and Hercules for GNU testing. I'll also try to run on Orion on my side. BTW, you might want to sync the input directory from Hercules since it has some changes.

uturuncoglu commented 3 weeks ago

@janahaddad @pvelissariou1 @yunfangsun rt.sh is not working on Orion. This was probably introduced by the OS/system update on Orion. Maybe it is not supported anymore; I am not sure. I opened a ticket on the UFS Weather Model side - https://github.com/ufs-community/ufs-weather-model/issues/2365. If we cannot reproduce these errors, we might close this issue and open it again if we see a similar issue. My tests on Hercules run fine. Maybe Hera testing could give more insight.