Open uturuncoglu opened 8 months ago
@josephzhang8 @platipodium I tracked this issue down and compiled the code in debug mode. It seems it is crashing with the following trace, throwing a floating point exception from the following line.
I am not sure whether this is related to the model or to the coupling interface. I'll check the information that goes to SCHISM, but maybe you have some idea.
13: [Orion-04-25:44363:0:44363] Caught signal 8 (Floating point exception: floating-point invalid operation)
13: ==== backtrace (tid: 44363) ====
13: 0 0x0000000003b35af1 schism_init_() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/SCHISM-interface/SCHISM/src/Hydro/schism_init.F90:7028
13: 1 0x00000000038d7fd6 schism_nuopc_cap_mp_initializeadvertise_() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/SCHISM-interface/SCHISM-ESMF/src/schism/schism_nuopc_cap.F90:345
13: 2 0x0000000000a77fe4 ESMCI::FTable::callVFuncPtr() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:2167
13: 3 0x0000000000a7c0af ESMCI_FTableCallEntryPointVMHop() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:824
13: 4 0x0000000000bda1e7 ESMCI::VMK::enter() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C:1125
13: 5 0x00000000008ea0e2 ESMCI::VM::enter() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Infrastructure/VM/src/ESMCI_VM.C:1216
13: 6 0x0000000000a7942a c_esmc_ftablecallentrypointvm_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:981
13: 7 0x000000000097b950 esmf_compmod_mp_esmf_compexecute_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMF_Comp.F90:1223
13: 8 0x0000000000d571b1 esmf_gridcompmod_mp_esmf_gridcompinitialize_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMF_GridComp.F90:1412
13: 9 0x000000000091e450 nuopc_driver_mp_loopmodelcompss_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/addon/NUOPC/src/NUOPC_Driver.F90:2886
13: 10 0x0000000000946734 nuopc_driver_mp_initializeipdv02p1_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/addon/NUOPC/src/NUOPC_Driver.F90:1313
13: 11 0x000000000095058b nuopc_driver_mp_initializegeneric_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/addon/NUOPC/src/NUOPC_Driver.F90:481
13: 12 0x0000000000a77fe4 ESMCI::FTable::callVFuncPtr() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:2167
13: 13 0x0000000000a7c0af ESMCI_FTableCallEntryPointVMHop() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:824
13: 14 0x0000000000bd9fda ESMCI::VMK::enter() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Infrastructure/VM/src/ESMCI_VMKernel.C:2321
13: 15 0x00000000008ea0e2 ESMCI::VM::enter() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Infrastructure/VM/src/ESMCI_VM.C:1216
13: 16 0x0000000000a7942a c_esmc_ftablecallentrypointvm_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMCI_FTable.C:981
13: 17 0x000000000097b950 esmf_compmod_mp_esmf_compexecute_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMF_Comp.F90:1223
13: 18 0x0000000000d571b1 esmf_gridcompmod_mp_esmf_gridcompinitialize_() /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-7t7fsxpkw36g4ht6c6qbu4bvviztvaim/spack-src/src/Superstructure/Component/src/ESMF_GridComp.F90:1412
13: 19 0x000000000042c5a7 MAIN__() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/driver/UFS.F90:381
13: 20 0x0000000000429392 main() ???:0
13: 21 0x0000000000022495 __libc_start_main() ???:0
13: 22 0x00000000004292a9 _start() ???:0
13: =================================
13: forrtl: error (75): floating point exception
13: Image PC Routine Line Source
13: fv3.exe 00000000042E199B Unknown Unknown Unknown
13: libpthread-2.17.s 00002BA28A9915D0 Unknown Unknown Unknown
13: fv3.exe 0000000003B35AF1 schism_init_ 7028 schism_init.F90
13: fv3.exe 00000000038D7FD6 schism_nuopc_cap_ 345 schism_nuopc_cap.F90
13: fv3.exe 0000000000A77FE4 Unknown Unknown Unknown
13: fv3.exe 0000000000A7C0AF Unknown Unknown Unknown
13: fv3.exe 0000000000BDA1E7 Unknown Unknown Unknown
13: fv3.exe 00000000008EA0E2 Unknown Unknown Unknown
13: fv3.exe 0000000000A7942A Unknown Unknown Unknown
13: fv3.exe 000000000097B950 Unknown Unknown Unknown
13: fv3.exe 0000000000D571B1 Unknown Unknown Unknown
13: fv3.exe 000000000091E450 Unknown Unknown Unknown
13: fv3.exe 0000000000946734 Unknown Unknown Unknown
13: fv3.exe 000000000095058B Unknown Unknown Unknown
13: fv3.exe 0000000000A77FE4 Unknown Unknown Unknown
13: fv3.exe 0000000000A7C0AF Unknown Unknown Unknown
13: fv3.exe 0000000000BD9FDA Unknown Unknown Unknown
13: fv3.exe 00000000008EA0E2 Unknown Unknown Unknown
13: fv3.exe 0000000000A7942A Unknown Unknown Unknown
13: fv3.exe 000000000097B950 Unknown Unknown Unknown
13: fv3.exe 0000000000D571B1 Unknown Unknown Unknown
13: fv3.exe 000000000042C5A7 MAIN__ 381 UFS.F90
13: fv3.exe 0000000000429392 Unknown Unknown Unknown
13: libc-2.17.so 00002BA28ADD8495 __libc_start_main Unknown Unknown
13: fv3.exe 00000000004292A9 Unknown Unknown Unknown
@josephzhang8 @platipodium BTW, this is in the initialization, before everything else, and it is called in `InitializeAdvertise`.
@josephzhang8 @platipodium Okay. I printed out the values of `diffmin` and `dfv`: `diffmin` is NaN and `dfv` is zero, as expected. I think this is a bug at the model level, since there is no indication that `diffmin` needs to be set in the configuration file when `itur = 0`. In this test case, we have the following:

itur = 0
dfv0 = 0 !needed if itur=0
dfh0 = 1.e-4 !needed if itur=0

Anyway, let me know if you agree about the issue and the bug. Then, please let me know about the solution. I'll try to give `diffmin` an initial value of 0, and I think the same is also required for `diffmax`.
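The failure mode above, a variable that is only assigned when `itur /= 0` and otherwise holds garbage (or a signaling NaN under debug flags), can be sketched minimally. This is illustrative Python, not SCHISM's actual code; `init_diffusivity` and its values are hypothetical stand-ins:

```python
import math

def init_diffusivity(itur, diffmin_default=None):
    """Sketch of the bug: diffmin is only assigned when itur != 0.

    An unassigned REAL in a debug Fortran build may hold a signaling
    NaN (e.g. with -init=snan), modeled here as NaN.
    """
    diffmin = float("nan")      # stand-in for an uninitialized variable
    if itur != 0:
        diffmin = 1.0e-6        # illustrative value; real code computes it
    elif diffmin_default is not None:
        diffmin = diffmin_default  # the fix: always give it a value
    return diffmin

# Buggy path: itur = 0 leaves diffmin as NaN, which poisons later math.
dfv = 0.0
assert math.isnan(init_diffusivity(0) + dfv)

# Fixed path: initializing diffmin (e.g. to 0) keeps the arithmetic finite.
assert init_diffusivity(0, diffmin_default=0.0) == 0.0
```

With FPE trapping enabled, the NaN surfaces as a crash at the first arithmetic use, which matches the backtrace pointing into `schism_init`.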
Thx @uturuncoglu. I'll fix this bug. diffm[in,ax] are not used with itur=0, but it's best to init them.
@josephzhang8 Thanks. If you want, I could test your fix branch on my end to be sure that it fixes the issue. After passing this point, there might be other issues that you want to fix. Anyway, it is your call.
@josephzhang8 BTW, I'll also test the code with GNU. Maybe the underlying issue is the same as https://github.com/oceanmodeling/schism-esmf/issues/3. BTW, I am not sure why the atm2sch test is working without any issue. Maybe that one is using different options for `itur`. I'll check it. Anyway, having different tests and running them regularly in DEBUG mode and with different compilers will give us the ability to catch these issues in advance.
Yes we test different compilers regularly to catch potential issues. The fixes are needed no matter what, to make SCHISM robust on all platforms. Thx for working with us!
@uturuncoglu: I've fixed the bug in master version. Can u plz pull? Thx
@josephzhang8 Okay. Thanks. I'll try to run the case with your fix and update you.
@josephzhang8 Okay. The code passed that point but is now giving an error like the following:
10: [Orion-06-19:73244:0:73244] Caught signal 8 (Floating point exception: floating-point invalid operation)
10: ==== backtrace (tid: 73244) ====
10: 0 0x0000000004169c3d compute_wave_force_lon_() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/SCHISM-interface/SCHISM/src/Hydro/misc_subs.F90:6194
10: 1 0x000000000393dda3 schism_esmf_util_mp_schism_stateimportwavetensor_() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/SCHISM-interface/SCHISM-ESMF/src/schism/schism_esmf_util.F90:2592
10: 2 0x00000000038e6c0e schism_nuopc_cap_mp_schism_import_() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/SCHISM-interface/SCHISM-ESMF/src/schism/schism_nuopc_cap.F90:1005
10: 3 0x00000000038e1483 schism_nuopc_cap_mp_modeladvance_() /work/noaa/nems/tufuk/COASTAL/ufs-coastal_dev/SCHISM-interface/SCHISM-ESMF/src/schism/schism_nuopc_cap.F90:766
Here is the line: https://github.com/schism-dev/schism/blob/84866bf95d779a43056db4c8885908bd675010b3/src/Hydro/misc_subs.F90#L6194. I did not debug further, but it seems that `RSXX0` is also NaN; I'll check it. I need to track down its source.
@josephzhang8 This might be a bug on the cap side. I think the variable assignments used in the following call are wrong:

call compute_wave_force_lon(eastward_wave_radiation_stress, &
eastward_northward_wave_radiation_stress,northward_wave_radiation_stress)

I think this needs to use the `SCHISM_StateUpdate` call rather than direct assignment, since we are not using element-based fields. Anyway, I'll try to fix it and let you know.
@josephzhang8 @platipodium I was looking into the issue related to `compute_wave_force_lon`. It seems that it is an issue with the array sizes. Why are the stress-related fields (like `northward_wave_radiation_stress`) defined with size `isPtr%numOwnedNodes` (or `np`), while other forcing variables like `pr2` are defined with size `npa`? Is there an underlying reason for that? It is hard to follow that piece of code. At the end of the day, this information is used to set the `wwave_force` variable, which has size `(2,nvrt,nsa)`. What is the relationship between `np`, `nsa`, and `npa`? It seems that the `hgrad_nodes` call basically handles moving the information from `RSXX` to `DSXX3D`, but that also uses size `npa`. Anyway, maybe I am confused, but it seems the sizes of the arrays are not consistent. Let me know what you think.
@uturuncoglu: npa = np + npg (resident + ghost). I thought ESMF does not handle the ghost zone, so the input arrays like RSXX0 have a dim of np, and then RSXX etc. are used in an exchange to get the ghosts.
ns, nsa are the # of edges (sides) of elements. Wave forces are defined at edges (side centers) due to the gradient operator. The arrays in hgrad_nodes() have to have dim npa (augmented).
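The resident/augmented distinction above can be sketched as follows. This is a minimal illustration, not SCHISM code; `to_augmented` is hypothetical, and the `ghost_values` argument stands in for SCHISM's actual MPI halo exchange:

```python
import numpy as np

def to_augmented(owned, np_res, npg, ghost_values):
    """Place np_res owned (resident) node values into an npa-sized array.

    npa = np_res + npg: the first np_res entries are resident nodes and
    the trailing npg entries are ghosts, which must be filled by an
    exchange before any gradient operator (e.g. hgrad_nodes) reads the
    augmented array.
    """
    npa = np_res + npg
    aug = np.full(npa, np.nan)      # NaN marks "not yet exchanged"
    aug[:np_res] = owned            # resident part comes from the cap/import
    aug[np_res:] = ghost_values     # stand-in for the halo exchange
    return aug

owned = np.array([1.0, 2.0, 3.0])                         # np = 3 resident
aug = to_augmented(owned, 3, 2, ghost_values=[4.0, 5.0])  # npg = 2 ghosts
assert aug.shape == (5,) and not np.isnan(aug).any()
```

Passing an `np`-sized array where an `npa`-sized one is expected, or skipping the exchange step, leaves the ghost slots holding garbage, which is consistent with the NaN-related FPEs seen in this thread.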
@josephzhang8 Yes, we don't have ghost elements anymore. So, we need to adjust the call based on this reality (we might need two versions of it, one for node-based meshes and another for element-based ones, until we switch to element-based completely). Anyway, it would be hard for me to understand the logic there. So, is there anybody on your side who could look closely at that part of the code? If not, I could try to look at it, but I am not sure when I would have the fix, since I also need to spend time on other projects.
@uturuncoglu: That routine is basically the same as the (well tested) routine used by the internal wave module (WWM), so I think the only potential errors may come from the interface.
@josephzhang8 Okay. So, you think that we don't need to change anything in the routine, but maybe add some logic to fill the arrays in the correct way before passing them to it. Right?
Hold on... I think I may have found the bug....
@uturuncoglu I just pushed a new master; can u plz check? Thx! The bug: hgrad_nodes() expects 3D variables for radiation stress components.
@josephzhang8 Okay. Let me test. This might still need some fix in upper level to get the required data from import state.
Yes, that's the part I'm not sure about. I noticed last time u added some allocatable, target arrays in schism_glbl.F90.
@josephzhang8 I am getting an FPE from `sum1=sum(RSXX0)`. Probably the `RSXX0` field is not filled correctly and includes some NaN values (it is not initialized after allocation). I'll try to fix it on the cap side.
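With floating-point trapping enabled, as in these debug builds, summing an array that contains a NaN raises at the `sum` rather than where the NaN was produced, which is why `sum(RSXX0)` is where the crash surfaces. A minimal sketch of the two standard defenses, initialize right after allocation and validate before use (illustrative Python with made-up helper names, not SCHISM code):

```python
import numpy as np

def make_field(n, initialize=True):
    """Allocate a field; NaN fill models an uninitialized allocation."""
    return np.zeros(n) if initialize else np.full(n, np.nan)

def checked_sum(name, field):
    """Validate a field before handing it to the physics, so a bad value
    is reported with context instead of an FPE deep inside sum()."""
    bad = np.flatnonzero(~np.isfinite(field))
    if bad.size:
        raise ValueError(f"{name} contains NaN/Inf at indices {bad[:5]}")
    return field.sum()

assert checked_sum("RSXX0", make_field(4)) == 0.0  # initialized: fine
try:
    checked_sum("RSXX0", make_field(4, initialize=False))
except ValueError as e:
    assert "RSXX0" in str(e)  # uninitialized field is caught with context
```

In the Fortran, the equivalent of the first defense is simply zeroing the arrays immediately after `allocate`, which is the fix being discussed here.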
@josephzhang8 I think I fixed the issue. The wave-related fields look fine now. I'll do more tests and then create a PR in the SCHISM-ESMF repository. Just for your information, here are the changes: https://github.com/oceanmodeling/schism-esmf/pull/new/hotfix/wave_stress
@josephzhang8 I ran the case and plotted `rsxx`, `rsyy`, and `rsxy` with my simple NCL script for the last time step. It looks like `rsxx` and `rsyy` are consistent with the currents, but `rsxy` looks a little bit off to me. Since you are more experienced than me, I wonder if you have any idea. BTW, this is an idealized case that @pvelissariou1 created before. So, the results could be weird.

(plots of rsxx, rsyy, and rsxy)
@josephzhang8 So, we might have an issue in outputting `rsxy` or in calculating it. I'll check the import state for the wave field to see whether the same structure appears there as well.
I don't see a scale bar; maybe SXY is very small?
@josephzhang8 JFYI, I checked the import state for `rsxy` and it looks fine over there. So, I am not sure, but there could be an issue on the model side when it is writing it out.
@josephzhang8 Let me use the same scale in the preview and NCL to double check.
@josephzhang8 Okay. I think it is hard to make the scales match, since SCHISM is applying some unit conversion, I think. The data coming from the wave model have a range of -9.07627 to 18.6689, but in the output the range is -8.24308e-07 to 7.92218e-07. It also seems that I was plotting a different time step; after fixing the plot range, the NCL plot looks reasonable.
Anyway, I think this is fine and the fix is working as expected. I'll also check the GNU issue to see if this fix handles that case too.
Great to know; thx @uturuncoglu! I'm almost ready to work on the 3D (vortex) coupling (just found out the array names we need from WW3 this afternoon).
@josephzhang8 Joseph, could you please update the document (if needed)? ww3-exports-ocn-3Dwave-terms
@josephzhang8 It is also working with GNU. I will do one last test with GNU in DEBUG mode. If that also passes, I'll create the PR, and maybe we can close two issues at the same time.
gr8!
@pvelissariou1 : I just did that
@uturuncoglu : do u want me to review/merge the PR now? Thx
scratch that... I see the request from u now
@josephzhang8 JFYI, I replied on the PR side. Once this is in, I am planning to define two more tests on the UFS Coastal side, one for GNU and one for DEBUG mode, to cover different cases. Then we can be sure we are fine with those options in the future.
@josephzhang8 Thank you, Joseph. @uturuncoglu After the PR is merged, I guess you will update ufs-coastal as well. I am planning to check all SCHISM-related tests.
@pvelissariou1 You could test by checking out master in SCHISM and the https://github.com/oceanmodeling/schism-esmf/tree/hotfix/wave_stress branch on the SCHISM-ESMF side. If we have an issue, it would be better to know it before the merge. @josephzhang8 Maybe we could wait for Takis to perform the initial test.
@uturuncoglu , @josephzhang8 Thank you very much both. Ufuk I will check SCHISM as you suggested and we will talk about this on Monday. Hopefully all will be fine.
@mansurjisan to check
The `coastal_ike_shinnecock_atm2sch2ww3` test case is hanging on Orion with the Intel compiler. I also tried to run it on Hercules with Intel, and it passes there. So, this could be a system issue, but maybe it is linked to https://github.com/oceanmodeling/schism-esmf/issues/3 and needs to be investigated.