Closed uturuncoglu closed 8 months ago
I checked with NCAR's cprnc
tool and it seems that only one field (Med_frac_ocn_ofrac
) is not matching,
Med_frac_ocn_ofrac (Med_frac_ocn_nx,Med_frac_ocn_ny,time) t_index = 1 1
16 5780 ( 53, 1, 1) ( 2698, 1, 1) ( 29, 1, 1) ( 29, 1, 1)
5780 2.141958208000000E+09 -1.874137024000000E+09 2.6E+09 -1.234390976000000E+09 1.4E-03 -1.234390976000000E+09
5780 1.627074624000000E+09 -1.784926144000000E+09 1.332195392000000E+09 1.332195392000000E+09
5780 ( 1, 1, 1) ( 2367, 1, 1)
avg abs field values: 1.485276628719723E+06 rms diff: 6.1E+07 avg rel diff(npos): 1.4E-03
1.730461296712803E+06 avg decimal digits(ndif): 1.0 worst: -0.3
RMS Med_frac_ocn_ofrac 6.0673E+07 NORMALIZED 3.7735E+01
This could be issue in the SCHSIM cap but it is also weird that coastal_ike_shinnecock_atm2sch
is passing without any issue. Maybe that does not pass ocean fraction?
I checked the coastal_ike_shinnecock_atm2sch
case. That has also Med_frac_ocn_ofrac
field and if I check it with ncdump -v
it seems fine.
If I check the same field in coastal_ike_shinnecock_atm2sch2ww3_intel
test, I am seeing very large and strange values (probably not initialized). I need to check the mediator to understand how this field is calculated (or maybe provided by the SCHSIM as mask - need to double check). There might be an issue of data assignment or mesh itself.
Med_frac_ocn_ofrac =
-369381312, 5383, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, -1234390976, 5324, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2141958208, 5446, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1220292672, 5393, 3, 3, 3, 3, 3,
@pvelissariou1 Are coastal_ike_shinnecock_atm2sch2ww3
and coastal_ike_shinnecock_atm2sch
using same mesh? It is strange that both case has different value in the Med_frac_ocn_ofrac
field. The coastal_ike_shinnecock_atm2sch
case has something like following which seems correct to me.
Med_frac_ocn_ofrac =
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
We are setting ocean_mask
using idry_e
. So, maybe there is an issue in the configuration.
@josephzhang8 How idry_e
is initialized? Does the model reads it from file? Maybe mesh and the idry_e
is not consistent for coastal_ike_shinnecock_atm2sch2ww3
configuration. Not sure.
@uturuncoglu idry_e is init'ed before time stepping to reflect the wet/dry condition initially if ihot=0. If ihot/=0, the flags are read in from hotstart.nc.
Maybe some UFS calls precede the SCHISM init?
@josephzhang8 Let me check. I have no issue with coastal_ike_shinnecock_atm2sch
case. It could be related with this specific configuration.
This is also failing on Orion agains the baseline.
==> logs/log_orion/rt_004_coastal_ike_shinnecock_atm2sch2ww3_intel.log <==
baseline dir = /work/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240126/coastal_ike_shinnecock_atm2sch2ww3_intel
working dir = /work/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_100793/coastal_ike_shinnecock_atm2sch2ww3_intel
Checking test 004 coastal_ike_shinnecock_atm2sch2ww3_intel results ....
Comparing outputs/schout_000000_1.nc .........OK
Comparing outputs/schout_000000_2.nc .........OK
Comparing outputs/schout_000001_1.nc .........OK
Comparing outputs/schout_000001_2.nc .........OK
Comparing outputs/schout_000002_1.nc .........OK
Comparing outputs/schout_000002_2.nc .........OK
Comparing 20080824.000000.out_grd.ww3.nc .........OK
Comparing 20080825.000000.out_grd.ww3.nc .........OK
Comparing 20080826.000000.out_grd.ww3.nc .........OK
Comparing 20080827.000000.out_grd.ww3.nc .........OK
Comparing 20080828.000000.out_grd.ww3.nc .........OK
Comparing ufs.cpld.cpl.hi.2008-08-24-00000.nc ............ALT CHECK......NOT OK
Comparing ufs.cpld.cpl.hi.2008-08-25-00000.nc ............ALT CHECK......NOT OK
Comparing ufs.cpld.cpl.hi.2008-08-26-00000.nc ............ALT CHECK......NOT OK
Comparing ufs.cpld.cpl.hi.2008-08-27-00000.nc ............ALT CHECK......NOT OK
Comparing ufs.cpld.cpl.hi.2008-08-28-00000.nc ............ALT CHECK......NOT OK
0: The total amount of wall time = 655.100498
0: The maximum resident set size (KB) = 239148
Test 004 coastal_ike_shinnecock_atm2sch2ww3_intel FAIL
@josephzhang8 @platipodium I check this little bit more. It seems that ocean mask in the SCHISM export state is fine. I still need to check it in the mediator side to be sure that it is received correctly. I'll look at it later and update you if I find something new.
@josephzhang8 @platipodium @pvelissariou1 @janahaddad Okay. I think I found the source of the issue. In SCHSIM side, we are defining ocean_mask
export field as integer but mediator expects it in double. So, this was creating problem when mediator defining the ocean fraction. I quickly hacked the SCHSIM to provide double rather than integer and that fixed the issue in the mediator side and I could able to see ocean mask and fraction correctly. I still did not implement the fix but maybe I could create temporary variable to create double version of idry mask variable and pass that one to create mask. I am not sure how this is working with atm2sch case. Maybe we are just lucky (or not). This also shows the importance of the robust and multi level testing. It is a small issue but I could not catch it until creating baseline and running configuration against it. Anyway, I will fix it soon and create a PR. Thanks all for your help.
In SCHSIM side, we are defining
ocean_mask
export field as integer but mediator expects it in double.
Thanks for pointing this out. We have other ocean models that use mostly integer
, but also logical
masks. And then there's the issue of convention. Is a non-zero value unmasked (bash-style, 1=FALSE) or is it masked (python style, 1 = TRUE)? For generic coupling
The different mask types and semantics have been taken care of in the FABM project with preprocessor definitions, but I'd prefer proper metadata and conversion routines
Thx @uturuncoglu for finding out the source. I agree with @platipodium that this needs to be ironed out carefully. @danishyo had similar issue with FV3<->SCHISM - in that case, he had to convert idry to iwet=1-idry b/c FV3 expects that.
Short of any change in mediator, we can easily create a temp var as Ufuk did to convert idry to double, if I understand correctly.
@josephzhang8 @platipodium @pvelissariou1 @janahaddad Okay. I implemented a fix and it is in https://github.com/oceanmodeling/schism-esmf/tree/hotfix/ocean_mask branch. You could also see the difference from the following link - https://github.com/oceanmodeling/schism-esmf/pull/new/hotfix/ocean_mask. I created the baseline and run coastal_ike_shinnecock_atm2sch2ww3
against it. Now all files are passing. I think fix is working as expected. Next, I'll look at issue https://github.com/oceanmodeling/schism/issues/4 in the component side that creates issue. Once we solve it too, I and plaining to create two PR one in model and other one is in nuopc interface side. In the mean time if you could test/review hotfix/ocean_mask
branch in your side that would be great.
Thanks for putting together this hot fix @uturuncoglu. I noticed that we have now a (likely good, but not yet tested) solution for source node meshes line 575: if (isPresent .and. meshloc == ESMF_MESHLOC_NODE)
. We would not be able to deal with source element meshes, but I guess that is not a realistic case? If possible, a simple error for this case "not implemented" would be useful.
@platipodium It seems that if it is element then we are removing ghosts and then there is no need to call halo update since there is nothing to update. I think this is only valid if we have node based mesh that has ghost points. Anyway, maybe this is not the answer but if you could clarify little bit more that would be helpful.
@platipodium It seems that if it is element then we are removing ghosts and then there is no need to call halo update
That seems a reasonable analysis. I agree we don't need to deal with the ESMF_MESHLOC_ELEMENT case.
@platipodium @josephzhang8 JFYI, we might still issue. I am investigating now. Please hold on.
@platipodium @josephzhang8 Okay. I think I found it. We need to specialize model_label_DataInitialize
in SCHSIM cap and set So_mask in there since the mask is used in mediator DataInitialize step. At this point, we are providing it in the Advance step and mediator could not reach that information since requires mask before advance step. I also checked the MOM6 cap and the mask is set in its DataInitialize phase. Anyway, we could discuss it more when we have a call but utimate fix is on its way. I'll update you when it is ready.
The RT baseline check issue is fixed by the restructuring the SCHSIM NUOPC cap. The PR (https://github.com/schism-dev/schism-esmf/pull/30) is merged. I am closing this issue.
thanks @uturuncoglu !
@janahaddad @pvelissariou1 @saeed-moghimi-noaa I created baselines for supported test on Hercules and run the tests against baseline. The
coastal_ike_shinnecock_atm2sch2ww3
test is failing as following,It seems that we have reproducibility or configuration issue in here that needs to be solved. Maybe it is just metadata issue in the mediator history since model output reproduces. Anyway, this needs to be investigated. I don't think this configuration is ever tested agains the baseline before. I am also creating baseline on Orion and test over there too.