mom-ocean / MOM6

Modular Ocean Model

MOM6/SIS2 restarts do not match with and without a mask_table #252

Open nikizadehgfdl opened 8 years ago

nikizadehgfdl commented 8 years ago

In a 1/2 degree CM4 test, there are a few restart variables that differ when using a mask_table (compared to the same layout but without a mask_table). This causes problems for users who want to change the layout in the middle of a run, or who judge answer preservation by comparing restarts.

The variables differ ONLY on the masked pes and the reason for the mismatch is clear; the missing value for these particular variables on land is non-zero, whereas their value is set to zero on those cells when a mask_table is used.

Restart file MOM6.res.nc, non-matching variables:
age: the mismatch value is 1 on the masked pes because the missing value for this variable is 1.
sfc, ave_ssh: the mismatch value is 0.075 on the masked pes because the missing value for these variables is 0.075.
h, h2: the mismatch value is 0.001 on the masked pes because the missing value for these variables is 0.001.

Restart file ice_model.res.nc, non-matching variable:
t_surf: the mismatch value is 0.15 on the masked pes because the missing value for this variable is 0.15.
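
To make the mismatch concrete, here is a minimal Python sketch. It is not MOM6 or mppnccombine code: the four-PE layout, the tile size, and the ocean values are made up for illustration, and only the land value 0.001 is taken from the report above. It combines the same toy field twice, once with every PE writing its tile and once with the all-land PE masked out and its hole filled with 0.

```python
# Toy 1-D "global" field split across 4 PEs; PE 2 covers only land.
LAND_VALUE = 0.001  # land value reported for h in this issue
NX_PER_PE = 3       # hypothetical tile width


def pe_tile(pe, is_land):
    """Values one PE would write for its tile (ocean values are arbitrary)."""
    if is_land:
        return [LAND_VALUE] * NX_PER_PE
    return [10.0 + pe] * NX_PER_PE


def combine(land_pes, masked_pes, hole_fill=0.0):
    """Concatenate tiles; PEs in masked_pes wrote nothing, so fill the hole."""
    out = []
    for pe in range(4):
        if pe in masked_pes:
            out.extend([hole_fill] * NX_PER_PE)
        else:
            out.extend(pe_tile(pe, pe in land_pes))
    return out


without_mask = combine(land_pes={2}, masked_pes=set())
with_mask = combine(land_pes={2}, masked_pes={2})

# Indices where the two combined restarts disagree.
diff = [i for i, (a, b) in enumerate(zip(without_mask, with_mask)) if a != b]
print(diff)  # [6, 7, 8]
```

The two results differ only on the cells owned by the masked PE, and the size of the difference there is the land value itself, which matches the pattern described above.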

Is it possible to change the above non-ocean cell values to zero? Or do we need to find a way to set values on masked pes according to these values?

Zhi-Liang commented 8 years ago

Hi Niki,

The mppnccombine option -m controls whether masked regions are set to 0 or to the missing value. When -m is present, the masked pe will be set to missing_value. When -m is not present, the masked pe will be set to 0. For history files, the "-m" option is used in fre. It looks like the restart file combine does not have the -m option. We may ask Amy to add the -m option for the restart file combine, but it might have some side effects.

Zhi

nikizadehgfdl commented 8 years ago

@Zhi-Liang the -m option is present for the restarts! I retried the restart combine manually and got the same result: the value on the masked pes is always 0. Indeed, it looks like -m has no effect at all.

gaea1: cd /lustre/f1/Niki.Zadeh/work/ulm_201505_awg_v20151106_mom6sis2_2016.01.16/CM4_c96L32_am4g7_2000_sis2_lowmix.2016.01.16_1x0m2d_216x2a_1756x1o.o51360/RESTART 
gaea1: /ncrc/home2/fms/local/opt/fre-nctools/bronx-10/ncrc2/bin/mppnccombine -64 -h 16384 -m -k 21 MOM.res.nc MOM.res.nc.0000 MOM.res.nc.0001 MOM.res.nc.0002 MOM.res.nc.0003
blocking factor (k) > total records (1). Setting blocking factor to 1.

Zhi-Liang commented 8 years ago

Hi Niki,

I think the reason is that the restart file fields do not have a _FillValue or missing_value attribute. mppnccombine reads the missing value from the restart file. If the restart file does not have the attribute, 0 will be used for the masked region. The solution requires changes to the shared code and to MOM6. We will discuss how to proceed with this.
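
The behaviour described so far in this thread can be summarised as a small decision rule. The Python sketch below is a simplified model of that description only; it is not taken from the mppnccombine source, the function name masked_fill is hypothetical, and the order in which the two attributes are checked is an assumption.

```python
def masked_fill(var_attrs, use_m_flag):
    """Value written where no input file covers the region.

    Simplified model of the behaviour described in this thread,
    NOT the actual mppnccombine implementation.
    """
    if not use_m_flag:
        return 0.0  # without -m, masked regions are set to 0
    # With -m, use the variable's own attribute if it has one.
    # Checking _FillValue before missing_value is an assumption.
    for name in ("_FillValue", "missing_value"):
        if name in var_attrs:
            return var_attrs[name]
    return 0.0  # no attribute on the variable: fall back to 0


# A restart variable written without either attribute: -m changes nothing.
print(masked_fill({}, use_m_flag=True))                        # 0.0
# The same variable with an attribute: -m now has an effect.
print(masked_fill({"missing_value": -1.0}, use_m_flag=True))   # -1.0
print(masked_fill({"missing_value": -1.0}, use_m_flag=False))  # 0.0
```

Under this model, a restart variable with no attribute gets 0 in the masked region whether or not -m is given, which is consistent with the manual test reported above.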

Zhi

nikizadehgfdl commented 8 years ago

@Zhi-Liang could you point me to an existing code sample that sets this attribute in the restart file? Is it being used in any other components?

Zhi-Liang commented 8 years ago

Hi Niki,

The current fms_io does not have such a capability. Also, MOM6 does not use fms_io. Do all the variables in MOM6 have the same missing value? If they do, I think the change is small: you may just change MOM_io.F90 to add missing=missing_value when calling mpp_write_meta. Do you want to try making the change yourself? I will help if you have questions.

Greetings

Zhi

nikizadehgfdl commented 8 years ago

No, the variables that show differences have different and non-zero land values, so one size will not fit all:

age: missing value is -1, which comes from lan_val = -1; the value inside the mask_table regions is 0.
h, h2: missing values are 0.001, which I think come from the variable min_thickness; the value inside the mask_table regions is 0.
ave_ssh, sfc: missing values are 0.075; the value inside the mask_table regions is 0.
ice_model t_surf: missing value is 273.15; the value inside the mask_table regions is 273 (where does this come from?).

Zhi-Liang commented 8 years ago

Hi Niki,

Then the change will be more complicated. All of these changes would be made inside MOM6:
1) Add an optional argument missing_value to the routine register_restart_field. To minimize the change, we may set the default to the value used most often.
2) Modify MOM_IO to pass missing_value into mpp_write_meta.
3) Modify the MOM code to pass in the correct missing_value for each variable when registering a restart field. Variables whose missing_value matches the default do not need to pass it in.
I hope only a few variables have a missing value different from the others, so that the change will not be large.
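
The three steps above follow a common "optional argument with a default" pattern. The sketch below shows that pattern in Python rather than Fortran. The functions here only borrow the names register_restart_field and mpp_write_meta as stand-ins; they are not the real MOM6 or FMS interfaces, and the default of 0.0 is an assumption based on the report that most restart variables end up as 0 on land.

```python
DEFAULT_MISSING = 0.0  # assumed most common land value

# name -> missing value; a stand-in for the model's restart registry
restart_registry = {}


def register_restart_field(name, missing_value=None):
    """Step 1: optional missing_value, falling back to the default."""
    if missing_value is None:
        missing_value = DEFAULT_MISSING
    restart_registry[name] = missing_value


def write_meta(name):
    """Step 2: pass the stored value on as a variable attribute.

    Hypothetical stand-in for the call to mpp_write_meta.
    """
    return {"name": name, "missing_value": restart_registry[name]}


# Step 3: only variables that differ from the default pass a value.
register_restart_field("u")
register_restart_field("age", missing_value=-1.0)
register_restart_field("h", missing_value=0.001)

for var in ("u", "age", "h"):
    print(write_meta(var))
```

Only age and h need an explicit argument here; every other call site stays unchanged, which is the point of choosing the default carefully.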

Zhi

nikizadehgfdl commented 8 years ago

I added a missing_value to each restart variable but that did not solve this issue.

The restart value of a variable on "land" is the initial value of that variable in the code after it is allocated. It can change if the variable is updated without using a land mask in the calculation. The final value is 0 for most of the restart variables but is some other peculiar value for age, h, h2, sfc, and ave_ssh, and these values have nothing to do with the "missing_value" attribute.

I can use the optional argument "default_data" of write_field in MOM_restart.F90 to set a default value for the variables over the mask_table on land, but that means I have to define a separate missing_value for each restart variable that matches the value on land outside mask_table. I have no way to know what this value is (as it is not a parameter but a runtime variable).

I think we need the subroutine save_restart() to do some magic and set the value on land for all variables in the restart file to a missing_value constant. I think diag_manager is doing that for history files, right?

Zhi-Liang commented 8 years ago

Hi Niki

I think setting default_data to the missing value will solve the issue.

Zhi

nikizadehgfdl commented 8 years ago

@Zhi-Liang these are variables, and their land values change as well as their ocean values. I don't know their final value on land before the model run finishes. The only one whose value I know beforehand is "age", and for that your default_value method works.

Zhi-Liang commented 8 years ago

Hi Niki,

As Jeff suggested, we may use a default missing_value (fill value): MPP_FILL_INT, MPP_FILL_DOUBLE, MPP_FILL_FLOAT. To use this option, mask data is needed for each field. This will work for writing out the data. There might be a potential issue for reading: after reading, the data values over land points will be the default fill value, and we might then get a floating point exception if some computation is done on the data at those land points.

So there is no simple solution for this problem if the values over the land points change during a time step.

Zhi

nikizadehgfdl commented 8 years ago

The root of the issue is that, unlike the diagnostics data, there is no land mask associated with restart data, neither in MOM6 nor in fms_io save_restarts(). So setting a missing_value does not matter, because the data is not masked on land.

We might be able to find a trick to query the land value of each restart variable just before it is being written and set the missing_value to that to fill the hole over a mask_table. This won't be robust.
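
A rough Python sketch of that trick, and of why it is fragile, follows. It assumes a land mask can be obtained from somewhere (for example the grid), which the restart data itself does not carry; the function name land_value_guess and the sample values are hypothetical.

```python
def land_value_guess(field, land_mask):
    """Return the single value found on land cells, or None if ambiguous.

    field and land_mask are equal-length flat lists; land_mask[i] is True
    on land. Hypothetical helper, not part of MOM6 or FMS.
    """
    land_values = {v for v, is_land in zip(field, land_mask) if is_land}
    if len(land_values) == 1:
        return land_values.pop()
    return None  # no land cells at all, or land cells disagree


mask = [True, True, False, False]

# Uniform land value: the guess works and can be used as missing_value.
print(land_value_guess([0.001, 0.001, 12.5, 13.0], mask))  # 0.001

# Land cells that were updated without a mask no longer agree.
print(land_value_guess([0.001, 0.002, 12.5, 13.0], mask))  # None

# A domain with no land cells gives nothing to query.
print(land_value_guess([12.5, 13.0], [False, False]))      # None
```

The two None cases are the reason the approach would not be robust: it only works while every land cell of a variable still holds one common value at the moment the restart is written.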

Zhi-Liang commented 8 years ago

Hi Niki,

I think this will work. Performance should not be an issue, since it is only done at the end of the run.

Zhi

awallcraft commented 5 years ago

This has not been fixed. There are no data_void attributes in the restart files.

In my case, 3 of 60 partial nc files don't get written at all (they are entirely covered by missing pes). For (say) part_size in ice_model.res, masked pes have a value of 9.96921e+36 (I'm not sure how ncview knows this is a data_void to show it as white), but mppnccombine -m puts 0.0 in areas with no partial file. These are the 3 large blue squares.

[Image: ncview part_size]

I thought it was just the missing partial files that caused problems, but perhaps, even without these, having a different set of masked pes would make the restart fail.

For robustness a data_void has to be outside the valid range, so 0.0 almost never works. I'm not sure if NaN is "legal" as a data_void, or if it can be an automatic missing value. How about implementing a default data_void that is not the land value? Then the model can replace data_void with each variable's default land value on input and can provide data_void on output for masked pes, and all fields would have a data_void attribute. If there is an actual data_void over land, this could replace the default value for this field and be used on masked pes.
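
A minimal Python sketch of this proposal is below. It is an illustration only: the functions on_write and on_read are hypothetical, the sample values are made up, and 9.96921e36 is used as the default data_void simply because it is the value reported on masked pes above.

```python
DATA_VOID = 9.96921e36  # assumed default data_void, outside any valid range


def on_write(tile, n):
    """Output side: a masked PE has no tile, so emit data_void instead."""
    return [DATA_VOID] * n if tile is None else list(tile)


def on_read(values, land_value):
    """Input side: swap data_void for this variable's default land value."""
    return [land_value if v == DATA_VOID else v for v in values]


# Two PEs of 3 cells each; the first is masked, the second holds ocean data.
combined = on_write(None, 3) + on_write([0.2, 0.8, 0.0], 3)

# part_size-like variable whose default land value is taken to be 0.0.
restored = on_read(combined, land_value=0.0)
print(restored)  # [0.0, 0.0, 0.0, 0.2, 0.8, 0.0]
```

Because the data_void is far outside the valid range, a genuine 0.0 in the ocean tile is left alone on input, which is the robustness argument made above against using 0.0 as the void value.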

nikizadehgfdl commented 5 years ago

@awallcraft What model is this and how does it crash when you use these combined restarts?

awallcraft commented 5 years ago

I am running ice_ocean_SIS2/repro/MOM6 on a 1/12 degree tripole grid. This was built last March:

conrad01 158> git describe dev/gfdl/2018.03.06

715 warnings like:

WARNING from PE 1127: Bad ice state sum_part_sz Start of set_ice_surface_state ; at -136.9 58.7 or i,j,k = 5 31 0; nbad = 298350 on pe 1127 ; sum_part_sz = 5.9815E+37

143 fatals like:

FATAL from PE 28: Input to adjust_ice_categories, non-zero pond mass rests atop no ice.

81 tracebacks:

MOM6 00000000014FFE81 mpp_mod_mp_mpp_er 50 mpp_util_mpi.inc
MOM6 0000000000FB086F sis_transportmp 512 SIS_transport.F90
MOM6 00000000005AB748 sis_slow_thermo_m 476 SIS_slow_thermo.F90
MOM6 0000000000C3ADFF ice_model_modmp 288 ice_model.F90
MOM6 0000000000C3D57F ice_model_modmp 198 ice_model.F90
MOM6 00000000005C5E4F MAIN__ 978 coupler_main.F90
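
A quick arithmetic check on the warning above: the reported sum_part_sz = 5.9815E+37 is exactly what results from adding up six entries that each hold 9.96921e+36, the masked-pe value mentioned earlier in this thread. The count of six part_size entries per cell is an assumption made for this check, not something stated in the thread.

```python
FILL = 9.96921e36  # value reported on masked pes for part_size
N_PARTS = 6        # assumed number of part_size entries summed per cell

total = FILL * N_PARTS
formatted = f"{total:.4E}"
print(formatted)  # 5.9815E+37
```

If that assumption holds, the warning is consistent with the model summing unreplaced fill values that were read in from the combined restart.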

Alan.
