payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
18 stars 26 forks source link

Cannot change timestep to fraction of a second in MITgcm #228

Closed aidanheerdegen closed 4 years ago

aidanheerdegen commented 4 years ago

Report from user:

Another MITgcm issue with changing timestep. It seems that you can't change a timestep to a fraction of a second e.g. here I tried to change the timestep from 0.1 to 0.2, and I get the error

(base) [cjs157@raijin4 SOMs_hr1_200m_nc2]$ payu setup Use of these keys is deprecated: collate_walltime, collatemem. Instead use collate dictionary and subkey without 'collate' prefix laboratory path: /short/v45/cjs157/mitgcm binary path: /short/v45/cjs157/mitgcm/bin input path: /short/v45/cjs157/mitgcm/input work path: /short/v45/cjs157/mitgcm/work archive path: /short/v45/cjs157/mitgcm/archive Use of these keys is deprecated: collate_walltime, collatemem. Instead use collate dictionary and subkey without 'collate' prefix Loading input manifest: manifests/input.yaml /g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.10/lib/python3.6/site-packages/yamanifest/manifest.py:99: YAMLLoadWarning: calling yaml.load_all() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. self.header, self.data = yaml.load_all(file) Setting up mitgcm Calculated n_timesteps from starttime and endtime /g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.10/lib/python3.6/site-packages/payu/models/mitgcm.py:133: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. restart_info = yaml.load(restart_file) payu : error: Timestep changed to 0.2. New timestep not integer multiple of start time: 3240.0

Clearly the error is false through - since 3240/0.2 is an integer... is this a a rounding issue in the payu logic? (Note that payu accepts a change to a timestep of 1.0 secs).

aidanheerdegen commented 4 years ago

Confirmed this is an error in the integer division here: https://github.com/payu-org/payu/blob/master/payu/models/mitgcm.py#L141 Another example with t_start = 30.0 and dt=0.001:

(Pdb) p t_start
30.0
(Pdb) p dt
0.001
(Pdb) t_start // dt
29999.0
(Pdb) 30.0 // 0.001
29999.0
(Pdb) 30.0 / 0.001
30000.0

In the above case it erroneously claims the timestep is not a divisor of the runtime, when the timestep has not changed!

aekiss commented 4 years ago

Presumably this is because 0.001 is not exactly representable as a float (I expect 0.001 has a recurring binary mantissa that gets rounded in the float representation) so it is actually slightly bigger than 0.001, so // rounds down to 29999...?

aidanheerdegen commented 4 years ago

Sounds plausible

aekiss commented 4 years ago

Python 3.6.8 tells me

>>> 1//0.001
999.0
>>> 1/0.001
1000.0
>>> 1/1024
0.0009765625
>>> 1//0.0009765625
1024.0

ie the representation of 0.001 is slightly big, messing up the floor division //, whereas 1/1024 is exact and the division works as expected. So it will depend on whether you get unlucky with the representability of dt.

It might be best to avoid testing for (in)equality with floats for this reason... it's a can of worms, e.g. https://dl.acm.org/doi/10.1145/103162.103163

aidanheerdegen commented 4 years ago

A proper solution to this would be to use the python decimal library. This is not straightforward as the library used to parse the namelist data does not currently allow for this to be easily used, as values are returned as native python numeric types.

See https://github.com/marshallward/f90nml/issues/115

aekiss commented 4 years ago

This sort of confusion is why I was surprised to hear that MITgcm supports a fractional timestep. In MOM5 it's an integer.