payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
21 stars 27 forks source link

Replace CICE start date calculations #484

Closed blimlim closed 3 months ago

blimlim commented 3 months ago

This PR covers the first part of #466. It tries to simplify the setting of CICE's start date in ACCESS-ESM1.5 simulations, in order to avoid inconsistency bugs and to make the restart timing more transparent to the user.

The previous calculations for setting the cice start date used:

control_dir/ice/input_ice.nml
--------------------------------
&coupling
init_date=00010101 (control_init_date)
...

and

restart_dir/ice/input_ice.nml
--------------------------------
&coupling
runtime0=3155673600 (res_runtime0)
runtime=0 (res_runtime)
/

in order to set the start date of a new simulation, payu would calculate a start date as inidate = control_init_date + res_runtime0 + res_runtime (with required unit conversions).

It would then populate the work directory copy of input_ice.nml with values for the model to use for the new simulation:

work_dir/ice/input_ice.nml
--------------------------------
&coupling
inidate = control_init_date + res_runtime0 + res_runtime
init_date = control_init_date
runtime0= res_runtime0 + res_runtime
runtime= run duration calculated from config.yaml
/

This PR replaces restart_dir/ice/input_ice.nml with a new file restart_dir/ice/restart_date.nml containing:

restart_dir/ice/restart_date.nml
---------------------------------
&coupling
init_date=10101 (res_init_date)
inidate=1010101 (res_inidate)
/

During setup, payu then fills the work directory copy of input_ice.nml with:

work_dir/ice/input_ice.nml
--------------------------------
&coupling
inidate = res_inidate
init_date = res_init_date
runtime0 = res_inidate - res_init_date (in seconds)
runtime = run duration calculated from config.yaml
/

My goal here was to ensure that cice receives the exact same values of init_date, inidate, runtime, runtime0 with the new calculations that it would have received with the old calculations, which is why the updates use the init_date and calculate the number of seconds between it and the inidate. It would be simpler to just set both init_date and inidate to the new start date, and set runtime0=0, however this would give different information to cice.

For some reason I thought it would be good to read the init_date from the new restart file (I forget why...), though this could be moved back to reading it from the control directory if everyone thinks it is cleaner.

Backwards compatibility will be broken with this PR, and if merged, we will have to add restart_date.nml files to the vk83 restart directories.

Future work would making a separate function for reading the start date, which could then be used for consistency checks between the different component's start dates, though I'll defer that to a future PR/issue.


For the review, any suggestions on structure, clarity, and also correctness of the calculations would be very welcome!

pep8speaks commented 3 months ago

Hello @blimlim! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 272:66: W291 trailing whitespace

Comment last updated at 2024-08-21 06:30:16 UTC
coveralls commented 3 months ago

Coverage Status

coverage: 56.425% (+3.9%) from 52.565% when pulling c4a1430f0b7604bfc8d8257cf040011e9779b744 on ACCESS-NRI:466-esm1p5-cice-startdate-fix into 39b9ba457f4ccee808977c1c28e419a6febac5ae on payu-org:master.

blimlim commented 3 months ago

Oh my bad. I'll fix these PEP issues and ping everyone when done

blimlim commented 3 months ago

input_ice has lots of fields unrelated to restarts in it, so i'm not sure why we are adding another file? Whats the purpose in keeping the timing information on the completed run seperately?

Previously payu had been copying the whole input_ice.nml into the restart directory, but only the runtime0 and runtime fields from this copy were required. Meanwhile the inidate and init_date were being ignored, along with all the other timing related fields. The main idea behind the separate restart_date.nml file is to try and simplify this, so that the new file holds all the information needed for the run timing, and none of the unnecessary extra fields. Let me know if this approach sounds ok!

Related to this – with the changes, the input_ice.nml was not meant to be copied to the restart directory anymore as it isn't needed. I'd forgotten to implement this change, but have added it in to the latest commit.

to runtime0 = res_inidate - res_init_date (in seconds) so that the user can change res_inidate if they desire (this is the date the current run starts at)?.

Unfortunately the restart date is also held by the binary iced... restart files, and it needs to be consistent with the inidate. This means that changing res_inidate won't be enough by itself to change to a completely new restart date. It will make it easier though to create a simulation from a CSIRO restart which currently requires the runtime0+runtime calculation to be performed if done manually. Another reason behind the proposed change is to try and make it clearer which date cice will be starting at (e.g. so people don't need to calculate `control_init_date + res_runtime0 + res_runtime').

This is something to do with the warm-start script That's a good point. If these changes go through, the warm start scripts would need to be updated down the line for them to work with the changed calculations.

I'm not really sure why this isn't the only change needed actually ?

I think it would work for the only change to be for the init_date to be read from the restart directory instead of the control directory. Though it might be a bit less transparent which date the model will restart at. I.e. this would require runtime_0 + runtime + init_date calculation. Happy to hear from everyone if either approach is preferable!

blimlim commented 3 months ago

I'd really really really recommend adding some model driver tests, even just a minimal set that covers the expected behaviour of this code.

Good idea. I've had a go at adding some tests, first of the individual calendar functions used in the calculations, and then a bigger test of the actual driver: https://github.com/ACCESS-NRI/payu/blob/8f18f245787cafa940a9c397b5173479a40b6273/test/models/test_access.py#L178-L242

The driver test cycles over the access.setup() and access.archive() steps (currently 1000 times – perhaps excessive) with a year long runtime, and checks that the final end date matches what's expected. It's admittedly a bit slow (~22s for the 1000 cycles), but would be keen to hear if this sounds like a reasonable type of test to ensure that the calculations are working correctly across the different tricky years that can come up?

anton-seaice commented 3 months ago

I ran this and it works with the dates look correct for the first couple of months of picontrol. I didn't test trying to change the start date or what happens with a leap year.

anton-seaice commented 3 months ago

The tests run and pass as-is, so if you want to defer the suggested changes to the tests to after release, that would be ok

blimlim commented 3 months ago

Thanks for the reviews! Lots of good ideas which help make the changes clearer and also improve the tests. I've added in the different suggestions, and also added an additional test of the access.py driver.

https://github.com/ACCESS-NRI/payu/blob/c19133c83109e768236486b0879d09e6f05c3be2/test/models/test_access.py#L261-L269

The previous test which cycled over the setup and archive stages would not catch cases where calculations in both steps were simultaneously wrong (e.g. if they somehow both used the wrong calendar). The new test focuses on the setup date, checking that the correct runtime is written to the namelists when starting at a range of leap/non-leap years. I think there's some redundancy with this and the more focused tests of calendar.py but think it's still helpful to have to make sure that everything comes together correctly in the driver.

anton-seaice commented 3 months ago

This looks good @blimlim - I just had some tidyups on the testing in the suggestions :)

blimlim commented 3 months ago

Awesome thanks for those suggestions – very helpful for making the code clearer and improving the tests! I've added those in and let me know if there are any other suggestions.

blimlim commented 3 months ago

@aidanheerdegen are you happy for this one to be merged, or did you have any other suggestions?