ufs-community / ufs-weather-model

UFS Weather Model

Enable controlling MOM6 ocean restart files with a more flexible approach #1976

Closed BinLiu-NOAA closed 10 months ago

BinLiu-NOAA commented 1 year ago

Description

Currently, when running coupled ufs-weather-model configurations with an ocean component (e.g. MOM6), users can only control the restart file output interval through settings such as restart_n = 24 and restart_option = nhours, which writes a restart every 24 hours. In addition, MOM6 also writes out a restart file at the end of the forecast.

However, for ocean data assimilation (DA), one needs to output the MOM6 restart files at forecast hour 6 (or hours 3, 6, 9) to provide the first guess for DA.

Meanwhile, writing out restart files every 6 hours throughout the forecast via restart_n=6 is too expensive in IO bandwidth, time, and disk space. It would therefore be beneficial to allow more flexible control of the MOM6 ocean restart file output frequency.
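
To make the cost argument concrete, here is a rough count of restart writes under the two approaches. This is only a sketch: the 120-hour forecast length is assumed for illustration, and `interval_hours` is a hypothetical helper, not part of ufs-weather-model or MOM6.

```python
def interval_hours(restart_n, forecast_len):
    """Forecast hours at which an every-restart_n-hours schedule writes a restart."""
    return set(range(restart_n, forecast_len + 1, restart_n))

forecast_len = 120  # hours; assumed for illustration only

# Option A: write a restart every 6 hours for the whole forecast.
every_6h = interval_hours(6, forecast_len)

# Option B: write every 24 hours, plus one extra restart at hour 6 for DA.
flexible = interval_hours(24, forecast_len) | {6}

print(len(every_6h), "writes with restart_n=6")
print(len(flexible), "writes with 24-hourly restarts plus hour 6")
```

Under these assumptions the flexible schedule writes well under half as many restart files while still providing the hour-6 first guess.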

Solution

Extend the current method to allow more flexible control of the MOM6 ocean restart file output frequency.

Alternatives

N/A.

Related to

DeniseWorthen commented 1 year ago

@BinLiu-NOAA I had some ideas about this after we talked. You wrote "restart files at forecast hour 6 (or hours 3, 6, 9)"; could you clarify this? Basically, you need the restart files 3, 6, and 9 hours after initialization, right? So if you start at hour 18, you need hours 21, 24, and 03 (next day). Is that right?

After that, every 6 hours is needed.

BinLiu-NOAA commented 1 year ago

@DeniseWorthen, ideally we would like to output the MOM6 ocean restart files at forecast hour 6 (for ocean DA at this point; forecast hours 3, 6, 9 would be better and will be required in the future), and then at forecast hours 24, 48, 72, 96, 120 (for warm-starting the forecast). Thanks!

DeniseWorthen commented 1 year ago

@BinLiu-NOAA I've been able to start prototyping this in CMEPS (alarm initialization is similar in all the non-fv3 component caps). For the case you've listed as 24, 48, 72, 96, 120, would you also need the capability to have interval restarts after hour=120?

BinLiu-NOAA commented 1 year ago

Thanks, @DeniseWorthen! Apart from forecast hour 6 (or hours 3, 6, 9, or even 3, 4, 5, 6, 7, 8, 9, which are mainly for data assimilation), the listed restart output hours 24, 48, 72, 96, 120 are just examples. They basically mean every 24 hours (or 48 hours, or whatever frequency the user/application wants). I believe MOM6 currently also writes out the restart files automatically at the end of the forecast.

DeniseWorthen commented 10 months ago

@BinLiu-NOAA Please try my feature branch in MOM6 (https://github.com/DeniseWorthen/MOM6/tree/feature/restartfh).

To use, add a config variable to your OCN attributes listing the forecast hours you want additional restarts at. For example, the below setting of 3,9,15 will write restarts after 3,9 and 15 hours in addition to the restarts written at set intervals.

# OCN #
OCN_model:                      mom6
  (snip)
  restart_fh = 3,9,15
::

For the RT case starting on 2021-3-22-06 the above setting will write non-interval restarts at

20231221 054819.106 INFO             PET150 MOM_cap:(ModelSetRunClock) Restart_Fh at 2021  3 22  9  0  0   0
20231221 054819.106 INFO             PET150 MOM_cap:(ModelSetRunClock) Restart_Fh at 2021  3 22 15  0  0   0
20231221 054819.106 INFO             PET150 MOM_cap:(ModelSetRunClock) Restart_Fh at 2021  3 22 21  0  0   0

These restarts are in addition to those written out on the interval defined with restart_n and restart_option.

To test, I confirmed that the non-interval restarts are identical to those written out by setting restart_n=3.
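
As a cross-check of the log lines above, the valid times implied by a restart_fh setting can be worked out by adding each forecast hour to the start time. The following is a minimal Python sketch of that arithmetic only; `restart_fh_times` is a hypothetical helper and not how the MOM6 cap computes its alarms.

```python
from datetime import datetime, timedelta

def restart_fh_times(start, restart_fh):
    """Return the valid time for each forecast hour in a restart_fh string."""
    hours = [int(h) for h in restart_fh.split(",")]
    return [start + timedelta(hours=h) for h in hours]

# RT case from this comment: start 2021-03-22 06Z, restart_fh = 3,9,15
start = datetime(2021, 3, 22, 6)
for valid in restart_fh_times(start, "3,9,15"):
    print(valid.strftime("%Y %m %d %H"))
```

For this case the sketch yields hours 09, 15 and 21 on 2021-03-22, matching the three Restart_Fh log entries.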

BinLiu-NOAA commented 10 months ago

Thanks @DeniseWorthen! @JohnSteffen-NOAA, @binli2337, and @YongzuoLi-NOAA, let's test this new function for HAFS MOM6 coupling and MOM6-3DVAR to confirm it works properly. Thanks!

BinLiu-NOAA commented 10 months ago

Also, @jiandewang for your information about this ongoing work. Thanks!

jiandewang commented 10 months ago

Yes, I saw that. Many thanks to @DeniseWorthen for working on this highly demanded feature. It will also be useful for DA when using S2SW. Let me also give it a try with S2SW.

YongzuoLi-NOAA commented 10 months ago

Thank you, @DeniseWorthen.

We may update /work/noaa/hwrf/save/bliu/hafsv1p1a_2023rt/sorc/hafs_forecast.fd/MOM6-interface/MOM6/config_src/drivers/nuopc_cap/mom_cap.F90 based on https://github.com/DeniseWorthen/MOM6/blob/feature/restartfh/config_src/drivers/nuopc_cap/mom_cap.F90

Yongzuo

jiandewang commented 10 months ago

I tried it (with all sorts of different restart file output hours) using one of the S2S cases as a template and can confirm that it worked perfectly.

DeniseWorthen commented 10 months ago

I'm going to note here (just so I can reference the information at some point) that the only real difficulty in implementing this was understanding that the config variables could only be accessed as character strings. Google then provided me w/ the method to convert the comma-delimited character string to an integer array.

The restriction to character strings is, I believe, the result of using NUOPC_CompAttributeIngest in UFSDriver.F90 to set the component attributes. The NUOPC ref notes that

Important: Attributes ingested by this method are stored as type character strings, and must be accessed accordingly. Conversion from string into a different data type, e.g. integer or real, is the user's responsibility.
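
The conversion itself is simple in principle. The actual change lives in the Fortran cap (mom_cap.F90); the Python below is only an illustrative analogue of turning a comma-delimited attribute string into an integer list, with `parse_restart_fh` being a hypothetical name and the validation rules being assumptions.

```python
def parse_restart_fh(value):
    """Convert an attribute string such as "3, 9,15" into a list of integers."""
    if value is None or not value.strip():
        return []  # attribute absent or empty: no extra restarts requested
    hours = []
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit():
            raise ValueError(f"restart_fh entry is not a non-negative integer: {token!r}")
        hours.append(int(token))
    return hours

print(parse_restart_fh("3,9,15"))
```

Rejecting malformed entries up front is a design choice of this sketch; the thread does not say how the cap treats them.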

DeniseWorthen commented 10 months ago

@BinLiu-NOAA Will you be expecting to have this feature committed to the MOM6 repo for your implementation?

BinLiu-NOAA commented 10 months ago

@DeniseWorthen, this feature (flexible MOM6 restart output hours) is not strictly needed for the HAFSv2 upgrade (code freeze at the end of January 2024). However, we plan to use it in the 2024 HAFS real-time parallel experiments (needed in the April-June time frame).

With that, if this feature is ready and the change can be committed back to the MOM6 branch used by ufs-weather-model, then it makes sense to me to create PRs and bring this capability into the MOM6 cap and ufs-weather-model (the earlier the better, of course).

@JohnSteffen-NOAA and @YongzuoLi-NOAA, have you had a chance to test this capability?

After that, @jiandewang and @DeniseWorthen, feel free to go ahead to plan the PR and commit process. Thanks!

DeniseWorthen commented 10 months ago

@BinLiu-NOAA Thanks for the info. We need to coordinate w/ @jiandewang since I think he has at least one big MOM6 PR that we're waiting to be able to update with before we can push back any changes to GFDL. Jiande, would adding this to the existing changes (the cesm-style names) before pushing back make sense?

jiandewang commented 10 months ago

@DeniseWorthen GFDL has not yet been able to figure out the cause of the failure to retain b4b results on wcoss2. Let's see if they have any update in today's MOM6 meeting. For your flexible restart writing code, it makes sense to add it to the current dev/emc, and I will push back to main at some stage (hard to tell when, because of the big PR issue on wcoss2). Note I just created a mini PR (https://github.com/NOAA-EMC/MOM6/pull/124) which needs to go into dev/emc.

jiandewang commented 10 months ago

@DeniseWorthen let me ask the NCAR side to try your branch and see if they have any comments. (In the final bi-weekly meeting of 2023, each group shared its ideas and comments on how to make MOM6 PRs go smoothly in 2024. It was mentioned that NCAR and EMC will pre-test before opening their PRs.)

DeniseWorthen commented 10 months ago

@jiandewang I think if we're going to go ahead and push this, I'd like to make a modification before you send it off to ncar. Could you do a quick test after I make the mod to verify it still compiles and runs?

jiandewang commented 10 months ago

Sure, I will do that (HERA is down today but I can try on another machine).

DeniseWorthen commented 10 months ago

@jiandewang Thanks. I pushed the change, which just ensures that if the ESMF alarm call returns an error, it will be caught correctly. I've checked that it compiles, and it should also work.

jiandewang commented 10 months ago

@DeniseWorthen HERA was totally full yesterday so I ran on GAEA. It works fine. Let me ask NCAR for a test.

JohnSteffen-NOAA commented 10 months ago

@BinLiu-NOAA @YongzuoLi-NOAA @DeniseWorthen

I was able to test the user-specified MOM6 restart capability within the HAFS framework and it works as expected.

The test used the HAFSv2 baseline branch and substituted the feature/restartfh branch of Denise's MOM6 fork before building and running HAFS.

The ufs.configure.mom6.tmp file in the /parm/forecast/regional/ directory was modified to include the ocean attribute "restart_fh = 3,6,9,24".

The cronjob_hafsv2a_baseline.sh was modified to run 27-hour forecasts of the 13L Laura test case for two cycles, 2020082506 and 2020082512.

Output on Orion can be found here: /work/noaa/hwrf/scrub/jsteff/hafsv2a_baseline_restartfh/2020082512/13L/forecast/RESTART

20200825.150000.MOM.res.nc
20200825.180000.MOM.res.nc
20200825.210000.MOM.res.nc
20200826.120000.MOM.res.nc
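
The listing is consistent with the restart_fh setting for the 2020082512 cycle. A small Python sketch of that check follows; the YYYYMMDD.HHMMSS.MOM.res.nc naming pattern is inferred from the listing above, and `expected_restart_files` is a hypothetical helper that covers only the restart_fh-driven files, not interval or end-of-run restarts.

```python
from datetime import datetime, timedelta

def expected_restart_files(cycle, restart_fh):
    """List the restart file names implied by a cycle (YYYYMMDDHH) and restart_fh."""
    start = datetime.strptime(cycle, "%Y%m%d%H")
    names = []
    for hour in sorted(int(h) for h in restart_fh.split(",")):
        valid = start + timedelta(hours=hour)
        names.append(valid.strftime("%Y%m%d.%H%M%S") + ".MOM.res.nc")
    return names

for name in expected_restart_files("2020082512", "3,6,9,24"):
    print(name)
```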

YongzuoLi-NOAA commented 10 months ago

Thank you, John. I will test it.

Yongzuo

BinLiu-NOAA commented 10 months ago

Thanks, @JohnSteffen-NOAA! Great to know it works properly within the HAFS application/workflow as well.

With that, @DeniseWorthen and @jiandewang, please feel free to plan/coordinate merging this development back so that the ufs-weather-model develop branch can use this feature. Much appreciated!

DeniseWorthen commented 10 months ago

A question came up in conversation w/ NCAR about MOM6 history files. How (if at all) are you using MOM6 history files when this feature is active? That is, are the history files correctly averaged when restarting at an arbitrary hour?

jiandewang commented 10 months ago

Copying NCAR's comments here for the record:

Our tests are passing. I also confirmed that the restart files don't get recorded if the intervals are not aligned with the coupling timestep. So perhaps a warning/error may be added to avoid user confusion.

jiandewang commented 10 months ago

@DeniseWorthen will you be able to add the warning message Alper mentioned here? Would something like this work: if ( mod(restart_fh, coupling_timestep) /= 0 ) ...

DeniseWorthen commented 10 months ago

Yes, I can obtain the coupling_timestep from the clock.
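
The alignment test being discussed can be illustrated outside the cap. The Python sketch below mirrors the proposed mod() check under the assumption that restart_fh is given in hours and the coupling timestep in seconds; `unaligned_restart_hours` and `coupling_timestep_s` are hypothetical names, and in the cap the timestep would come from the clock as noted above.

```python
def unaligned_restart_hours(restart_fh, coupling_timestep_s):
    """Return the restart_fh hours that do not fall on a coupling step boundary."""
    return [h for h in restart_fh if (h * 3600) % coupling_timestep_s != 0]

# Hourly coupling: every whole forecast hour is aligned.
print(unaligned_restart_hours([3, 6, 9], 3600))

# A 90-minute coupling step (illustrative value): hours 1 and 4 are not aligned.
for hour in unaligned_restart_hours([1, 3, 4], 5400):
    print(f"WARNING: restart_fh = {hour} is not a multiple of the coupling timestep")
```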

jiandewang commented 10 months ago

Please use a force push to avoid extra git commit history (that's what the MOM group prefers).

BinLiu-NOAA commented 10 months ago

@DeniseWorthen and @jiandewang, for HAFS FV3ATM-MOM6 coupling, our coupling time step is currently 6 mins, and we always make sure the history and restart output times are divisible by 6 mins. Meanwhile, for HAFS MOM6 history output, we also choose instantaneous fields instead of time-averaged fields. Hope this information is useful. Thanks!

DeniseWorthen commented 10 months ago

@jiandewang I need to re-think my response to Alper's comment about cases where this would not correctly add the extra restarts. We actually control restarts via restart_n and restart_option. You can already set those to values which are not multiples of the coupling frequency. For example, you could write a MOM6 restart every 90mins (restart_n=90,restart_option=nminutes) when you couple on the hour. So really you could argue that we need a warning message in that case too.

In this case, I've hard-coded the restart_fh to be in hours, so maybe a message makes sense. But you can also imagine a case (with either restart_n or restart_fh) where you set them such that MOM6 hasn't completed its baroclinic/barotropic timesteps. We've always ensured everything aligns (including span_coupling false), but it is really up to the user to know how to set up their case.
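
The 90-minute example can be checked with the same arithmetic applied to restart_n and restart_option. This is a hedged Python sketch, not code from the cap or the mediator: `restart_interval_is_aligned` is a hypothetical helper, and only the two restart_option values named in this thread are handled.

```python
# Only the restart_option values that appear in this thread are covered.
SECONDS_PER_UNIT = {"nhours": 3600, "nminutes": 60}

def restart_interval_is_aligned(restart_n, restart_option, coupling_timestep_s):
    """True if the restart interval is a whole number of coupling timesteps."""
    interval_s = restart_n * SECONDS_PER_UNIT[restart_option]
    return interval_s % coupling_timestep_s == 0

# 90-minute restarts with hourly coupling: not aligned.
print(restart_interval_is_aligned(90, "nminutes", 3600))

# 24-hour restarts with hourly coupling: aligned.
print(restart_interval_is_aligned(24, "nhours", 3600))
```

Such a check says nothing about whether MOM6 has completed its internal baroclinic/barotropic timesteps, which remains the user's responsibility as described above.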

DeniseWorthen commented 10 months ago

@jiandewang I've updated the message two ways; I've written it to the stdout log instead of the PET log (since PET logs are most likely off) and added a note for when the extra ones won't be written. I think this is good to go now.

jiandewang commented 10 months ago

@DeniseWorthen let me make a test run and get back to you

jiandewang commented 10 months ago

@DeniseWorthen works as expected. I now see the following in the "out" file if I set restart_fh = 8,9,10,11,12,20:

150: (MOM_cap:ModelAdvance) writing restart file 20210322.140000.MOM.res
150: (MOM_cap:ModelAdvance) writing restart file 20210322.150000.MOM.res
150: (MOM_cap:ModelAdvance) writing restart file 20210322.160000.MOM.res
150: (MOM_cap:ModelAdvance) writing restart file 20210322.170000.MOM.res
150: (MOM_cap:ModelAdvance) writing restart file 20210322.180000.MOM.res
150: (MOM_cap:ModelAdvance) writing restart file 20210323.020000.MOM.res
150: (MOM_cap:ModelAdvance) writing restart file 20210323.060000.MOM.res

YongzuoLi-NOAA commented 10 months ago

I have tested "restart_fh = 3,6,9,24" in ufs.configure.mom6.tmp.

(1) Restart files were created as expected.
(2) The 5 small restart files (MOM6 writes netcdf files of up to 2GB) were combined by ncks to create one 20GB MOM.res.nc.
(3) The 20GB MOM.res.nc works well as HAFS-JEDI MOM6-3DVAR input.
(4) The 20GB MOM.res.nc may be used as HAFS MOM6 warm start input.

Thanks

Yongzuo

YongzuoLi-NOAA commented 10 months ago

I will perform HAFS-JEDI MOM6-3DVAR cycles with restart_fh.

jiandewang commented 10 months ago

@YongzuoLi-NOAA why do you need to combine the small files into one big file, given that MOM6 can read in the small files? Also, 5x2=10G. I assume the combination will take time as the file size is big.

YongzuoLi-NOAA commented 10 months ago

@jiandewang Thanks for letting me know that MOM6 can read multiple MOM.res files as input. HAFS-JEDI MOM6-3DVAR has read a single MOM.res.nc so far, and I am not sure how to write the yaml for multiple MOM.res files. Here is the JEDI 3DVAR yaml:

background:
  read_from_file: 1
  basename: ./restarts/
  ocn_filename: MOM.res.nc
  date: DATE
  state variables: [hocn, socn, tocn, ssh, mld, layer_depth]

jiandewang commented 10 months ago

@YongzuoLi-NOAA I see and I guess you have to combine them for your case

YongzuoLi-NOAA commented 10 months ago

@jiandewang Thank you for the discussion.