ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

Set up CIME for all four supported resolutions #40

Closed rsdunlapiv closed 4 years ago

rsdunlapiv commented 4 years ago
uturuncoglu commented 4 years ago

I think config_machine.xml also needs to be cleaned up. We should keep only the platforms supported for UFS.

uturuncoglu commented 4 years ago

@arunchawla-NOAA @ligiabernardet @GeorgeGayno-NOAA I am trying to run and test the model at different resolutions, and I need the following information:

  • The number of processors used for each case. I know C96 uses 150 by default, but what about the others?
  • Namelist changes (input.nml, model_configure, and pre- and post-processing, if any) for each resolution.

With this information, I could use chgres to create input for the different resolutions.

ligiabernardet commented 4 years ago

I do not have this information. I hope others can chime in.

uturuncoglu commented 4 years ago

@ligiabernardet Thanks. I found some information for C768 in the following file:

https://github.com/NOAA-EMC/fv3gfs/blob/master/scripts/exglobal_fcst_nemsfv3gfs.sh

I am not sure whether those settings are valid for the current version of FV3. Anyway, I'll try those options, but it would be nice to have information about the namelist changes (number of IO tasks, dt, other physics options, etc.) to test the model with different configurations.

uturuncoglu commented 4 years ago

@arunchawla-NOAA @climbfuji @DusanJovic-NOAA @junwang-noaa The model failed with the options that I found in exglobal_fcst_nemsfv3gfs.sh. It would be great to have a list of namelist options for the different resolutions and for the CCPP v15p2 and v16beta combinations. Currently, I cannot test the model at different resolutions.

arunchawla-NOAA commented 4 years ago

@KateFriedman-NOAA can you provide the namelist options that we use (input.nml, model_configure) for the different grid resolutions in the global workflow? @GeorgeGayno-NOAA and @WenMeng-NOAA, are there namelist options for chgres and UPP that change with resolution? If so, can you provide examples to @rsdunlapiv, @uturuncoglu, and @jedwards4b so that they can set it up for CIME?

ligiabernardet commented 4 years ago

We also need to know how the stochastic options vary with resolution. Thanks.


uturuncoglu commented 4 years ago

@ligiabernardet Do we need to keep the stochastic seed options constant when we restart the model? How are these handled by the model? What are the options to enable or disable stochastic physics?

ligiabernardet commented 4 years ago

Please direct stochastic questions to @pjpegion, but keep me in the loop so I can write the documentation. I am not familiar with the details of restarting with stochastic physics. Phil's draft documentation is at https://stochastic-physics.readthedocs.io/en/ufs_public_release/

My understanding is that it can be disabled with:

do_sppt = .F.
do_shum = .F.
do_skeb = .F.
do_sfcperts = .F.

GeorgeGayno-NOAA commented 4 years ago

> @KateFriedman-NOAA can you provide the namelist options that we use (input.nml, model_configure) for the different grid resolutions in the global workflow @GeorgeGayno-NOAA and @WenMeng-NOAA are there namelist options for chgres and UPP that change with resolution? If yes then can you provide examples to @rsdunlapiv @uturuncoglu and @jedwards4b so that they can set it up for CIME

For chgres, the target or FV3 grid is set by these namelist options:

I believe @KateFriedman-NOAA has provided all required files.

pjpegion commented 4 years ago

@uturuncoglu there are no specific changes needed for stochastic physics when changing resolutions. If you want a bitwise-reproducible restart of a forecast that includes stochastic physics, you need to set FHSTOCH to the time (in forecast hours) at which you want the stochastic physics restart written out. (There is an update in master that allows the stochastic physics restart to be written out each time the atmospheric model's restart is written out.) This will generate a file stoch_out.F<HHH> (where HHH is the 3-digit forecast hour). When restarting, this file needs to be renamed stoch_ini, and the namelist option stochini must be set to .true.
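
The staging step of this restart procedure could be scripted roughly as follows. This is only a minimal sketch: the helper name `stage_stoch_restart` and its `run_dir` argument are hypothetical, and the file-name pattern (`stoch_out.F` followed by the 3-digit forecast hour) is assumed from the description above. The sketch copies the file instead of renaming it so the original output is kept, and setting `stochini` to `.true.` in the namelist is still left to the user.

```python
import shutil
from pathlib import Path


def stage_stoch_restart(run_dir, forecast_hour):
    """Copy stoch_out.F<HHH> to stoch_ini inside run_dir.

    Hypothetical helper: the file-name pattern is an assumption based on
    the description in this thread, not a documented interface.
    """
    run_dir = Path(run_dir)
    # Forecast hour is zero-padded to three digits, e.g. 6 -> "006".
    source = run_dir / f"stoch_out.F{forecast_hour:03d}"
    if not source.is_file():
        raise FileNotFoundError(f"no stochastic restart file at {source}")
    target = run_dir / "stoch_ini"
    # Copy rather than rename so the original output stays available.
    shutil.copyfile(source, target)
    return target
```

For example, `stage_stoch_restart("/path/to/run", 6)` would look for `stoch_out.F006` in the (hypothetical) run directory and create `stoch_ini` next to it.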

rsdunlapiv commented 4 years ago

@KateFriedman-NOAA we still need stable input.nml and model_configure files for each of the supported resolutions and physics combinations.

KateFriedman-NOAA commented 4 years ago

> @KateFriedman-NOAA we still need stable input.nml and model_configure files for each of the supported resolutions and physics combinations.

Questions to help me prep the namelist files:

  1. Current operational namelist settings? Or current dev GFSv16 settings?
  2. Where to post them? Here?

I will start assembling the ops namelists and adjust if v16 is needed.

ligiabernardet commented 4 years ago

Fanglin provided DTC with namelists for GFSv15p2 and GFSv16beta for the C768L64 configuration (those are the supported suites for this release). This is what we have been documenting and testing so far, and what we handed to the CIME folks. The main question is whether and how they should be changed with resolution.

https://docs.google.com/document/d/1K-n25HickouGz1wya6b4XeYUzJzQV6EMpEfiPF8Er5w/edit
https://docs.google.com/document/d/1qUT2IWmKMa64FRQKV6ut0meAG54HaaTo0nyi1hvMYzg/edit


KateFriedman-NOAA commented 4 years ago

Gotcha, I can provide the changes for resolution based on those provided namelists from Fanglin, thanks! I can note them in those two docs if you like (would need edit permissions). I'll collect them separately for now.

KateFriedman-NOAA commented 4 years ago

I made copies of those two docs (links below) and added values for the variables that change with resolution (as seen in the FV3GFS configs and scripts). Any variable that has a different value based on resolution is labeled like this:

A [B] [C] [D]

...where A is the C768 value (in black), B is the C384 value (in pink), C is the C192 value (in purple), and D is the C96 value (in orange). Also, if a value is easily calculable, I include that calculation in grey.
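
To make the notation concrete, here is a minimal sketch of how an entry written this way could be split into one value per resolution. The function name and the assumption that each value is a single token without spaces or brackets are illustrative only; the shared documents themselves are free-form text.

```python
import re

# Order follows the notation: A is C768, then [C384] [C192] [C96].
RESOLUTIONS = ("C768", "C384", "C192", "C96")


def split_by_resolution(entry):
    """Map 'A [B] [C] [D]' to a dict keyed by resolution.

    An entry without brackets is assumed to apply to every resolution.
    """
    match = re.fullmatch(r"\s*([^\s\[\]]+)((?:\s*\[[^\]]+\])*)\s*", entry)
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    first = match.group(1)
    bracketed = re.findall(r"\[([^\]]+)\]", match.group(2))
    if not bracketed:
        return {res: first for res in RESOLUTIONS}
    if len(bracketed) != len(RESOLUTIONS) - 1:
        raise ValueError(f"expected 3 bracketed values, got {len(bracketed)}")
    values = [first] + [value.strip() for value in bracketed]
    return dict(zip(RESOLUTIONS, values))
```

Calling `split_by_resolution("A [B] [C] [D]")` gives `{"C768": "A", "C384": "B", "C192": "C", "C96": "D"}`.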

GFS v15.2 - https://docs.google.com/document/d/1EKc2mAld5VsrNjTRgqUcTVG1ZcEIkllA-NrAKUs4DWI/edit?usp=sharing
GFS v16 - https://docs.google.com/document/d/1bLbVdWgEIknDQZgTuOZ6IPVEGv5jUgOrCm4GrR96oBU/edit?usp=sharing

Let me know if additional info is needed.

uturuncoglu commented 4 years ago

I can access the original documents shared by @ligiabernardet but not the ones from @KateFriedman-NOAA.

KateFriedman-NOAA commented 4 years ago

@uturuncoglu Do these work?

https://docs.google.com/document/d/1EKc2mAld5VsrNjTRgqUcTVG1ZcEIkllA-NrAKUs4DWI/edit?usp=sharing
https://docs.google.com/document/d/1bLbVdWgEIknDQZgTuOZ6IPVEGv5jUgOrCm4GrR96oBU/edit?usp=sharing

uturuncoglu commented 4 years ago

@KateFriedman-NOAA Thanks, I can access them now.

KateFriedman-NOAA commented 4 years ago

Sweet, I have updated the earlier links.

uturuncoglu commented 4 years ago

BTW, why does the layout change between CCPP versions for the same resolution?

uturuncoglu commented 4 years ago

> (There is an update in master that allows for the stochastic physics restart to be written out each time the atmospheric model's restart is written out).

@pjpegion Is this available in the current version of the FV3 ufs_release branch?

pjpegion commented 4 years ago

Not yet. I can merge it in tomorrow.


uturuncoglu commented 4 years ago

@KateFriedman-NOAA I got an email about the global-workflow updates for GFSv15.2.7. If you don't mind, could you also put the input files for 2020 (i.e. global_co2historicaldata_2020.txt) on the FTP?

KateFriedman-NOAA commented 4 years ago

@uturuncoglu sure...just the 2020 file? These files were updated in the year-end CO2 file updates:

.../fix_am/co2dat_4a/global_co2historicaldata_2018.txt
.../fix_am/co2dat_4a/global_co2historicaldata_2019.txt_proj_u
.../fix_am/co2dat_4a/global_co2historicaldata_2020.txt_proj
.../fix_am/fix_co2_proj/global_co2historicaldata_2020.txt
.../fix_am/fix_co2_update/global_co2historicaldata_2019.txt

I see that Fanglin put the 2014 to 2019 files directly under the fix_am folder...are you expecting the 2020 file there too? There is already the projected file under fix_am.../fix_co2_proj:

fix_am.v20191213/fix_co2_proj/global_co2historicaldata_2020.txt

uturuncoglu commented 4 years ago

Yes. I think I have the others. I am getting the global_co2historicaldata_*.txt files from global/fix/fix_am.v20191213/fix_co2_proj, and the rest are from global/fix/fix_am.v20191213/. If any file has been updated, please put it on the FTP so that we have a consistent set of files. BTW, I am not sure about the versioning of the folders. Do you need to create another folder with a different date that has all the files? If so, I need to change the datestamp on the CIME side.

KateFriedman-NOAA commented 4 years ago

Ok, I should have reread my emails from mid-December earlier... these new CO2 files are already in the set on the FTP server and are up-to-date. We got these files in early December, and I copied them into our main FIX_DIR set right before Fanglin made that v20191213 set for the UFS release. I did some quick diffs to double-check; they are indeed already up-to-date.

rsdunlapiv commented 4 years ago

@uturuncoglu could you please post an update on whether the namelist changes for each resolution provided by @KateFriedman-NOAA are working for you in CIME?

@KateFriedman-NOAA there was a question from @uturuncoglu about whether the atmosphere layout should change between v15.2 and v16 versions of physics. Can you please confirm that this should be the case?

KateFriedman-NOAA commented 4 years ago

@uturuncoglu @rsdunlapiv I am not familiar with CCPP (haven't worked with it yet) so this is a question for Fanglin Yang and Judy Henderson (I can't tag them in here for some reason).

uturuncoglu commented 4 years ago

@rsdunlapiv I am still working on the CIME side to make restart work properly for regular runs and also for tests. I have not found time to test the other resolutions yet, but I have already modified the namelist XML file and found a suitable layout, write group, etc. configuration for Cheyenne, which has 36 cores per node.

pjpegion commented 4 years ago

@uturuncoglu I just merged the stochastic_physics master into ufs_public_release. The stochastic physics random patterns needed for restarting the model should now be written out each restart time. What you need to do at the namelist level is set FHSTOCH to the restart interval.

climbfuji commented 4 years ago

I will update the submodule pointer to stochastic_physics in my upcoming PR to the ufs_public_release branch of the ufs-weather-model.

uturuncoglu commented 4 years ago

@KateFriedman-NOAA @ligiabernardet @pjpegion I tested different resolutions on Cheyenne and made some changes to the processor counts to fit the runs on 36-core nodes. The results are as follows:

resolution  layout  write_groups  write_tasks_per_group  total PEs  result
C96         4x4     1             12                     108        working
C192        4x6     1             36                     180        working
C384        6x6     1             36                     252        working
C768        12x8    3             36                     684        fails
C768        16x16   3             36                     1644       fails

I have a problem with the C768 case: it fails in both tests, and the log file contains only:

165: calculating slp kr value
176: calculating slp kr value
166: calculating slp kr value
178: calculating slp kr value
177: calculating slp kr value
167: calculating slp kr value
MPT: shepherd terminated: r9i2n33.ib0.cheyenne.ucar.edu - job aborting

All the tests were done without threading at this point. I could try increasing the number of cores further for the C768 case, but if you have any other suggestions, just let me know.

uturuncoglu commented 4 years ago

@KateFriedman-NOAA @ligiabernardet @pjpegion Now I am trying to increase the IO pool from 3x36 to 7x36. If that fails, I'll double the number of processors used.

uturuncoglu commented 4 years ago

I could run C768 with threading support. It seems the failure was related to a memory issue. The following configuration works fine on Cheyenne:

layout = 12,8
write_groups = 3
write_tasks_per_group = 36
atmos_nthreads = 2

rsdunlapiv commented 4 years ago

All four resolutions are now running. C768 requires threading. @uturuncoglu will clean up logic in buildnml to set PE counts based only on the resolution. Error checking needs to be added to ensure that total PE count for the atmosphere is consistent with layout + write task settings in user_nl_ufsatm.
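
The error check mentioned here could look roughly like the sketch below. It assumes that the atmosphere needs 6 × layout_x × layout_y compute tasks (one set per cubed-sphere tile) plus write_groups × write_tasks_per_group write tasks, which agrees with totals reported earlier in this thread (for example 108 for C96 and 252 for C384). The function names are hypothetical and this is not the actual buildnml code.

```python
TILES = 6  # the FV3 cubed-sphere grid has six tiles


def expected_atm_pes(layout, write_groups, write_tasks_per_group):
    """Return the MPI task count implied by layout and write settings."""
    layout_x, layout_y = layout
    compute_tasks = TILES * layout_x * layout_y
    write_tasks = write_groups * write_tasks_per_group
    return compute_tasks + write_tasks


def check_pe_count(total_pes, layout, write_groups, write_tasks_per_group):
    """Raise ValueError if total_pes disagrees with the namelist settings."""
    expected = expected_atm_pes(layout, write_groups, write_tasks_per_group)
    if total_pes != expected:
        raise ValueError(
            f"atmosphere PE count {total_pes} does not match "
            f"layout {layout} plus write tasks (expected {expected})"
        )
```

A check of this kind would run once the user settings have been read, so that a mismatch is reported before the job is submitted instead of surfacing as a model abort.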

@climbfuji what resolutions are expected to work on a Mac laptop?

climbfuji commented 4 years ago

On a Mac, I would only want to run C96. I've tried running two C96 setups in parallel, and this drained the resources on my 16GB RAM machine, which makes me assume that C192 won't work. But users owning a Mac Pro (the development power station) will be able to run C192 for sure.

climbfuji commented 4 years ago

BTW, I don't understand why C768 works only with threading turned on on Cheyenne; this seems suspicious to me.

jedwards4b commented 4 years ago

I suspect that it runs out of memory; threading reduces the memory required per node. I can run additional tests to confirm.
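
One way to read this explanation is in terms of how many MPI tasks share a node. The sketch below is only illustrative arithmetic: it assumes each thread occupies its own core and that nodes are filled completely, and the function name is hypothetical. The default of 36 cores per node is the Cheyenne figure mentioned in this thread.

```python
import math


def node_usage(mpi_tasks, threads_per_task, cores_per_node=36):
    """Return (tasks_per_node, nodes) when every thread gets its own core."""
    if cores_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cores_per_node")
    tasks_per_node = cores_per_node // threads_per_task
    nodes = math.ceil(mpi_tasks / tasks_per_node)
    return tasks_per_node, nodes


# 684 tasks corresponds to a 12x8 layout (6 * 12 * 8 = 576 compute tasks)
# plus 3 write groups of 36 tasks each.
unthreaded = node_usage(684, 1)  # (36, 19)
threaded = node_usage(684, 2)    # (18, 38)
```

Under these assumptions, two threads per task halve the number of MPI tasks placed on each node (18 instead of 36), so each task has roughly twice as much node memory available.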


climbfuji commented 4 years ago

> I suspect that it runs out of memory, threading reduces the memory required per node. I can run additional tests to confirm.

Got it, this makes sense.

arunchawla-NOAA commented 4 years ago

@rsdunlapiv can this ticket be closed?

jedwards4b commented 4 years ago

Done