ufs-community / ufs-weather-model

UFS Weather Model

Enable cmeps to use PIO+PNETCDF for IO in UFS #2347

Open DeniseWorthen opened 3 months ago

DeniseWorthen commented 3 months ago

Description

Currently, CMEPS in UFS does not make use of the PIO options. Restart (and history) files are written through serial netcdf. CMEPS has an existing capability to write using PIO+pnetcdf, with control of the various PIO options (e.g., stride, numiotasks) through configuration.

Solution

Parallel writes for CMEPS should be implemented in UFS through setting the appropriate PIO config options. Scalability testing should be done to determine correct values for the PIO settings.

Alternatives

Related to

See https://github.com/oceanmodeling/CMEPS/issues/1 for an example of this issue arising in the coastal modeling effort.

uturuncoglu commented 3 months ago

@DeniseWorthen I think you mean that the PIO options need to be added to the UFS template files, right? I just want to clarify. The capability to use different options for PIO is already implemented in CMEPS and CDEPS.

DeniseWorthen commented 3 months ago

Yes, exactly. I will clarify the issue description.

DeniseWorthen commented 3 months ago

I set up an ATM-OCN-ICE case (C384, 1/4deg) on Gaea-C5. I turned off all history and restart writing except for CMEPS. To do this for OCN and ICE, I manually overrode the write-restart logicals in the codes and set them to false prior to compiling. I removed the WGC for the ATM, used a layout of 16x24, and did not use threading for the ATM. This gave me 2304 PEs as a max for CMEPS. I made a series of 24-hour runs, with mediator restarts at 3-hour intervals, giving a total of 8 mediator restart writes. I recorded the min, max, and mean times for med_phase_restart_write in the ESMF Profile Summary log.
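As a quick check on the numbers above: the PE count follows from the 16x24 layout applied to each of the six tiles of the cubed-sphere grid, and the number of restart writes from the run length divided by the restart interval. This is only a sketch of the arithmetic; the variable names are illustrative and do not come from any UFS configuration file.

```python
# Cubed-sphere ATM grid: six tiles, each decomposed with the same layout.
layout_x, layout_y = 16, 24
tiles = 6
atm_pes = layout_x * layout_y * tiles

# 24-hour run with a mediator restart every 3 hours.
run_hours = 24
restart_interval_hours = 3
restart_writes = run_hours // restart_interval_hours

print(f"ATM PEs: {atm_pes}, mediator restart writes: {restart_writes}")
```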

Using the config variables in ufs.configure, I did 3 sets of runs using 300, 600, 1200, or 2300 PEs for CMEPS. I set pio_typename to pnetcdf for all runs. In the first set, I let CMEPS set all of the PIO-related parameters itself. In the second set, I manually set the number of IO tasks (numiotasks) to yield a stride of 4. In the final set, I set both numiotasks and the stride according to whether the PE count was above or below 1000 (see med_io_mod).

For the existing configuration, serial netcdf is used by default. This gives a mean write time for each CMEPS restart of ~2.4s. Using pnetcdf+PIO, the best results were found using the subset rearranger at stride=4. Depending on the number of tasks, this gives a time of between ~0.8s and ~0.5s for each CMEPS restart write. See full results here

junwang-noaa commented 3 months ago

Denise, thanks for testing the new parallel writing in CMEPS; the speedup is great (>60%). It might be good to test the feature in higher-resolution runs (C768 and C1152). I recall we have had problems using a large number of tasks for CMEPS.
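The >60% figure can be checked against the approximate timings reported earlier in the thread (~2.4s serial, ~0.8s to ~0.5s with pnetcdf at stride=4). A minimal sketch, treating those rounded values as exact; the helper name `reduction_pct` is made up for illustration.

```python
# Approximate mean restart-write times quoted in the thread (seconds).
serial_s = 2.4
pnetcdf_s = (0.8, 0.5)


def reduction_pct(baseline, new):
    """Percentage reduction in write time relative to the baseline."""
    return 100.0 * (baseline - new) / baseline


for t in pnetcdf_s:
    print(f"{t:.1f}s: {reduction_pct(serial_s, t):.0f}% less time, "
          f"{serial_s / t:.1f}x faster")
```

Both ends of the reported range come out above a 60% reduction in write time.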

DeniseWorthen commented 3 months ago

@junwang-noaa I could test the higher-resolution ATM cases; all I need is the ATM input and the layouts to try.

junwang-noaa commented 3 months ago

@DusanJovic-NOAA do you have C768/C1152 ATM-only test cases (run directories) generated from G-W?

DusanJovic-NOAA commented 3 months ago

> @DusanJovic-NOAA do you have C768/C1152 ATM-only test cases (run directories) generated from G-W?

I have them on wcoss2 here:

/lfs/h2/emc/eib/noscrub/dusan.jovic/ufs/c1152_gw_case/
/lfs/h2/emc/eib/noscrub/dusan.jovic/ufs/c768_gw_case/

DeniseWorthen commented 3 months ago

I've grabbed these now and will set up some more testing for CMEPS PIO options. It looks like these were used to test blocksize changes. I'm assuming I should stick w/ the blocksize=32 settings, right?

DusanJovic-NOAA commented 3 months ago

> I've grabbed these now and will set up some more testing for CMEPS PIO options. It looks like these were used to test blocksize changes. I'm assuming I should stick w/ the blocksize=32 settings, right?

Yes.

DeniseWorthen commented 3 months ago

Nothing is moving on Gaea today, but I've been testing adding the config variables to the RT templates. On Hercules, it appears that for small PE counts, as in the cpld_control test (CMEPS=144 PEs), using serial netcdf is actually faster than pnetcdf. So I plan on doing some more tests on Gaea at the C384 resolution, using fewer and fewer CMEPS PEs, to see if I can identify the point at which pnetcdf starts to pay off.

DeniseWorthen commented 2 months ago

I've been able to get the c768 ATM-only case running on Gaea, but it is failing at about hour 21. See /gpfs/f5/nggps_emc/scratch/Denise.Worthen/cmepspio768/test.atmonly

I'm not sure why it's failing. I compiled on Gaea and used the job card from the low-res RT case, modified for the task count. All the fix files are pointing to G-W fix file locations on Gaea. I'm seeing:

1303: forrtl: error (78): process killed (SIGTERM)
1303: Image              PC                Routine            Line        Source
1303: libpthread-2.31.s  00007F842F290910  Unknown               Unknown  Unknown
1303: libpthread-2.31.s  00007F842F28B70C  pthread_cond_wait     Unknown  Unknown
1303: fv3.exe            0000000000C8A9B4  Unknown               Unknown  Unknown
1303: fv3.exe            0000000000C8BC29  Unknown               Unknown  Unknown
1303: fv3.exe            0000000000F70450  Unknown               Unknown  Unknown
1303: fv3.exe            00000000009FEEFE  Unknown               Unknown  Unknown
1303: fv3.exe            000000000071E971  Unknown               Unknown  Unknown
1303: fv3.exe            0000000001AC3A12  fv3atm_cap_mod_mp        1077  fv3_cap.F90
1303: fv3.exe            0000000001AC346B  fv3atm_cap_mod_mp        1026  fv3_cap.F90
1303: fv3.exe            0000000000CF36A8  Unknown               Unknown  Unknown

EDIT: Now I see that it was a time-out.

junwang-noaa commented 1 month ago

@DeniseWorthen Can you confirm that the c768 ATM test still fails on Gaea? Can you list the changes needed to turn on PIO+pnetcdf in CMEPS so that it can be tested on wcoss2?

DeniseWorthen commented 1 month ago

@junwang-noaa I haven't tried the c768 case recently. What I really need is a canned case for the coupled model that runs on Gaea. I was trying to modify the standalone case.

To turn on PnetCDF for CMEPS, add the following to MED_attributes in ufs.configure:

MED_attributes::
....
      pio_rearranger = subset
      pio_typename = pnetcdf
      pio_stride = 4
....

This will create as many io tasks as possible, assuming they are laid out at a stride of 4 across the available processors.
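To get a feel for what a stride of 4 means at the PE counts tested earlier in this thread, here is a rough sketch. It assumes the number of IO tasks is the mediator PE count divided by the stride (integer division, at least one task), with the first IO task on rank 0. The actual logic lives in med_io_mod and may differ, for example in how the root task is offset. The helper name `io_task_ranks` is hypothetical.

```python
def io_task_ranks(npes, stride, root=0):
    """Ranks acting as IO tasks under the ASSUMED rule npes // stride."""
    num_iotasks = max(1, npes // stride)
    return [root + i * stride for i in range(num_iotasks)]


# PE counts used for CMEPS in the tests described above.
for npes in (300, 600, 1200, 2300):
    ranks = io_task_ranks(npes, stride=4)
    print(f"{npes} PEs: {len(ranks)} IO tasks, "
          f"first ranks {ranks[:3]}, last rank {ranks[-1]}")
```

Under this assumption, roughly one in four mediator PEs takes part in the write.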