DeniseWorthen opened 3 months ago
@DeniseWorthen I think you mean that the PIO options need to be added to the ufs template files. Right? I just want to clarify. The capability to use different options for PIO is already implemented in CMEPS and CDEPS.
Yes, exactly. I will clarify the issue description.
I set up an ATM-OCN-ICE case (C384, 1/4 deg) on Gaea-C5. I turned off all history and restart writing except for CMEPS; to do this for OCN and ICE, I manually overrode the write-restart logicals in the code and set them to false prior to compiling. I removed the WGC for the ATM and used a layout of 16x24 without threading for the ATM, which gave me a maximum of 2304 PEs for CMEPS (16 x 24 x 6 tiles). I made a series of 24-hour runs with mediator restarts at 3-hour intervals, giving a total of 8 mediator restart writes, and recorded the min/max and mean times for med_phase_restart_write in the ESMF Profile Summary log.
Using the config variables in ufs.configure, I did three sets of runs, each at 300, 600, 1200, or 2300 PEs for CMEPS. I set the pio_type to pnetcdf for all runs. In the first set, CMEPS set all the PIO-associated parameters itself; in the second, I manually set numiotasks to yield a stride of 4; and in the final set, I set both numiotasks and stride according to whether the PE count was > or < 1000 (see med_io_mod).
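For reference, a minimal sketch of the MED_attributes entries for the second set, assuming 1200 CMEPS PEs (pio_numiotasks here follows the attribute naming in med_io_mod; for the other PE counts the value scales as PEs/4):

MED_attributes::
....
pio_typename = pnetcdf
# 1200 CMEPS PEs / stride 4 = 300 IO tasks
pio_numiotasks = 300
....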
For the existing configuration, serial netcdf is used by default; this gives a mean write time of ~2.4s for each CMEPS restart. Using pnetcdf+PIO, the best results were found using the subset rearranger at stride=4. Depending on the number of tasks, this brings each CMEPS restart write down to between ~0.8s and ~0.5s. See full results here
Denise, thanks for testing the new parallel writing in CMEPS; the speedup is great (>60%). It might be good to test the feature in higher-resolution runs (C768 and C1152). I recall we had problems using a large number of tasks for CMEPS.
@junwang-noaa I could test the higher-resolution ATM cases; all I need is the ATM input and the layouts to try.
@DusanJovic-NOAA do you have C768/C1152 ATM only test cases (run directories) generated from G-W?
I have them on wcoss2 here:
/lfs/h2/emc/eib/noscrub/dusan.jovic/ufs/c1152_gw_case/
/lfs/h2/emc/eib/noscrub/dusan.jovic/ufs/c768_gw_case/
I've grabbed these now and will set up some more testing for CMEPS PIO options. It looks like these were used to test blocksize changes. I'm assuming I should stick with the blocksize=32 settings, right?
Yes.
Nothing is moving on Gaea today, but I've been testing adding the config variables to the RT templates (a sketch of the template entries is below). On Hercules, it appears that for small PE counts, like in the cpld_control test (CMEPS = 144 PEs), serial netcdf is actually faster than pnetcdf. So I plan to do some more tests on Gaea at the C384 resolution, using fewer and fewer CMEPS PEs, to see if I can identify the point at which pnetcdf starts to pay off.
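As a rough sketch of what the template entries could look like, using the @[...] substitution tokens the RT templates use (the variable names here are hypothetical, not the final ones):

MED_attributes::
....
pio_rearranger = @[MED_pio_rearranger]
pio_typename = @[MED_pio_typename]
pio_stride = @[MED_pio_stride]
....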
I've been able to get the c768 ATM only case running on Gaea but it is failing at about hour 21. See /gpfs/f5/nggps_emc/scratch/Denise.Worthen/cmepspio768/test.atmonly
I'm not sure why it's failing. I compiled on Gaea and used the job card from the low-res RT case, modifying it for the task count. All the fix files point to the G-W fix file locations on Gaea. I'm seeing
1303: forrtl: error (78): process killed (SIGTERM)
1303: Image PC Routine Line Source
1303: libpthread-2.31.s 00007F842F290910 Unknown Unknown Unknown
1303: libpthread-2.31.s 00007F842F28B70C pthread_cond_wait Unknown Unknown
1303: fv3.exe 0000000000C8A9B4 Unknown Unknown Unknown
1303: fv3.exe 0000000000C8BC29 Unknown Unknown Unknown
1303: fv3.exe 0000000000F70450 Unknown Unknown Unknown
1303: fv3.exe 00000000009FEEFE Unknown Unknown Unknown
1303: fv3.exe 000000000071E971 Unknown Unknown Unknown
1303: fv3.exe 0000000001AC3A12 fv3atm_cap_mod_mp 1077 fv3_cap.F90
1303: fv3.exe 0000000001AC346B fv3atm_cap_mod_mp 1026 fv3_cap.F90
1303: fv3.exe 0000000000CF36A8 Unknown Unknown Unknown
EDIT: Now I see that it was a time-out.
@DeniseWorthen Can you confirm that the c768 ATM test still fails on Gaea? Can you list the changes to turn on PIO_Pnetcdf in CMEPS so that it can be tested on wcoss2?
@junwang-noaa I haven't tried the c768 case recently. What I really need is a canned case for the coupled model that runs on Gaea; I was trying to modify the standalone case.
To turn on PnetCDF for CMEPS, add the following to the MED_attributes block in ufs.configure:
MED_attributes::
....
pio_rearranger = subset
pio_typename = pnetcdf
pio_stride = 4
....
This will create as many IO tasks as possible, laid out at a stride of 4 across the available PEs (e.g., 2304 CMEPS PEs at a stride of 4 gives 576 IO tasks).
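Alternatively, the IO task count can be pinned directly and the stride left for CMEPS to derive; a sketch, assuming the pio_numiotasks attribute name from med_io_mod and the 2304-PE case above:

MED_attributes::
....
pio_rearranger = subset
pio_typename = pnetcdf
# equivalent to stride=4 across 2304 CMEPS PEs
pio_numiotasks = 576
....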
Description
Currently, CMEPS in UFS does not make use of PIO options; restart (and history) writing is through serial netcdf. CMEPS has an existing capability to write using PIO+pnetcdf, with control of the various PIO options (e.g., stride, numiotasks) through configuration.
Solution
Parallel writes for CMEPS should be implemented in UFS through setting the appropriate PIO config options. Scalability testing should be done to determine correct values for the PIO settings.
Alternatives
Related to
See https://github.com/oceanmodeling/CMEPS/issues/1 for an example of this issue arising in the coastal modeling effort.