wrf-model / WRF

The official repository for the Weather Research and Forecasting (WRF) model

Add PnetCDF Non-blocking APIs to Perform I/O #2058

Open yzanhua opened 1 month ago

yzanhua commented 1 month ago

TYPE: new feature

KEYWORDS: parallel I/O, PnetCDF, non-blocking APIs, requests aggregation

SOURCE: Zanhua Huang (Northwestern University), Wei-keng Liao (Northwestern University, @wkliao)

DESCRIPTION OF CHANGES: Problem: We found that the PnetCDF non-blocking APIs noticeably improve parallel I/O performance over the blocking APIs that WRF originally uses. The performance of the PnetCDF non-blocking APIs with WRF is discussed in the paper "I/O in WRF: A Case Study in Modern Parallel I/O Techniques" (full citation in the release note below).

Solution: We added a new I/O option that enables the PnetCDF non-blocking APIs. If users set enable_pnetcdf_bput to .true. in the namelist and also set io_form to PnetCDF (io_form = 11) for the history and/or restart files, then the PnetCDF non-blocking APIs are used.

When the PnetCDF non-blocking APIs are enabled, write calls for WRF variables are first buffered in memory and are not flushed to the file until the end of each time step.
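The buffering behavior described above can be pictured with a small toy model. This is only a conceptual sketch in Python, not the PnetCDF API or the code in this PR: the class `BufferedWriter` and its methods `put` and `end_of_time_step` are hypothetical names invented for illustration, and the "file" is just an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedWriter:
    """Toy model of write aggregation (hypothetical; not the PnetCDF API)."""

    flush_count: int = 0                               # simulated file flushes
    file_contents: dict = field(default_factory=dict)  # stands in for the file
    _pending: list = field(default_factory=list)       # buffered write requests

    def put(self, name, values):
        # Non-blocking-style write: copy the data into memory and return
        # immediately, without touching the "file".
        self._pending.append((name, list(values)))

    def end_of_time_step(self):
        # Flush every pending request together in one aggregated operation.
        if not self._pending:
            return
        for name, values in self._pending:
            self.file_contents.setdefault(name, []).append(values)
        self._pending.clear()
        self.flush_count += 1


writer = BufferedWriter()
for step in range(2):
    writer.put("T2", [280.0 + step, 281.0 + step])
    writer.put("U10", [3.5, 4.0])
    writer.end_of_time_step()

# Four write requests were issued, but the "file" was touched only twice.
print(writer.flush_count)
```

The point of the sketch is the ratio: many per-variable write requests collapse into one flush per time step, which is where aggregation can save I/O time.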

LIST OF MODIFIED FILES:

  1. Registry/registry.io_boilerplate
  2. external/io_pnetcdf/ext_pnc_put_dom_ti.code
  3. external/io_pnetcdf/field_routines.F90
  4. external/io_pnetcdf/wrf_io.F90
  5. frame/module_io.F
  6. share/output_wrf.F

TESTS CONDUCTED:

  1. Do mods fix problem? How can that be demonstrated, and was that test conducted? We conducted the performance evaluation on a WRF single-domain benchmark with a grid size of 1900x1300 on Cori at NERSC. The performance improvement of using PnetCDF non-blocking APIs is presented in the paper mentioned above.
  2. Are the Jenkins tests all passing? Not tested

RELEASE NOTE: Support PnetCDF non-blocking APIs to increase parallel I/O performance of PnetCDF.

Zanhua Huang, Kaiyuan Hou, Ankit Agrawal, Alok Choudhary, Robert Ross, and Wei-Keng Liao. 2023. I/O in WRF: A Case Study in Modern Parallel I/O Techniques. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '23). Association for Computing Machinery, New York, NY, USA, Article 94, 1–13. https://doi.org/10.1145/3581784.3613216

weiwangncar commented 1 month ago

@yzanhua One of the compilations failed; it is the WRFDA build. See the attached output file: output_0.gz

If you need instructions to build WRFDA, see here.

wkliao commented 1 month ago

@yzanhua

The error messages are extracted below.

/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /wrf/WRFPLUS/main/libwrflib.a(output_wrf.o): in function `bputcalcbuffersize_':
output_wrf.f90:(.text+0x0): multiple definition of `bputcalcbuffersize_'; ./libwrfvar.a(output_wrf.o):output_wrf.f90:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

weiwangncar commented 4 weeks ago

The regression tests have passed:

Test Type              | Expected | Received | Failed
=======================+==========+==========+=======
Number of Tests        | 23       | 24       |
Number of Builds       | 60       | 57       |
Number of Simulations  | 158      | 150      | 0
Number of Comparisons  | 95       | 86       | 0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

wkliao commented 4 weeks ago

Hi, @weiwangncar

Thanks. Can you also modify the regression test to add a test with this feature enabled? The feature is optional for users and is enabled by setting enable_pnetcdf_bput to .true. and io_form to 11 in the namelist.

If it improves the write time significantly, as shown in our SC paper, it could become the default option in the future.

weiwangncar commented 4 weeks ago

@wkliao We do not currently test PnetCDF I/O in the regression tests, but we will test the code on our system. Thanks for contributing this to the community code!