Closed: janmandel closed this issue 1 year ago.
Goal: use PnetCDF to write the file for UtahEFD/QES-Winds#2 to read.
Documentation at https://parallel-netcdf.github.io (same building concepts as the NetCDF f90 API, more current); the man pages have some explanations.
Fortran examples at http://cucis.ece.northwestern.edu/projects/PnetCDF/#InteroperabilityWithNetCDF4, copied to https://github.com/janmandel/pnetcdf-tests
Example write: https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/tutorial/pnetcdf-write-bufferedf.f90
CHPC has installed pnetcdf/1.11.2; module load pnetcdf sets PNETCDF_INCDIR and PNETCDF_LIBDIR, apparently built with intel18.
Cheyenne: module load pnetcdf also sets PNETCDF and reloads on module swap intel gnu. See https://cug.org/proceedings/cug2016_proceedings/includes/files/pap123s2-file1.pdf
Build with pnetcdf: /glade/work/jmandel/WRF-SFIRE-pnetcdf
Run with pnetcdf: /glade/p/univ/ucud0004/jmandel/em_rxcadre_pnetcdf
On CHPC, getting nf90mpi_open: wrf.nc error -128 "NetCDF: Attempt to use feature that was not turned on when netCDF was built". It seems to happen only with wrfout files created by WRF; with wrf.nc -> testfile.nc from https://github.com/janmandel/pnetcdf-tests this does not happen. Maybe some version issue, like conda-forge/libnetcdf-feedstock#42 or Unidata/netcdf4-python#713.
tools/nc4-test.exe can write netCDF-4 compressed files and sets NC_NETCDF4 in nc_create. Using its output as wrf.nc (nc4_test.nc) gives the same error as the wrfout. Maybe pnetcdf on CHPC was compiled without compression? See wrf-model/WRF#583 for more on the compression and classic-format issue.
nf-config shows what was compiled into the NetCDF library
Finally, on CHPC:
$ ncdump -k nc4_test.nc
netCDF-4
(created by WRF, cannot be opened by pnetcdf)
$ ncdump -k testfile.nc
cdf5
(created by pnetcdf tests, can be opened by pnetcdf)
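As a cross-check, the formats that ncdump -k reports can be told apart from the files' leading magic bytes alone. A stdlib-only Python sketch (file names and labels below are illustrative, following the ncdump -k vocabulary):

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 containers

def netcdf_kind(path):
    """Rough equivalent of `ncdump -k`, based on the leading magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic == HDF5_MAGIC:
        return "netCDF-4"  # HDF5-based; pnetcdf cannot open these
    if magic[:3] == b"CDF" and len(magic) >= 4:
        # byte 4 encodes the classic variant: 1 = CDF-1, 2 = CDF-2, 5 = CDF-5
        return {1: "classic", 2: "64-bit offset", 5: "cdf5"}.get(magic[3], "unknown")
    return "not netCDF"
```

A wrfout that reports netCDF-4 here is exactly the case pnetcdf rejects with error -128, while cdf5 files open fine.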
/uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
/uufs/chpc.utah.edu/sys/installdir/netcdf-c/4.4.1.1i18-c7
/uufs/chpc.utah.edu/sys/installdir/netcdf-f/4.4.4i18-c7
Here is why:
$ pnetcdf-config --netcdf4
disabled
Fix: compile with setenv NETCDF_classic 1, so WRF writes classic-format files that pnetcdf can open.
From https://www.unidata.ucar.edu/software/netcdf/docs/parallel_io.html: NetCDF-4 provides parallel file access to both classic and netCDF-4/HDF5 files. The parallel I/O to netCDF-4 files is achieved through the HDF5 library, while the parallel I/O to classic files is through PnetCDF. A few functions have been added to the netCDF C API to handle parallel I/O. You must build netCDF-4 properly to take advantage of parallel features (see Building with Parallel I/O Support). The nc_open_par() and nc_create_par() functions are used to create/open a netCDF file with parallel access.
From https://en.wikipedia.org/wiki/NetCDF#Parallel-NetCDF: An extension of netCDF for parallel computing called Parallel-NetCDF (or PnetCDF) has been developed by Argonne National Laboratory and Northwestern University. This is built upon MPI-IO, the I/O extension to MPI communications. Using the high-level netCDF data structures, the Parallel-NetCDF libraries can make use of optimizations to efficiently distribute the file read and write applications between multiple processors. The Parallel-NetCDF package can read/write only classic and 64-bit offset formats. Parallel-NetCDF cannot read or write the HDF5-based format available with netCDF-4.0. The Parallel-NetCDF package uses different, but similar APIs in Fortran and C. Parallel I/O in the Unidata netCDF library has been supported since release 4.0, for HDF5 data files. Since version 4.1.1 the Unidata NetCDF C library supports parallel I/O to classic and 64-bit offset files using the Parallel-NetCDF library, but with the NetCDF API.
From https://parallel-netcdf.github.io: NetCDF started to support parallel I/O from version 4, whose parallel I/O feature was at first built on top of parallel HDF5. Thus, the file format required by NetCDF-4 parallel I/O operations was restricted to HDF5 format. Starting from the release of 4.1, NetCDF has also included a dispatcher that enables parallel I/O operations on files in classic formats (CDF-1 and 2) through PnetCDF. Official support for the CDF-5 format started in the release of NetCDF 4.4.0. Note NetCDF now can be built with PnetCDF as its sole parallel I/O mechanism by using command-line options "--disable-netcdf-4 --enable-pnetcdf". Certainly, NetCDF can also be built with both PnetCDF and Parallel HDF5 enabled. In this case, a NetCDF program can choose either PnetCDF or Parallel HDF5 to carry out the parallel I/O by adding NC_MPIIO or NC_NETCDF4 respectively to the file open/create flag argument when calling API nc_create_par or nc_open_par. When using PnetCDF underneath, the files must be in the classic formats (CDF-1/2/5). Similarly for HDF5, the files must be in the HDF5 format (aka NetCDF-4 format). A few NetCDF-4 example programs are available that show parallel I/O operations through PnetCDF and HDF5.
See also Parallel I/O and Portable Data Formats: PnetCDF and NetCDF4
From https://stackoverflow.com/questions/59506059/parallel-read-write-of-netcdf-file-using-fortran-and-mpi: The pnetcdf package (sometimes called parallel-netcdf) is an independent library, a totally separate implementation of netCDF ... high-performance parallel I/O library for accessing Unidata's NetCDF, files in classic formats ... pnetcdf has a netCDF-like API, but the function names are different. If pnetcdf is used in stand-alone mode (i.e. without the Unidata netCDF libraries) then user code must be written in the pnetcdf API. This code will not run using the Unidata netCDF library, it will run with pnetcdf only.... Also, pnetcdf can only be used with netCDF classic formats. It cannot read/write HDF5 files.
Using Unidata's netcdf-c/netcdf-fortran libraries would be simplest. Build pnetcdf, HDF5, then netcdf-c, then netcdf-fortran, all with MPI compilers. Make sure you specify --enable-parallel when building HDF5. (Not necessary with netcdf-c, netcdf-fortran, they will automatically detect parallel features of the HDF5 build).
Once built, the netcdf C and Fortran APIs can do parallel I/O on any netCDF file. (And also on almost all HDF5 files.) Use nc_open_par()/nc_create_par() to get parallel I/O.
Use of pnetcdf may be simplest and give best performance for classic format files. It has a slightly different API and will not work for HDF5 files.
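A sketch of that build order in sh. Source directory names, the install prefix, and compiler wrappers are placeholders; only --enable-parallel for HDF5 is taken from the advice above, so check each package's configure options before relying on this:

```shell
# Build pnetcdf, HDF5, netcdf-c, netcdf-fortran, all with MPI compilers.
PREFIX=$HOME/netcdf-parallel        # placeholder install prefix
export CC=mpicc FC=mpif90 CPPFLAGS=-I$PREFIX/include LDFLAGS=-L$PREFIX/lib

(cd pnetcdf-src        && ./configure --prefix="$PREFIX" && make install)
(cd hdf5-src           && ./configure --prefix="$PREFIX" --enable-parallel && make install)
# netcdf-c and netcdf-fortran detect the parallel features of the HDF5 build
(cd netcdf-c-src       && ./configure --prefix="$PREFIX" && make install)
(cd netcdf-fortran-src && ./configure --prefix="$PREFIX" && make install)
```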
From http://manpages.ubuntu.com/manpages/xenial/man3/pnetcdf_f90.3.html
function nf90mpi_put_var(ncid, varid, values, start, stride, imap)
integer, intent(in) :: ncid, varid
<<whatever>>, intent(in) :: values
integer, dimension(:), optional, intent(in) :: start
integer, dimension(:), optional, intent(in) :: stride
integer, dimension(:), optional, intent(in) :: imap
integer :: nf90mpi_put_var
Writes a value or values to a netCDF variable. The netCDF dataset must be open and
in data mode. values contains the value(s) that will be written to the netCDF
variable identified by ncid and varid; it may be a scalar or an array and must be
of type character, integer(kind=OneByteInt), integer(kind=TwoByteInt),
integer(kind=FourByteInt), integer(kind=EightByteInt), real(kind=FourByteReal), or
real(kind=EightByteReal). All values are converted to the external type of the
netCDF variable, if possible; otherwise, an nf90_erange error is returned. The
optional argument start specifies the starting index in the netCDF variable for
writing for each dimension of the netCDF variable. The optional argument stride
specifies the sampling stride (the interval between accessed values in the netCDF
variable) for each dimension of the netCDF variable (see COMMON ARGUMENT
DESCRIPTIONS below). The optional argument imap specifies the in-memory
arrangement of the data values (see COMMON ARGUMENT DESCRIPTIONS below).
integer(kind=MPI_OFFSET) start
specifies the starting point for accessing a netCDF variable's data values in terms
of the indicial coordinates of the corner of the array section. The indices start
at 1; thus, the first data value of a variable is (1, 1, ..., 1). The size of the
vector shall be at least the rank of the associated netCDF variable and its
elements shall correspond, in order, to the variable's dimensions.
integer(kind=MPI_OFFSET) stride
specifies the sampling interval along each dimension of the netCDF variable. The
elements of the stride vector correspond, in order, to the netCDF variable's
dimensions (stride(1) gives the sampling interval along the most rapidly varying
dimension of the netCDF variable). Sampling intervals are specified in type-
independent units of elements (a value of 1 selects consecutive elements of the
netCDF variable along the corresponding dimension, a value of 2 selects every other
element, etc.).
integer(kind=MPI_OFFSET) imap
specifies the mapping between the dimensions of a netCDF variable and the in-memory
structure of the internal data array. The elements of the index mapping vector
correspond, in order, to the netCDF variable's dimensions (imap gives the distance
between elements of the internal array corresponding to the most rapidly varying
dimension of the netCDF variable). Distances between elements are specified in
type-independent units of elements (the distance between internal elements that
occupy adjacent memory locations is 1 and not the element's byte-length as in
netCDF 2).
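The start/stride/imap arithmetic above is easy to misread. A small stdlib Python illustration of just the imap offset computation (this mimics the element-unit bookkeeping described in the man page; it does not call pnetcdf):

```python
from itertools import product

def memory_offsets(count, imap):
    """For each 0-based variable index tuple in dimension order, return the
    in-memory element offset sum(i_k * imap_k), i.e. the distance-between-
    elements rule described for the imap argument above."""
    return [sum(i * m for i, m in zip(idx, imap))
            for idx in product(*(range(c) for c in count))]

# identity mapping for a 2x2 section whose second index varies fastest in memory:
print(memory_offsets((2, 2), (2, 1)))  # [0, 1, 2, 3]
# swapping the imap entries transposes the section in memory:
print(memory_offsets((2, 2), (1, 2)))  # [0, 2, 1, 3]
```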
Ready for testing.
git checkout develop-18
module load pnetcdf
setenv PNETCDF /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
setenv NETCDF_classic 1
./configure -d # select option 15 and 1
vi configure.wrf # add to INCLUDE_MODULES the line -I$(PNETCDFPATH)/include
Run ./real.exe and ./wrf.exe (that will produce an expected error).
ln -s wrfout_file_created wrf.nc
Run wrf.exe using qsub simulation file. Then it rewrites U V PH in frame 1 of wrf.nc in every timestep.
Test build is in /uufs/chpc.utah.edu/common/home/kochanski-group4/jmandel/WRF-SFIRE-pnetcdf
Test run is in test/em_fire/hill
On cheyenne (from Angel for reference):
module unload gdal
module load netcdf
module load pnetcdf
setenv NETCDF_classic 1
or in sh: export NETCDF_classic=1
Vertically staggered variables are "half level"; the top index is decreased by one. We need to make function get_chsum and its call for fmw compatible with that. See 75c25c76f8365d0c2ee7fc484d4cd71616a768d4
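A toy stand-in for the idea (get_chsum itself is WRF-SFIRE Fortran; this hypothetical Python version only illustrates dropping the top level when the variable is vertically staggered):

```python
def chsum(levels, staggered):
    """Toy checksum over a 3-D field given as a list of 2-D levels.
    For a vertically staggered variable the matching half-level field
    has one level fewer, so the top level is excluded from the sum."""
    if staggered:
        levels = levels[:-1]
    return sum(v for level in levels for row in level for v in row)

field = [[[1, 2]], [[3, 4]], [[100, 200]]]  # 3 vertical levels
print(chsum(field, staggered=False))  # 310
print(chsum(field, staggered=True))   # 10
```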
The pnetcdf coupling with checksums works on Cheyenne with wrf-fire-matlab commit 3a5ef8c and WRF-SFIRE commit f8bbb82. Parallel wrf.exe and serial femwind_wrfout.exe synchronize and send the arrays back and forth. They need to be started at about the same time in the same directory, with a copy of an earlier wrfout renamed wrf.nc. Getting the numbers right will take more work. To build WRF, see openwfm/WRF-SFIRE#18, branch develop-18, and load module pnetcdf first. To build femwind_wrfout.exe, see openwfm/wrf-fire-matlab#4 and run make in femwind/fortran.
Built WRF-SFIRE on chpc:
module load geotiff/1.4.0
module load intel/2018.1.163
module load hdf5/1.8.19
module load netcdf-c/4.4.1.1
module load impi/2018.1.163
module load netcdf-f/4.4.4
module load pnetcdf
setenv PNETCDF /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
setenv NETCDF_classic 1
setenv NETCDF /uufs/chpc.utah.edu/sys/installdir/netcdf-f/4.4.4i18-c7
./configure -d   # select 15 and then 1
vi configure.wrf # add to INCLUDE_MODULES the line -I$(PNETCDFPATH)/include
./compile em_fire
Here are notes for building QES on chpc: Right now, the branch to use is pw-wrfqes-wbmerge (87a66c895f454316216bea7436e1e32a320a4208)
I have more testing to complete before I merge back to the main dev branches.
Modules:
module load cuda/10.2
module load gcc/8.1.0
module load cmake/3.15.3
module load gdal/3.0.1
module load boost/1.69.0
You need to be in the QES-Winds main folder, and then you can build from the build/ folder:
rm -rf build; mkdir build; cd build
cmake -DCUDA_TOOLKIT_DIR=/uufs/chpc.utah.edu/sys/installdir/cuda/10.2.89 \
  -DCUDA_SDK_ROOT_DIR=/uufs/chpc.utah.edu/sys/installdir/cuda/10.2.89 \
  -DNETCDF_DIR=/uufs/chpc.utah.edu/sys/installdir/netcdf-c/4.4.1-c7/include \
  -DNETCDF_CXX_DIR=/uufs/chpc.utah.edu/sys/installdir/netcdf-cxx/4.3.0-5.4.0g/include \
  -DOptiX_INSTALL_DIR=/uufs/chpc.utah.edu/sys/installdir/optix/7.1.0/ \
  -DCMAKE_C_COMPILER=gcc \
  -DCMAKE_PREFIX_PATH="/uufs/chpc.utah.edu/sys/installdir/gdal/3.0.1;/uufs/chpc.utah.edu/sys/installdir/hdf5/1.8.17-c7" ..
then build it: make
The communication file is coded as
/uufs/chpc.utah.edu/common/home/kochanski-group4/jmandel/QES-Winds/run 0a2b8b5
/uufs/chpc.utah.edu/common/home/kochanski-group4/jmandel/WRF-SFIRE-develop-18/test/em_fire/hill 938b124e12368cca8e6634b31661b327b225ca59
Got the ping-pong working again. Code to do this is in the pw-wrfqes-wbmerge branch of qesWinds
This should test the ping-pong'ing a bit. I need to fix some initialization issues in qes for setting up the correct time series and we should be in better shape.
@willemsn
I am trying the recipe above from Jan 14, but cmake first did not like the - at the end of a line, and then it can't find pnetcdf: module load pnetcdf is not found, and module load pnetcdf/1.11.2 requires intel and impi, which disable gcc.
Hi Jan,
QES does not use PNETCDF; our build will not need PNETCDF. For WRF-SFIRE though, I build it with this module list:
module load geotiff/1.4.0
module load intel/2018.1.163
module load hdf5/1.8.19
module load netcdf-c/4.4.1.1
module load impi/2018.1.163
module load netcdf-f/4.4.4
module load pnetcdf
setenv PNETCDF /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
setenv NETCDF_classic 1
setenv NETCDF /uufs/chpc.utah.edu/sys/installdir/netcdf-f/4.4.4i18-c7
With QES, I use this module list:
module load cuda/10.2
module load gcc/8.1.0
module load cmake/3.15.3
module load gdal/3.0.1
module load boost/1.69.0
Works fine on CHPC with the module parallel-netcdf, which points to a build of pnetcdf with the Intel oneapi compiler.
I've got it compiling and running, but it doesn't seem to be "syncing" and waiting for QES. Is there something that needs to be turned on that I forgot about?
On CHPC use:
module purge
module load intel-oneapi-compilers/2021.4.0 openmpi/4.1.1
module load netcdf-c/4.8.1 netcdf-fortran/4.5.3
module load parallel-netcdf/1.12.2
setenv PNETCDF $PARALLEL_NETCDF_ROOT
setenv NETCDF_classic 1
and setenv NETCDF to a directory with subdirectories include and lib populated by soft links to files from both $NETCDF_C_ROOT and $NETCDF_FORTRAN_ROOT in the same subdirectories.
This can be done by
source /uufs/chpc.utah.edu/common/home/u6015690/lib/intel-2021.4.0.tcsh
before the ./compile and in the slurm script.
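A sketch of that symlink setup as an sh function (the target directory is a placeholder; $NETCDF_C_ROOT and $NETCDF_FORTRAN_ROOT come from the netcdf-c and netcdf-fortran modules):

```shell
# Merge the netcdf-c and netcdf-fortran trees into one prefix so that
# the single NETCDF variable WRF expects finds both include and lib files.
merge_netcdf() {  # usage: merge_netcdf TARGET ROOT...
  target=$1; shift
  mkdir -p "$target/include" "$target/lib"
  for root in "$@"; do
    ln -sf "$root"/include/* "$target/include/"
    ln -sf "$root"/lib/*     "$target/lib/"
  done
}

# e.g.: merge_netcdf "$HOME/netcdf-merged" "$NETCDF_C_ROOT" "$NETCDF_FORTRAN_ROOT"
# then: setenv NETCDF $HOME/netcdf-merged   (tcsh), or export NETCDF=... (sh)
```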
Note: parallel-netcdf/1.12.2 seems to be bound to openmpi/4.1.1 built using the intel-oneapi-compilers/2021.4.0 compiler. Other combinations of dependencies listed by module spider parallel-netcdf/1.12.2 may link but will crash at runtime.
Also, parallel-netcdf will crash when opening a file that is a soft link to a file in /scratch/general/lustre. The code works fine when everything is done on lustre.
Continued in #5
For the PREEVENTS project. Comments on WRF-SFIRE edits here, for the other side at UtahEFD/QES-Winds#2.