openwfm / WRF-SFIRE

A coupled weather-fire forecasting model built on top of the Weather Research and Forecasting (WRF) model. This is the original https://github.com/openwfm/wrf-fire, transitioned to a fork of WRF, with the fire model selected by ifire=1. Graphic log at https://repo.or.cz/git-browser/by-commit.html?r=WRF-SFIRE.git
https://wiki.openwfm.org

Use PNETCDF to couple WRF-SFIRE with mass-consistent solver #18

Closed. janmandel closed this issue 1 year ago.

janmandel commented 4 years ago

For the PREEVENTS project. Comments on the WRF-SFIRE edits go here; for the other side, see UtahEFD/QES-Winds#2.

janmandel commented 4 years ago

PnetCDF will be used to write the file for UtahEFD/QES-Winds#2 to read. Documentation is at https://parallel-netcdf.github.io; the building concepts are the same as in the NetCDF f90 API (more current), and the man pages have some explanations. Fortran examples are at http://cucis.ece.northwestern.edu/projects/PnetCDF/#InteroperabilityWithNetCDF4, copied to https://github.com/janmandel/pnetcdf-tests. Example write: https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/tutorial/pnetcdf-write-bufferedf.f90
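
For orientation, here is a minimal sketch of the PnetCDF Fortran 90 calling pattern, modeled on the tutorial example linked above. The file name, dimensions, and variable are made up for illustration and are not the actual coupling file layout.

 ! Minimal PnetCDF F90 write sketch (illustrative only, not the coupling code).
 ! Creates a CDF-5 file and writes one 2D variable collectively, one row per rank.
 program pnetcdf_write_sketch
   use mpi
   use pnetcdf
   implicit none
   integer :: ierr, err, rank, nprocs, ncid, dimid(2), varid
   integer(kind=MPI_OFFSET_KIND) :: nx, ny, start(2), count(2)
   real(8) :: buf(10)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   nx  = 10
   ny  = nprocs
   buf = real(rank, 8)

   ! NF90_64BIT_DATA selects the CDF-5 format, which PnetCDF can read back.
   err = nf90mpi_create(MPI_COMM_WORLD, 'testfile.nc', &
                        ior(NF90_CLOBBER, NF90_64BIT_DATA), MPI_INFO_NULL, ncid)
   err = nf90mpi_def_dim(ncid, 'x', nx, dimid(1))
   err = nf90mpi_def_dim(ncid, 'y', ny, dimid(2))
   err = nf90mpi_def_var(ncid, 'u', NF90_DOUBLE, dimid, varid)
   err = nf90mpi_enddef(ncid)

   ! Collective write: each rank contributes row rank+1.
   start = (/ 1_MPI_OFFSET_KIND, int(rank+1, MPI_OFFSET_KIND) /)
   count = (/ nx, 1_MPI_OFFSET_KIND /)
   err = nf90mpi_put_var_all(ncid, varid, buf, start, count)

   err = nf90mpi_close(ncid)
   call MPI_Finalize(ierr)
 end program pnetcdf_write_sketch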

janmandel commented 4 years ago

CHPC has installed pnetcdf/1.11.2; module load pnetcdf sets PNETCDF_INCDIR and PNETCDF_LIBDIR, apparently built with intel18. On Cheyenne, module load pnetcdf also sets PNETCDF and reloads on module swap intel gnu; see https://cug.org/proceedings/cug2016_proceedings/includes/files/pap123s2-file1.pdf. Build with pnetcdf: /glade/work/jmandel/WRF-SFIRE-pnetcdf. Run with pnetcdf: /glade/p/univ/ucud0004/jmandel/em_rxcadre_pnetcdf

janmandel commented 4 years ago

On CHPC, getting nf90mpi_open: wrf.nc error -128, "NetCDF: Attempt to use feature that was not turned on when netCDF was built." It seems to be something about the wrfout files created by WRF only; with wrf.nc -> testfile.nc from https://github.com/janmandel/pnetcdf-tests this does not happen. Maybe some version issue, like conda-forge/libnetcdf-feedstock#42 or Unidata/netcdf4-python/issues/713.
tools/nc4-test.exe can write netcdf-4 compressed files and sets NC_NETCDF4 in nc_create. Using its output as wrf.nc -> nc4_test.nc gives the same error as the wrfout. Maybe pnetcdf on CHPC was compiled without compression? See wrf-model/WRF#583 for more on the compression and classic-format issue. nf-config shows what was compiled into the NetCDF library. Finally, on CHPC:

 $ ncdump -k nc4_test.nc
 netCDF-4    (created by WRF, cannot be opened by pnetcdf)
 $ ncdump -k testfile.nc
 cdf5        (created by pnetcdf tests, can be opened by pnetcdf)

with

 /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
 /uufs/chpc.utah.edu/sys/installdir/netcdf-c/4.4.1.1i18-c7
 /uufs/chpc.utah.edu/sys/installdir/netcdf-f/4.4.4i18-c7

Here is why:

 $ pnetcdf-config --netcdf4
 disabled

Fix: compile with setenv NETCDF_classic 1
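
For reference, this is roughly how the failing open surfaces in code; a minimal sketch of the status check, with wrf.nc assumed to be in the working directory:

 ! Sketch of checking the nf90mpi_open return status (illustrative only).
 ! A netCDF-4/HDF5 file produced by a default WRF build triggers the error
 ! above; a CDF-1/2/5 file opens normally.
 program check_open
   use mpi
   use pnetcdf
   implicit none
   integer :: ierr, err, ncid
   call MPI_Init(ierr)
   err = nf90mpi_open(MPI_COMM_WORLD, 'wrf.nc', NF90_NOWRITE, MPI_INFO_NULL, ncid)
   if (err /= NF90_NOERR) then
     print *, 'nf90mpi_open: wrf.nc error ', err, ': ', trim(nf90mpi_strerror(err))
   else
     err = nf90mpi_close(ncid)
   end if
   call MPI_Finalize(ierr)
 end program check_open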

janmandel commented 4 years ago

From https://www.unidata.ucar.edu/software/netcdf/docs/parallel_io.html: NetCDF-4 provides parallel file access to both classic and netCDF-4/HDF5 files. The parallel I/O to netCDF-4 files is achieved through the HDF5 library, while the parallel I/O to classic files is through PnetCDF. A few functions have been added to the netCDF C API to handle parallel I/O. You must build netCDF-4 properly to take advantage of parallel features (see Building with Parallel I/O Support). The nc_open_par() and nc_create_par() functions are used to create/open a netCDF file with parallel access.

From https://en.wikipedia.org/wiki/NetCDF#Parallel-NetCDF: An extension of netCDF for parallel computing called Parallel-NetCDF (or PnetCDF) has been developed by Argonne National Laboratory and Northwestern University. This is built upon MPI-IO, the I/O extension to MPI communications. Using the high-level netCDF data structures, the Parallel-NetCDF libraries can make use of optimizations to efficiently distribute the file read and write applications between multiple processors. The Parallel-NetCDF package can read/write only classic and 64-bit offset formats; Parallel-NetCDF cannot read or write the HDF5-based format available with netCDF-4.0. The Parallel-NetCDF package uses different, but similar, APIs in Fortran and C. Parallel I/O in the Unidata netCDF library has been supported since release 4.0 for HDF5 data files. Since version 4.1.1 the Unidata NetCDF C library supports parallel I/O to classic and 64-bit offset files using the Parallel-NetCDF library, but with the NetCDF API.

From https://parallel-netcdf.github.io: NetCDF started to support parallel I/O from version 4, whose parallel I/O feature was at first built on top of parallel HDF5. Thus, the file format required by NetCDF-4 parallel I/O operations was restricted to the HDF5 format. Starting from release 4.1, NetCDF has also included a dispatcher that enables parallel I/O operations on files in classic formats (CDF-1 and 2) through PnetCDF. Official support for the CDF-5 format started in the release of NetCDF 4.4.0. Note that NetCDF can now be built with PnetCDF as its sole parallel I/O mechanism by using the command-line options "--disable-netcdf-4 --enable-pnetcdf". Certainly, NetCDF can also be built with both PnetCDF and Parallel HDF5 enabled. In this case, a NetCDF program can choose either PnetCDF or Parallel HDF5 to carry out the parallel I/O by adding NC_MPIIO or NC_NETCDF4, respectively, to the file open/create flag argument when calling the API nc_create_par or nc_open_par. When using PnetCDF underneath, the files must be in the classic formats (CDF-1/2/5); similarly for HDF5, the files must be in the HDF5 format (aka NetCDF-4 format). A few NetCDF-4 example programs are available that show parallel I/O operations through PnetCDF and HDF5.
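
For comparison with the nf90mpi_* route, the Unidata netCDF-Fortran API requests parallel access by passing an MPI communicator and info object to nf90_create/nf90_open. A minimal sketch follows; the file and variable names are illustrative, and the netcdf-c/netcdf-fortran libraries must have been built with parallel support for this to work.

 ! Sketch of parallel I/O through the Unidata netCDF-Fortran API (illustrative).
 ! NF90_NETCDF4 routes parallel I/O through HDF5; a classic-format cmode together
 ! with the comm/info arguments routes it through PnetCDF instead.
 program nc4_parallel_sketch
   use mpi
   use netcdf
   implicit none
   integer :: ierr, ncid, dimid, varid
   call MPI_Init(ierr)

   ierr = nf90_create('parallel_test.nc', ior(NF90_NETCDF4, NF90_CLOBBER), ncid, &
                      comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
   ierr = nf90_def_dim(ncid, 'x', 100, dimid)
   ierr = nf90_def_var(ncid, 'u', NF90_DOUBLE, (/ dimid /), varid)
   ierr = nf90_enddef(ncid)

   ! Request collective access for this variable; each rank would then call
   ! nf90_put_var with its own start/count.
   ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)
   ierr = nf90_close(ncid)
   call MPI_Finalize(ierr)
 end program nc4_parallel_sketch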

See also Parallel I/O and Portable Data Formats: PnetCDF and NetCDF4

janmandel commented 4 years ago

From https://stackoverflow.com/questions/59506059/parallel-read-write-of-netcdf-file-using-fortran-and-mpi: The pnetcdf package (sometimes called parallel-netcdf) is an independent library, a totally separate implementation of netCDF ... high-performance parallel I/O library for accessing Unidata's NetCDF, files in classic formats ... pnetcdf has a netCDF-like API, but the function names are different. If pnetcdf is used in stand-alone mode (i.e. without the Unidata netCDF libraries) then user code must be written in the pnetcdf API. This code will not run using the Unidata netCDF library, it will run with pnetcdf only.... Also, pnetcdf can only be used with netCDF classic formats. It cannot read/write HDF5 files.

I/O on few processors (<10)

Using Unidata's netcdf-c/netcdf-fortran libraries would be simplest. Build pnetcdf, HDF5, then netcdf-c, then netcdf-fortran, all with MPI compilers. Make sure you specify --enable-parallel when building HDF5. (Not necessary with netcdf-c and netcdf-fortran; they automatically detect the parallel features of the HDF5 build.)

Once built, the netcdf C and Fortran APIs can do parallel I/O on any netCDF file. (And also on almost all HDF5 files.) Use nc_open_par()/nc_create_par() to get parallel I/O.

I/O on some processors (10 - 1000)

Use of pnetcdf may be simplest and give best performance for classic format files. It has a slightly different API and will not work for HDF5 files.

janmandel commented 4 years ago

From http://manpages.ubuntu.com/manpages/xenial/man3/pnetcdf_f90.3.html

function nf90mpi_put_var(ncid, varid, values, start, stride, imap)
          integer, intent(in) :: ncid, varid
          <<whatever>>, intent(in) :: values
          integer, dimension(:), optional, intent(in) :: start
          integer, dimension(:), optional, intent(in) ::  stride
          integer, dimension(:), optional, intent(in) ::  imap
          integer :: nf90mpi_put_var

          Writes a value or values to a netCDF variable.  The netCDF dataset must be open and
          in  data  mode.   values  contains  the value(s) that will be written to the netCDF
          variable identified by ncid and varid; it may be a scalar or an array and  must  be
          of     type    character,    integer(kind=OneByteInt),    integer(kind=TwoByteInt),
          integer(kind=FourByteInt), integer(kind=EightByteInt), real(kind=FourByteReal),  or
          real(kind=EightByteReal).   All  values  are  converted to the external type of the
          netCDF variable, if possible; otherwise, an nf90_erange  error  is  returned.   The
          optional  argument  start  specifies  the starting index in the netCDF variable for
          writing for each dimension of the netCDF variable.  The  optional  argument  stride
          specifies  the  sampling stride (the interval between accessed values in the netCDF
          variable)  for  each  dimension  of  the  netCDF  variable  (see  COMMON   ARGUMENT
          DESCRIPTIONS   below).    The   optional  argument  imap  specifies  the  in-memory
          arrangement of the data values (see COMMON ARGUMENT DESCRIPTIONS below).

   integer(kind=MPI_OFFSET) start
          specifies the starting point for accessing a netCDF variable's data values in terms
          of  the indicial coordinates of the corner of the array section.  The indices start
          at 1; thus, the first data value of a variable is (1, 1, ..., 1).  The size of  the
          vector  shall  be  at  least  the  rank  of  the associated netCDF variable and its
          elements shall correspond, in order, to the variable's dimensions.

   integer(kind=MPI_OFFSET) stride
          specifies the sampling interval along each dimension of the netCDF variable.    The
          elements  of  the  stride  vector  correspond,  in  order, to the netCDF variable's
          dimensions (stride(1) gives the sampling interval along the most  rapidly  varying
          dimension  of  the  netCDF  variable).   Sampling  intervals are specified in type-
          independent units of elements (a value of 1 selects  consecutive  elements  of  the
          netCDF variable along the corresponding dimension, a value of 2 selects every other
          element, etc.).

   integer(kind=MPI_OFFSET) imap
          specifies the mapping between the dimensions of a netCDF variable and the in-memory
          structure  of  the  internal  data array.  The elements of the index mapping vector
          correspond, in order, to the netCDF variable's dimensions (imap gives the  distance
          between  elements  of  the internal array corresponding to the most rapidly varying
          dimension of the netCDF variable).  Distances between  elements  are  specified  in
          type-independent  units  of  elements  (the distance between internal elements that
          occupy adjacent memory locations is 1 and  not  the  element's  byte-length  as  in
          netCDF 2).
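
A minimal usage sketch of the call documented above; note that nf90mpi_put_var (without the _all suffix) is an independent-mode call, so the file has to be switched out of the default collective data mode around it (the collective counterpart is nf90mpi_put_var_all). The file and variable names are illustrative only.

 ! Independent-mode write with the optional start argument (illustrative only).
 program put_var_sketch
   use mpi
   use pnetcdf
   implicit none
   integer :: ierr, err, ncid, dimid, varid
   integer(kind=MPI_OFFSET_KIND) :: n
   real(8) :: vals(5)

   call MPI_Init(ierr)
   n    = 20
   vals = 1.0d0

   err = nf90mpi_create(MPI_COMM_WORLD, 'putvar_test.nc', &
                        ior(NF90_CLOBBER, NF90_64BIT_DATA), MPI_INFO_NULL, ncid)
   err = nf90mpi_def_dim(ncid, 'x', n, dimid)
   err = nf90mpi_def_var(ncid, 'u', NF90_DOUBLE, (/ dimid /), varid)
   err = nf90mpi_enddef(ncid)

   ! Write 5 values starting at index 6 (indices are 1-based, as described above).
   err = nf90mpi_begin_indep_data(ncid)
   err = nf90mpi_put_var(ncid, varid, vals, start=(/ 6_MPI_OFFSET_KIND /))
   err = nf90mpi_end_indep_data(ncid)

   err = nf90mpi_close(ncid)
   call MPI_Finalize(ierr)
 end program put_var_sketch
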
janmandel commented 4 years ago

Ready for testing.

To build

 git checkout develop-18
 module load pnetcdf
 setenv PNETCDF /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
 setenv NETCDF_classic 1
 ./configure -d       # select option 15 and 1
 vi configure.wrf     # add to INCLUDE_MODULES the line -I$(PNETCDFPATH)/include

To run

Run ./real.exe and ./wrf.exe (this will produce an expected error), then

 ln -s wrfout_file_created wrf.nc

and run wrf.exe using the qsub simulation file. It then rewrites U, V, PH in frame 1 of wrf.nc at every timestep. The test build is in /uufs/chpc.utah.edu/common/home/kochanski-group4/jmandel/WRF-SFIRE-pnetcdf and the test run is in test/em_fire/hill.
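
For reference, a rough sketch of what rewriting a wrfout field looks like with the PnetCDF F90 API. This is illustrative only; the actual exchange code is in the develop-18 branch, and the patch sizes and start indices below are placeholders rather than a real domain decomposition.

 ! Sketch: overwrite a field in frame 1 (Time index 1) of an existing wrf.nc.
 program rewrite_frame1
   use mpi
   use pnetcdf
   implicit none
   integer :: ierr, err, ncid, varid
   integer(kind=MPI_OFFSET_KIND) :: start(4), count(4)
   real(4), allocatable :: u(:,:,:,:)   ! placeholder local patch of U

   call MPI_Init(ierr)
   allocate(u(10, 10, 5, 1))
   u = 0.0

   err = nf90mpi_open(MPI_COMM_WORLD, 'wrf.nc', NF90_WRITE, MPI_INFO_NULL, ncid)
   err = nf90mpi_inq_varid(ncid, 'U', varid)

   ! In the real code, start/count come from each rank's tile of the domain;
   ! the 4th index pins the write to Time frame 1.
   start = (/ 1_MPI_OFFSET_KIND,  1_MPI_OFFSET_KIND,  1_MPI_OFFSET_KIND, 1_MPI_OFFSET_KIND /)
   count = (/ 10_MPI_OFFSET_KIND, 10_MPI_OFFSET_KIND, 5_MPI_OFFSET_KIND, 1_MPI_OFFSET_KIND /)
   err = nf90mpi_put_var_all(ncid, varid, u, start, count)

   err = nf90mpi_close(ncid)
   call MPI_Finalize(ierr)
 end program rewrite_frame1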

Testing needed

janmandel commented 3 years ago

On cheyenne (from Angel for reference):

 module unload gdal
 module load netcdf
 module load pnetcdf
 setenv NETCDF_classic 1

or in sh: export NETCDF_classic=1

janmandel commented 3 years ago

Vertically staggered variables are "half level"; the top index is decreased by one. We need to make the function get_chsum and its call for fmw compatible with that. See 75c25c76f8365d0c2ee7fc484d4cd71616a768d4.
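
As a hypothetical illustration of what "compatible with that" means for the checksum (the real get_chsum in WRF-SFIRE may have a different name, interface, and index conventions):

 ! Hypothetical checksum sketch only: drop the top vertical index for
 ! "half level" fields so both sides of the coupling sum over the same levels.
 real(8) function chsum_sketch(a, ids, ide, kds, kde, jds, jde, half_level)
   implicit none
   integer, intent(in) :: ids, ide, kds, kde, jds, jde
   real(4), intent(in) :: a(ids:ide, kds:kde, jds:jde)
   logical, intent(in) :: half_level
   integer :: ktop
   ktop = kde
   if (half_level) ktop = kde - 1   ! top decreased by one
   chsum_sketch = sum(real(a(ids:ide, kds:ktop, jds:jde), 8))
 end function chsum_sketch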

janmandel commented 3 years ago

The pnetcdf coupling with checksums works on cheyenne with wrf-fire-matlab commit 3a5ef8c and WRF-SFIRE commit f8bbb82. The parallel wrf.exe and the serial femwind_wrfout.exe synchronize and send the arrays back and forth. They need to be started at about the same time, in the same directory, with a copy of an earlier wrfout renamed wrf.nc. Getting the numbers right will take more work. To build wrf, see openwfm/WRF-SFIRE#18, branch develop-18, and load the pnetcdf module first. To build femwind_wrfout.exe, see openwfm/wrf-fire-matlab#4 and run make in femwind/fortran.

willemsn commented 3 years ago

Built WRF-SFIRE on chpc:

 module load geotiff/1.4.0
 module load intel/2018.1.163
 module load hdf5/1.8.19
 module load netcdf-c/4.4.1.1
 module load impi/2018.1.163
 module load netcdf-f/4.4.4
 module load pnetcdf
 setenv PNETCDF /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
 setenv NETCDF_classic 1
 setenv NETCDF /uufs/chpc.utah.edu/sys/installdir/netcdf-f/4.4.4i18-c7

 ./configure -d       # select 15 and then 1
 vi configure.wrf     # add to INCLUDE_MODULES the line -I$(PNETCDFPATH)/include
 ./compile em_fire

willemsn commented 2 years ago

Here are notes for building QES on chpc. Right now, the branch to use is pw-wrfqes-wbmerge (87a66c895f454316216bea7436e1e32a320a4208).

I have more testing to complete before I merge back to the main dev branches.

Modules:

 module load cuda/10.2
 module load gcc/8.1.0
 module load cmake/3.15.3
 module load gdal/3.0.1
 module load boost/1.69.0

You need to be in the QES-Winds main folder, and then you can build from the build/ folder:

 rm -rf build; mkdir build; cd build
 cmake -DCUDA_TOOLKIT_DIR=/uufs/chpc.utah.edu/sys/installdir/cuda/10.2.89 -DCUDA_SDK_ROOT_DIR=/uufs/chpc.utah.edu/sys/installdir/cuda/10.2.89 -DNETCDF_DIR=/uufs/chpc.utah.edu/sys/installdir/netcdf-c/4.4.1-c7/include -DNETCDF_CXX_DIR=/uufs/chpc.utah.edu/sys/installdir/netcdf-cxx/4.3.0-5.4.0g/include -DOptiX_INSTALL_DIR=/uufs/chpc.utah.edu/sys/installdir/optix/7.1.0/ -DCMAKE_C_COMPILER=gcc -DCMAKE_PREFIX_PATH="/uufs/chpc.utah.edu/sys/installdir/gdal/3.0.1;/uufs/chpc.utah.edu/sys/installdir/hdf5/1.8.17-c7" ..

Then build it:

 make
 make

janmandel commented 2 years ago

The communication file is coded as a path in QES-Winds/data/InputFiles/WRFInterpTest.xml.

janmandel commented 2 years ago

/uufs/chpc.utah.edu/common/home/kochanski-group4/jmandel/QES-Winds/run 0a2b8b5

/uufs/chpc.utah.edu/common/home/kochanski-group4/jmandel/WRF-SFIRE-develop-18/test/em_fire/hill 938b124e12368cca8e6634b31661b327b225ca59

willemsn commented 2 years ago

Got the ping-pong working again. The code to do this is in the pw-wrfqes-wbmerge branch of qesWinds.

This should test the ping-pong'ing a bit. I need to fix some initialization issues in qes for setting up the correct time series and we should be in better shape.

janmandel commented 2 years ago

@willemsn I am trying the recipe above from Jan 14, but cmake first did not like the - at the end of a line, and then it cannot find pnetcdf. module load pnetcdf is not found, and module load pnetcdf/1.11.2 requires intel and impi, which disable gcc.

willemsn commented 2 years ago

Hi Jan,

QES does not use PNETCDF; our build will not need PNETCDF. For WRF-SFIRE though, I build it with this module list:

 #!/bin/sh
 module load geotiff/1.4.0
 module load intel/2018.1.163
 module load hdf5/1.8.19
 module load netcdf-c/4.4.1.1
 module load impi/2018.1.163
 module load netcdf-f/4.4.4
 module load pnetcdf
 setenv PNETCDF /uufs/chpc.utah.edu/sys/installdir/pnetcdf/1.11.2i18
 setenv NETCDF_classic 1
 setenv NETCDF /uufs/chpc.utah.edu/sys/installdir/netcdf-f/4.4.4i18-c7

With QES, I use this module list:

 #!/bin/sh
 module load cuda/10.2
 module load gcc/8.1.0
 module load cmake/3.15.3
 module load gdal/3.0.1
 module load boost/1.69.0

janmandel commented 2 years ago

Works fine on CHPC with the module parallel-netcdf, which points to a build of pnetcdf with the Intel oneAPI compiler.

willemsn commented 2 years ago

I've got it compiling and running. It doesn't seem to be "syncing" and waiting for QES. Is there something that needs to be turned on that I forgot about?

janmandel commented 2 years ago

On CHPC use:

 module purge
 module load intel-oneapi-compilers/2021.4.0 openmpi/4.1.1
 module load netcdf-c/4.8.1  netcdf-fortran/4.5.3
 module load parallel-netcdf/1.12.2
 setenv PNETCDF $PARALLEL_NETCDF_ROOT
 setenv NETCDF_classic 1

and setenv NETCDF to a directory with subdirectories include and lib populated by soft links to files from both $NETCDF_C_ROOT and $NETCDF_FORTRAN_ROOT in the same subdirectories.

This can be done by

 source  /uufs/chpc.utah.edu/common/home/u6015690/lib/intel-2021.4.0.tcsh

before the ./compile and in the slurm script.

Note: parallel-netcdf/1.12.2 seems to be bound to openmpi/4.1.1 built using the intel-oneapi-compilers/2021.4.0 compiler. Other combinations of dependencies listed by module spider parallel-netcdf/1.12.2 may link but will crash at runtime.

Also, parallel-netcdf will crash when opening a file that is a soft link to a file in /scratch/general/lustre. The code works fine when everything is done on lustre.

Fergui commented 1 year ago

Continued in #5