ufs-community / UFS_UTILS

Utilities for the NCEP models.

Update sfc_climo_gen to output fractional vegetation and soil type #709

Closed GeorgeGayno-NOAA closed 1 year ago

GeorgeGayno-NOAA commented 1 year ago

Currently, this program outputs a dominant category at each model point. Update the program to also output the percentage of each category within a model point.

GeorgeGayno-NOAA commented 1 year ago

Tried some timing and memory tests on Hera using 4adcb64. I used a small (301x200) CONUS ESG grid and the VIIRS vegetation data.

  • Test 1 - use the vegetation_type.viirs.igbp.conus.30s.nc as input. Using one node and 24 tasks, it ran in about 60 seconds.
  • Test 2 - use the vegetation_type.viirs.igbp.nh.30s.nc as input. I had to request the two 'bigmem' nodes, 12 tasks per node. It ran in about six minutes.

So, the fractional option can take a lot of memory. We are pushing the limit using the NH version of the file. Using the global version may not be feasible on some machines.

I repeated this test on WCOSS2 using the NH file. It was able to run on one node/24 tasks, but I had to request 500 GB of memory: #PBS -l select=1:ncpus=24:mem=500GB. It took about 6 minutes to run.

barlage commented 1 year ago

@GeorgeGayno-NOAA based on what I see in the code, the vegetation_type tile output will contain both dominant category and fractional information; is that correct?

GeorgeGayno-NOAA commented 1 year ago

Both the vegetation type and soil type tile output will contain a record of fractional information and a record of the dominant category. Currently, there is a third 'sum' record that sums the fractional values. I use that as a diagnostic to see how close the sum is to one. I will probably comment that out before merging.

GeorgeGayno-NOAA commented 1 year ago

When processing a fractional grid (points can be a mix of land and non-land), how will the dominant category be defined? Suppose a point is:

  • 40% land
  • 60% non-land
  • 35% forest
  • 5% grassland

Is the dominant category forest or water?

After speaking with @barlage we decided that the dominant category will always refer to a land category. So, in this case, it would be 'forest'.

Logic added at 64a9bd0. Tested at a few sample points. Will test for an entire grid next.

GeorgeGayno-NOAA commented 1 year ago

Logic added at 64a9bd0 was tested on a C96 uniform grid. Diagnostics at point (42,82) for each tile were printed to standard output - https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/feature/sfc_climo_gen.frac/sorc/sfc_climo_gen.fd/interp_frac_cats.F90#L260

GeorgeGayno-NOAA commented 1 year ago

Here is the printout for tile 1:

after rescale 1 7.8082189E-02 0.0000000E+00 4.7417533E-02 0.0000000E+00 0.0000000E+00 2.2303175E-02 2.7782659E-03 2.7923910E-03 2.7908301E-03 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.9219178 0.0000000E+00 0.0000000E+00 0.0000000E+00 dominate cat 1 2.000000

This point is 7.8% land (the first value after the tile number). The predominant vegetation type category (17) is 'water' at 92.2%. The next most common category (2) is 'forest' at 4.74%. The new logic correctly picks '2' as the dominant category instead of '17'.

The point on tile 4 is 100% water. In this case, the dominant category is correctly selected as 'water' (17):

after rescale 4 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 1.000000 0.0000000E+00 0.0000000E+00 0.0000000E+00 dominate cat 4 17.00000
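
The selection rule described above (largest land category wins; water is chosen only when no land category is present) can be sketched in shell with awk. This is an illustration only, not the Fortran logic in interp_frac_cats.F90. The function name `dominant_cat` and its argument order are hypothetical, and it assumes the per-category fractions are passed without the leading land-fraction value.

```shell
#!/bin/sh
# dominant_cat WATER FRAC_1 FRAC_2 ... FRAC_N   (hypothetical helper)
# Prints the 1-based index of the largest non-water fraction.
# Falls back to WATER when no land category has a positive fraction.
dominant_cat() {
  water=$1
  shift
  echo "$@" | awk -v water="$water" '{
    best = water; max = 0
    for (i = 1; i <= NF; i++) {
      # "+ 0" forces numeric comparison, including values like 4.74E-02
      if (i != water && ($i + 0) > max) { max = $i + 0; best = i }
    }
    print best
  }'
}

# Example from the discussion: 35% forest, 5% grassland, 60% non-land.
# Assumed three-category layout for illustration: 1=forest, 2=grassland, 3=water.
dominant_cat 3 0.35 0.05 0.60
```

Passing the 20 category fractions from the tile 1 printout with water category 17 selects category 2, and the all-water tile 4 point falls back to 17, matching the diagnostics above.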

GeorgeGayno-NOAA commented 1 year ago

Encountered a problem when using the NH soil type data. (The global and CONUS soil type data worked fine, and the NH vegetation data worked fine.) A segmentation fault occurred during the FieldScatter, in the same place on both WCOSS2 and Hera. The ESMF team was consulted and discovered a problem in the FieldScatter function. A fix was provided in this tag: https://github.com/esmf-org/esmf/tree/v8.5.0b09

The new ESMF tag was compiled on Hera: /scratch1/NCEPDEV/da/George.Gayno/noscrub/esmf.git/esmf and the branch was recompiled. A regional grid was successfully created using soil_type.statsgo.nh.30s.nc as input.

Will ask the libraries group and EPIC to install the latest ESMF on all officially supported machines.

For now, go back to the original ESMF library (b19b2f8).

GeorgeGayno-NOAA commented 1 year ago

In order to read the water flag from the file, the file attributes and missing values need to be changed. This may be done using netCDF tools. On WCOSS2, the vegetation type files were changed as follows:

module load gsl
module load nco

ncatted -h -a water_category,vegetation_type,c,i,17 file.nc
ncatted -h -a missing_value,vegetation_type,o,i,17 file.nc
ncap2 -O -s 'where(vegetation_type < 0) vegetation_type=17' file.nc file.nc

A dump of the header shows the update (ncdump -h vegetation_type.modis.igbp.0.03.nc):

variables:
        float vegetation_type(time, jdim, idim) ;
                vegetation_type:landice_category = 15 ;
                vegetation_type:missing_value = 17 ;
                vegetation_type:water_category = 17 ;

Compared with the OPS file:

        float vegetation_type(time, jdim, idim) ;
                vegetation_type:landice_category = 15 ;
                vegetation_type:missing_value = -999.9f ;

A similar procedure was used on the soil type files (ncdump -h soil_type.statsgo.0.05.nc):

        float soil_type(time, jdim, idim) ;
                soil_type:landice_category = 16 ;
                soil_type:missing_value = 14 ;
                soil_type:water_category = 14 ;

Compared with the OPS file:

        float soil_type(time, jdim, idim) ;
                soil_type:landice_category = 16 ;
                soil_type:missing_value = -999.9f ;
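
The thread shows the NCO commands only for the vegetation type files. A minimal dry-run sketch of the "similar procedure" for soil type follows. It assumes the same three commands apply with the variable name soil_type and water category 14, as the header dump suggests. The helper name `print_water_flag_cmds` is hypothetical. It only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# print_water_flag_cmds VAR CATEGORY FILE   (hypothetical helper)
# Dry run: prints the three NCO commands instead of executing them.
print_water_flag_cmds() {
  var=$1; cat_id=$2; file=$3
  # Create the water_category attribute.
  echo "ncatted -h -a water_category,$var,c,i,$cat_id $file"
  # Overwrite missing_value so it matches the water category.
  echo "ncatted -h -a missing_value,$var,o,i,$cat_id $file"
  # Replace negative (old missing) values with the water category.
  echo "ncap2 -O -s 'where($var < 0) $var=$cat_id' $file $file"
}

# Assumed soil type equivalent of the vegetation type commands above.
print_water_flag_cmds soil_type 14 soil_type.statsgo.0.05.nc
```

Calling it with `vegetation_type 17 file.nc` reproduces the three commands listed earlier for the vegetation type files.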

GeorgeGayno-NOAA commented 1 year ago

@KateFriedman-NOAA Here is a list of files that were updated and must be stored to the official 'fix' directories. They are on Hera (/scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS/fix/sfc_climo.test)

soil_type.bnu.30s.nc
soil_type.statsgo.0.03.nc
soil_type.statsgo.0.05.nc
soil_type.statsgo.30s.nc
soil_type.statsgo.conus.30s.nc
soil_type.statsgo.nh.30s.nc
vegetation_type.modis.igbp.0.03.nc
vegetation_type.modis.igbp.0.05.nc
vegetation_type.modis.igbp.30s.nc
vegetation_type.modis.igbp.conus.30s.nc
vegetation_type.modis.igbp.nh.30s.nc
vegetation_type.viirs.igbp.0.03.nc
vegetation_type.viirs.igbp.0.05.nc
vegetation_type.viirs.igbp.0.1.nc
vegetation_type.viirs.igbp.30s.nc
vegetation_type.viirs.igbp.conus.30s.nc
vegetation_type.viirs.igbp.nh.30s.nc

This change is backwards compatible - current 'develop' will work with these files. So, you don't need to create a new sub-directory under /scratch1/NCEPDEV/global/glopara/fix/sfc_climo to store them. But that is your choice.

This is not urgent.

KateFriedman-NOAA commented 1 year ago

@GeorgeGayno-NOAA Related to this, in global-workflow develop we are still setting sfc_climo_ver=20220805. We have a newer set (20221017) where other recent additions for sfc_climo have gone. Should we change to sfc_climo_ver=20221017 in develop now? Are there any code or script updates needed before doing so? I ask because my plan was to add these new sfc_climo fix files into the newer 20221017 set. Let me know, thanks!

FYI @WalterKolczynski-NOAA

GeorgeGayno-NOAA commented 1 year ago

The files in ./sfc_climo are used by UFS_UTILS to create new model grids. I don't believe the global-workflow creates grids? But if you want to be consistent, the switch to use 20221017 was made at b67f487. So if you are pointing to a newer snapshot of 'develop', I would switch to the new version.

KateFriedman-NOAA commented 1 year ago

@GeorgeGayno-NOAA Noted and thanks for that hash. The global-workflow is on a slightly older hash at the moment (https://github.com/ufs-community/UFS_UTILS/commit/8b990c060af2e13f0f5b2ea5b54aa2a7686333c1) so when we move develop to a hash at or later than https://github.com/ufs-community/UFS_UTILS/commit/b67f487ba94cf534aa8788e2df1bd55a5bc1388c we'll change to sfc_climo_ver=20221017.

I'll proceed with adding these new fix files into the 20221017 sfc_climo folder for you now.

KateFriedman-NOAA commented 1 year ago

Updated sfc_climo fix files have been copied into the 20221017 fix subfolder. Have rsync'd the files to the fix sets on Hera, Orion, Jet, and both WCOSS2s.

Showing rsync to Orion set as example to confirm updates are in:

Orion-login-2[5] /work/noaa/global/glopara$ rsync -azv --delete-before Kate.Friedman@dtn-hera.fairmont.rdhpcs.noaa.gov:/scratch1/NCEPDEV/global/glopara/fix .
receiving file list ... done
fix/sfc_climo/20221017/
fix/sfc_climo/20221017/soil_type.bnu.30s.nc
fix/sfc_climo/20221017/soil_type.statsgo.0.03.nc
fix/sfc_climo/20221017/soil_type.statsgo.0.05.nc
fix/sfc_climo/20221017/soil_type.statsgo.30s.nc
fix/sfc_climo/20221017/soil_type.statsgo.conus.30s.nc
fix/sfc_climo/20221017/soil_type.statsgo.nh.30s.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.0.03.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.0.05.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.30s.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.conus.30s.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.nh.30s.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.0.03.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.0.05.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.0.1.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.30s.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.conus.30s.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.nh.30s.nc

sent 763,837 bytes  received 193,069,136 bytes  2,378,318.69 bytes/sec
total size is 644,716,354,233  speedup is 3,326.14