GeorgeGayno-NOAA closed this issue 1 year ago.
Tried some timing and memory tests on Hera using 4adcb64. I used a small (301x200) CONUS ESG grid and the VIIRS vegetation data.
- Test 1: use the vegetation_type.viirs.igbp.conus.30s.nc file as input. Using one node and 24 tasks, it ran in about 60 seconds.
- Test 2: use the vegetation_type.viirs.igbp.nh.30s.nc file as input. I had to request the two 'bigmem' nodes, 12 tasks per node. It ran in about six minutes.
So, the fractional option can take a lot of memory. We are pushing the limit using the NH version of the file. Using the global version may not be feasible on some machines.
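To see why the NH and global 30-arc-second files push memory limits, here is a rough back-of-envelope sketch, not a measurement. It assumes (none of this is taken from the UFS_UTILS source) that the source data are a regular latitude-longitude grid at 30 arc seconds, that the NH file spans 0-90N and the global file 90S-90N, and that the fractional option holds one value per category per source point for 20 categories. Regridding weights and other work arrays are ignored, so real usage will be higher.

```python
# Rough memory estimate for per-category fractions on a 30-arc-second grid.
# ASSUMPTIONS (illustrative, not from UFS_UTILS): regular lat/lon source grid,
# NH file spans 0-90N, global file spans 90S-90N, 20 categories, one value per
# category per source point. Regridding weights and work arrays are ignored.

ARCSEC_PER_DEGREE = 3600
RESOLUTION_ARCSEC = 30  # the "30s" in the file names


def grid_points(lat_span_deg: float, lon_span_deg: float = 360.0) -> int:
    """Number of points in a regular grid covering the given spans."""
    per_degree = ARCSEC_PER_DEGREE // RESOLUTION_ARCSEC  # 120 points per degree
    return round(lat_span_deg * per_degree) * round(lon_span_deg * per_degree)


def category_field_gib(npts: int, ncat: int = 20, bytes_per_value: int = 4) -> float:
    """Memory in GiB for one value per category per source point."""
    return npts * ncat * bytes_per_value / 2**30


if __name__ == "__main__":
    for name, lat_span in (("NH", 90.0), ("global", 180.0)):
        npts = grid_points(lat_span)
        print(f"{name}: {npts:,} points, "
              f"{category_field_gib(npts):.1f} GiB as 4-byte reals, "
              f"{category_field_gib(npts, bytes_per_value=8):.1f} GiB as 8-byte reals")
```

Under these assumptions a single 4-byte copy of the category fields needs roughly 35 GiB for the NH file and roughly 70 GiB for the global file, before any regridding overhead or extra copies.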
I repeated this test on WCOSS2 using the NH file. It was able to run on one node with 24 tasks, but I had to request 500 GB of memory:
#PBS -l select=1:ncpus=24:mem=500GB
It took about 6 minutes to run.
@GeorgeGayno-NOAA based on what I see in the code, the vegetation_type tile output will contain both dominant category and fractional information; is that correct?
Both the vegetation type and soil type tile output will contain a record of fractional information and a record of the dominant category. Currently, there is a third 'sum' record that sums the fractional values. I use that as a diagnostic to see how close the sum is to one. I will probably comment that out before merging.
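The 'sum' diagnostic is easy to mimic outside the program. This is a minimal sketch, assuming the fractions at each model point are held as a list indexed by category; the function names and the tolerance are hypothetical, not taken from the code.

```python
# Sketch of the 'sum' diagnostic record: at every model point the category
# fractions should add up to (nearly) one. Names and tolerance are hypothetical.
from typing import List, Sequence


def fraction_sums(points: Sequence[Sequence[float]]) -> List[float]:
    """Sum of the category fractions at each model point."""
    return [sum(fracs) for fracs in points]


def points_off_by_more_than(points: Sequence[Sequence[float]], tol: float = 1.0e-4) -> List[int]:
    """Indices of model points whose fractions do not sum to one within tol."""
    return [i for i, s in enumerate(fraction_sums(points)) if abs(s - 1.0) > tol]


if __name__ == "__main__":
    # Hypothetical three-point example with four categories.
    points = [
        [0.25, 0.25, 0.25, 0.25],  # sums to one
        [0.0, 0.0, 1.0, 0.0],      # sums to one
        [0.5, 0.3, 0.0, 0.0],      # sums to 0.8: flagged
    ]
    print(points_off_by_more_than(points))  # [2]
```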
When processing a fractional grid (points can be a mix of land and non-land), how will the dominant category be defined? Suppose a point is:
- 40% land
- 60% non-land
- 35% forest
- 5% grassland
Is the dominant category forest or water?
After speaking with @barlage, we decided that the dominant category will always refer to a land category. So, in this case, it would be 'forest'.
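The decision can be written as a small function. This is a hedged sketch of the rule as stated, not the code committed at 64a9bd0. It assumes categories are numbered 1..N, that water is category 17 (the water_category value in the updated vegetation files), that every other category counts as land, and that ties go to the lower category number. IGBP categories 1 (evergreen needleleaf forest) and 10 (grasslands) stand in for 'forest' and 'grassland'. A point with no land at all keeps the water category.

```python
# Sketch of the agreed rule for the dominant category at a fractional point.
# ASSUMPTIONS: categories are numbered 1..N, water is category 17 (the
# water_category value in the updated vegetation files), every other category
# counts as land, and ties go to the lower category number.
from typing import Sequence

WATER_CATEGORY = 17


def dominant_category(fractions: Sequence[float], water_category: int = WATER_CATEGORY) -> int:
    """fractions[i] is the fraction of category i + 1 at one model point."""
    land = [(frac, cat) for cat, frac in enumerate(fractions, start=1)
            if cat != water_category]
    # Largest land fraction; on a tie, -cat favours the lower category number.
    best_frac, best_cat = max(land, key=lambda fc: (fc[0], -fc[1]))
    # Only a point with no land at all is assigned the water category.
    return best_cat if best_frac > 0.0 else water_category


if __name__ == "__main__":
    fracs = [0.0] * 20
    fracs[0] = 0.35   # category 1 (IGBP evergreen needleleaf forest)
    fracs[9] = 0.05   # category 10 (IGBP grasslands)
    fracs[16] = 0.60  # category 17 (water)
    print(dominant_category(fracs))  # 1: forest wins, even though water is 60%
```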
Logic added at 64a9bd0. Tested at a few sample points. Will test for an entire grid next.
Logic added at 64a9bd0 was tested on a C96 uniform grid. Diagnostics at point (42,82) on each tile were printed to standard output (https://github.com/GeorgeGayno-NOAA/UFS_UTILS/blob/feature/sfc_climo_gen.frac/sorc/sfc_climo_gen.fd/interp_frac_cats.F90#L260).
Here is the printout for tile 1:
after rescale 1 7.8082189E-02 0.0000000E+00 4.7417533E-02 0.0000000E+00 0.0000000E+00 2.2303175E-02 2.7782659E-03 2.7923910E-03 2.7908301E-03 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.9219178 0.0000000E+00 0.0000000E+00 0.0000000E+00 dominate cat 1 2.000000
This point is 7.8% land (the first number after the tile number). The most common vegetation type category (17) is 'water' at 92.2%. The next most common category (2) is 'forest' at 4.74%. The new logic correctly picks '2' as the dominant category instead of '17'.
This point on tile 4 is 100% water. In this case, the dominant category is correctly selected as 'water' (17):
after rescale 4 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 1.000000 0.0000000E+00 0.0000000E+00 0.0000000E+00 dominate cat 4 17.00000
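Printouts like these two can be checked mechanically. The sketch assumes the line layout is: tile number, land fraction, one fraction per category (20 here), then "dominate cat", the tile number again, and the chosen category. That layout is inferred from these two printouts rather than from interp_frac_cats.F90, and the function names are hypothetical. The check confirms that the land fraction equals the sum of the non-water category fractions.

```python
# Check a diagnostic printout line: the land fraction should equal the sum of
# the non-water category fractions. The line layout is inferred from the two
# printouts in this thread, not from interp_frac_cats.F90.
from typing import List, Tuple

TILE1 = ("after rescale 1 7.8082189E-02 0.0000000E+00 4.7417533E-02 0.0000000E+00 "
         "0.0000000E+00 2.2303175E-02 2.7782659E-03 2.7923910E-03 2.7908301E-03 "
         + "0.0000000E+00 " * 8 + "0.9219178 " + "0.0000000E+00 " * 3
         + "dominate cat 1 2.000000")
TILE4 = ("after rescale 4 " + "0.0000000E+00 " * 17 + "1.000000 "
         + "0.0000000E+00 " * 3 + "dominate cat 4 17.00000")


def parse_rescale_line(line: str) -> Tuple[int, float, List[float], int]:
    """Return (tile, land fraction, category fractions, dominant category)."""
    head, tail = line.split("dominate cat")
    fields = head.split()
    if fields[:2] != ["after", "rescale"]:
        raise ValueError("not an 'after rescale' diagnostic line")
    tile = int(fields[2])
    land_frac = float(fields[3])
    cat_fracs = [float(x) for x in fields[4:]]
    dominant = int(float(tail.split()[1]))  # tail is "<tile> <category>"
    return tile, land_frac, cat_fracs, dominant


def land_fraction_consistent(line: str, water_category: int = 17, tol: float = 1.0e-5) -> bool:
    """True if the land fraction matches the sum of non-water categories."""
    _, land_frac, cat_fracs, _ = parse_rescale_line(line)
    land_sum = sum(f for cat, f in enumerate(cat_fracs, start=1) if cat != water_category)
    return abs(land_sum - land_frac) < tol


if __name__ == "__main__":
    for line in (TILE1, TILE4):
        tile, land, cats, dom = parse_rescale_line(line)
        print(tile, land, dom, land_fraction_consistent(line))
```

Both printouts pass: on tile 1 the land categories (2, 5, 6, 7 and 8) add up to about 0.0781, matching the 7.8% land fraction.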
Encountered a problem when using the NH soil type data. (The global and CONUS soil type data worked fine, and the NH vegetation data worked fine.) A segmentation fault occurred during the FieldScatter, in the same place on both WCOSS2 and Hera. I consulted with the ESMF team, and they discovered a problem in the FieldScatter function. A fix was provided in this tag: https://github.com/esmf-org/esmf/tree/v8.5.0b09
The new ESMF tag was compiled on Hera:
/scratch1/NCEPDEV/da/George.Gayno/noscrub/esmf.git/esmf
and the branch was recompiled. A regional grid was successfully created using soil_type.statsgo.nh.30s.nc as input.
Will ask the libraries group and EPIC to install the latest ESMF on all officially supported machines.
For now, go back to the original ESMF library (b19b2f8).
In order to read the water flag from the file, the file attributes and missing values need to be changed. This may be done using netCDF tools. On WCOSS2, the vegetation type files were changed as follows:
module load gsl
module load nco
ncatted -h -a water_category,vegetation_type,c,i,17 file.nc
ncatted -h -a missing_value,vegetation_type,o,i,17 file.nc
ncap2 -O -s 'where(vegetation_type < 0) vegetation_type=17' file.nc file.nc
A dump of the header shows the update (ncdump -h vegetation_type.modis.igbp.0.03.nc):
variables:
float vegetation_type(time, jdim, idim) ;
vegetation_type:landice_category = 15 ;
vegetation_type:missing_value = 17 ;
vegetation_type:water_category = 17 ;
Compared with the OPS file:
float vegetation_type(time, jdim, idim) ;
vegetation_type:landice_category = 15 ;
vegetation_type:missing_value = -999.9f ;
A similar procedure was used on the soil type files (ncdump -h soil_type.statsgo.0.05.nc):
float soil_type(time, jdim, idim) ;
soil_type:landice_category = 16 ;
soil_type:missing_value = 14 ;
soil_type:water_category = 14 ;
Compared with the OPS file:
float soil_type(time, jdim, idim) ;
soil_type:landice_category = 16 ;
soil_type:missing_value = -999.9f ;
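Why can files with these attributes still work with the current 'develop' code? One plausible reading, offered as an assumption rather than a description of the UFS_UTILS source: a reader that only honours missing_value skips every cell equal to it, so setting missing_value to the water value (17 for vegetation, 14 for soil) means the cells that ncap2 changed from negative to 17 are skipped just as the old -999.9 cells were, while a reader that checks water_category first counts them as water. The sketch mimics the ncap2 step and both hypothetical readers on a hypothetical row of vegetation values.

```python
# Mimic the ncap2/ncatted edits on a hypothetical row of vegetation values and
# compare how two hypothetical readers count categories. ASSUMPTION: this is an
# illustration of the attribute semantics, not the logic used by UFS_UTILS.
from collections import Counter
from typing import Dict, List, Sequence

OLD_ATTRS = {"landice_category": 15, "missing_value": -999.9}
NEW_ATTRS = {"landice_category": 15, "missing_value": 17, "water_category": 17}


def set_water(values: Sequence[float], water_category: int = 17) -> List[float]:
    """Same effect as: where(vegetation_type < 0) vegetation_type=17"""
    return [water_category if v < 0 else v for v in values]


def count_categories(values: Sequence[float], attrs: Dict[str, float],
                     use_water_flag: bool) -> Counter:
    """Count cells per category.

    use_water_flag=False: a reader that only honours missing_value.
    use_water_flag=True:  a reader that checks water_category first.
    """
    water = attrs.get("water_category")
    missing = attrs.get("missing_value")
    counts: Counter = Counter()
    for v in values:
        if use_water_flag and water is not None and v == water:
            counts[water] += 1
        elif v == missing:
            continue  # treated as "no data"
        else:
            counts[v] += 1
    return counts


if __name__ == "__main__":
    raw = [1, 10, -999.9, -999.9, 15]  # hypothetical source cells
    fixed = set_water(raw)
    print(count_categories(raw, OLD_ATTRS, use_water_flag=False))
    print(count_categories(fixed, NEW_ATTRS, use_water_flag=False))
    print(count_categories(fixed, NEW_ATTRS, use_water_flag=True))
```

In this sketch the first two counts are identical, which is consistent with the change being backwards compatible; only the water-aware reader sees category 17.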
@KateFriedman-NOAA Here is a list of files that were updated and must be stored in the official 'fix' directories. They are on Hera (/scratch1/NCEPDEV/da/George.Gayno/ufs_utils.git/UFS_UTILS/fix/sfc_climo.test):
- soil_type.bnu.30s.nc
- soil_type.statsgo.0.03.nc
- soil_type.statsgo.0.05.nc
- soil_type.statsgo.30s.nc
- soil_type.statsgo.conus.30s.nc
- soil_type.statsgo.nh.30s.nc
- vegetation_type.modis.igbp.0.03.nc
- vegetation_type.modis.igbp.0.05.nc
- vegetation_type.modis.igbp.30s.nc
- vegetation_type.modis.igbp.conus.30s.nc
- vegetation_type.modis.igbp.nh.30s.nc
- vegetation_type.viirs.igbp.0.03.nc
- vegetation_type.viirs.igbp.0.05.nc
- vegetation_type.viirs.igbp.0.1.nc
- vegetation_type.viirs.igbp.30s.nc
- vegetation_type.viirs.igbp.conus.30s.nc
- vegetation_type.viirs.igbp.nh.30s.nc
This change is backwards compatible: the current 'develop' will work with these files. So, you don't need to create a new sub-directory under /scratch1/NCEPDEV/global/glopara/fix/sfc_climo to store them. But that is your choice.
This is not urgent.
@GeorgeGayno-NOAA Related to this, in global-workflow develop we are still setting sfc_climo_ver=20220805. We have a newer set (20221017) where other recent additions for sfc_climo have gone. Should we change to sfc_climo_ver=20221017 in develop now? Are there any code or script updates needed before doing so? I ask because my plan was to add these new sfc_climo fix files into the newer 20221017 set. Let me know, thanks!
FYI @WalterKolczynski-NOAA
The files in ./sfc_climo are used by UFS_UTILS to create new model grids. I don't believe the global-workflow creates grids? But if you want to be consistent, the switch to use 20221017 was made at b67f487. So if you are pointing to a newer snapshot of 'develop', I would switch to the new version.
@GeorgeGayno-NOAA Noted, and thanks for that hash. The global-workflow is on a slightly older hash at the moment (https://github.com/ufs-community/UFS_UTILS/commit/8b990c060af2e13f0f5b2ea5b54aa2a7686333c1), so when we move develop to a hash at or later than https://github.com/ufs-community/UFS_UTILS/commit/b67f487ba94cf534aa8788e2df1bd55a5bc1388c we'll change to sfc_climo_ver=20221017.
I'll proceed with adding these new fix files into the 20221017 sfc_climo folder for you now.
Updated sfc_climo fix files have been copied into the 20221017 fix subfolder. Have rsync'd the files to the fix sets on Hera, Orion, Jet, and both WCOSS2s.
Showing rsync to Orion set as example to confirm updates are in:
Orion-login-2[5] /work/noaa/global/glopara$ rsync -azv --delete-before Kate.Friedman@dtn-hera.fairmont.rdhpcs.noaa.gov:/scratch1/NCEPDEV/global/glopara/fix .
receiving file list ... done
fix/sfc_climo/20221017/
fix/sfc_climo/20221017/soil_type.bnu.30s.nc
fix/sfc_climo/20221017/soil_type.statsgo.0.03.nc
fix/sfc_climo/20221017/soil_type.statsgo.0.05.nc
fix/sfc_climo/20221017/soil_type.statsgo.30s.nc
fix/sfc_climo/20221017/soil_type.statsgo.conus.30s.nc
fix/sfc_climo/20221017/soil_type.statsgo.nh.30s.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.0.03.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.0.05.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.30s.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.conus.30s.nc
fix/sfc_climo/20221017/vegetation_type.modis.igbp.nh.30s.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.0.03.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.0.05.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.0.1.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.30s.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.conus.30s.nc
fix/sfc_climo/20221017/vegetation_type.viirs.igbp.nh.30s.nc
sent 763,837 bytes received 193,069,136 bytes 2,378,318.69 bytes/sec
total size is 644,716,354,233 speedup is 3,326.14
Currently, this program outputs a dominant category at each model point. Update the program to output the percentage of each category within a model point.
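The requested output can be illustrated with a short sketch. The program itself maps the source data to the model grid with ESMF; this sketch swaps that for a simpler, hypothetical approach in which every source cell falling inside a model point carries equal weight, so the percentages reduce to category counts. The function name and the example cells are hypothetical.

```python
# Percentage of each category within one model point, from the categories of
# the source cells that fall inside it. ASSUMPTION: equal weight per source
# cell (simple binning), a stand-in for the ESMF regridding the program uses.
from collections import Counter
from typing import Iterable, List


def category_fractions(source_cats: Iterable[int], ncat: int) -> List[float]:
    """Fraction of categories 1..ncat among the source cells of one model point."""
    counts = Counter(c for c in source_cats if 1 <= c <= ncat)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * ncat  # no valid source data at this point
    return [counts[c] / total for c in range(1, ncat + 1)]


if __name__ == "__main__":
    # Hypothetical model point covering 20 source cells:
    # 7 forest (category 1), 1 grassland (category 10), 12 water (category 17).
    cells = [1] * 7 + [10] + [17] * 12
    fracs = category_fractions(cells, ncat=20)
    print([round(100 * f) for f in fracs if f > 0])  # [35, 5, 60]
```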