Open · kaiyuan-cheng opened this issue 3 months ago
Which external file is being held in memory on each MPI task?
I was talking about the climatological datasets located in fix/sfc_climo. For example, snowfree_albedo.4comp.0.05.nc is 4.7 GB. If each of 30 MPI tasks held a copy, that file alone would account for roughly 150 GB.
The climo datasets are read in on one MPI task, then a subsection is scattered to all tasks.
Here, the array that holds the climo data is only allocated on task '0': https://github.com/ufs-community/UFS_UTILS/blob/47705d5315013c89841cf3645d549e9bc83ce6e8/sorc/sfc_climo_gen.fd/interp.F90#L77
The climo data is then read in on task '0', chopped up, and scattered to all tasks: https://github.com/ufs-community/UFS_UTILS/blob/47705d5315013c89841cf3645d549e9bc83ce6e8/sorc/sfc_climo_gen.fd/interp.F90#L108
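In case it helps to see the pattern in isolation, here is a minimal sketch of the "read on one task, scatter a slice to every task" approach using plain MPI. It is illustrative only: the actual interp.F90 goes through ESMF rather than raw MPI calls, and the program name, the even-division assumption, and the dummy read are mine. The grid dimensions are the 30-sec values quoted in this thread.

program scatter_sketch
  use mpi
  implicit none
  integer, parameter :: i_src = 21600, j_src = 43200   ! 30-sec global grid
  real, allocatable  :: data_src_global(:,:)           ! full field, task 0 only
  real, allocatable  :: data_src(:,:)                  ! local slice, every task
  integer :: ierr, myrank, npets, j_local

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, myrank, ierr)
  call mpi_comm_size(mpi_comm_world, npets, ierr)

  j_local = j_src / npets              ! assume j_src divides evenly, for brevity

  if (myrank == 0) then
     allocate(data_src_global(i_src, j_src))
     data_src_global = 0.0             ! stand-in for the netCDF read on task 0
  else
     allocate(data_src_global(0,0))    ! keep the send-buffer argument defined
  endif

  allocate(data_src(i_src, j_local))

  ! The chunks are contiguous column blocks, so a plain scatter works in
  ! column-major Fortran; each task ends up with j_local columns of the source grid.
  call mpi_scatter(data_src_global, i_src*j_local, mpi_real, &
                   data_src,        i_src*j_local, mpi_real, &
                   0, mpi_comm_world, ierr)

  ! ... each task would then interpolate its slice to the model grid ...

  call mpi_finalize(ierr)
end program scatter_sketch

Only task 0 ever holds the full source field; every other task allocates just its slice, which is why the climo input itself should not multiply with the task count.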
You are right. My initial speculation about the substantial memory usage was wrong. I profiled memory usage for a global C48 grid with different numbers of MPI tasks, ranging from 30 to 60. The memory usage remained around 175 GB regardless of the number of MPI tasks, so the memory issue must be caused by something else.
How are you configuring the run? Can I see the fort.41 namelist?
&config
 input_facsf_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/facsf.1.0.nc"
 input_substrate_temperature_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/substrate_temperature.gfs.0.5.nc"
 input_maximum_snow_albedo_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/maximum_snow_albedo.0.05.nc"
 input_snowfree_albedo_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/snowfree_albedo.4comp.0.05.nc"
 input_slope_type_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/slope_type.1.0.nc"
 input_soil_type_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/soil_type.bnu.v3.30s.nc"
 input_soil_color_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/soil_color.clm.0.05.nc"
 input_vegetation_type_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/vegetation_type.viirs.v3.igbp.30s.nc"
 input_vegetation_greenness_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/vegetation_greenness.0.144.nc"
 mosaic_file_mdl="/gpfs/f5/gfdl_w/world-shared/Kai-yuan.Cheng/my_grids/C48/C48_mosaic.nc"
 orog_dir_mdl="/gpfs/f5/gfdl_w/world-shared/Kai-yuan.Cheng/my_grids/C48"
 orog_files_mdl="C48_oro_data.tile1.nc","C48_oro_data.tile2.nc","C48_oro_data.tile3.nc","C48_oro_data.tile4.nc","C48_oro_data.tile5.nc","C48_oro_data.tile6.nc"
 halo=0
 maximum_snow_albedo_method="bilinear"
 snowfree_albedo_method="bilinear"
 vegetation_greenness_method="bilinear"
 fract_vegsoil_type=.false.
/
I see you are using the 30-sec soil and vegetation type datasets. They are quite large. There are lower-res versions of the soil and veg data. Can you use those?
input_vegetation_type_file="vegetation_type.modis.igbp.0.05.nc"
input_soil_type_file="soil_type.statsgo.0.05.nc"
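For reference, and assuming the same fix/sfc_climo directory as the rest of your namelist, the two fort.41 entries would become:

input_vegetation_type_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/vegetation_type.modis.igbp.0.05.nc"
input_soil_type_file="/autofs/ncrc-svm1_home2/Kai-yuan.Cheng/software/UFS_UTILS/driver_scripts/../fix/sfc_climo/soil_type.statsgo.0.05.nc"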
In this case, the memory usage decreases to 33 GB, which is somewhat manageable for non-HPC systems. However, the 30-sec datasets should not have such a large memory footprint. Assuming a single-precision floating-point variable, the array storing an entire 30-sec dataset should only be about 3.5 GB (21600 × 43200 × 4 bytes). The overhead of sfc_climo_gen seems excessively high.
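To make that back-of-the-envelope estimate easy to reproduce, here is a trivial, purely illustrative Fortran check (the program and variable names are made up):

program climo_size
  implicit none
  integer(8), parameter :: ni = 21600, nj = 43200, bytes_per_val = 4
  ! size of one single-precision 30-sec global field
  print '(a,f6.2,a)', 'one 30-sec field: ', &
        real(ni*nj*bytes_per_val) / 1024.0**3, ' GiB'   ! prints ~3.48 GiB
end program climo_size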
I would guess the ESMF regridding is using a lot of memory. I can contact the ESMF team and provide them with your test case. They may have suggestions to reduce the memory requirements.
Sounds good. Thank you for working on this. I also found that, when using the lower-res soil and veg data, sfc_climo_gen can run with just 6 MPI tasks. It appears that higher-resolution datasets require more MPI tasks, which could also be an ESMF-related limitation.
For sfc_climo_gen, 30 MPI processes seem to be the minimum requirement, and the memory footprint is at least 150 GB. The substantial memory usage may be due to each MPI process holding a copy of the external file in memory. This high demand for MPI processes and memory makes running UFS_UTILS on non-HPC systems nearly impossible. Is it possible to improve the computational efficiency of sfc_climo_gen?
P.S. chgres_cube, whose code appears similar to sfc_climo_gen's, can run with just 6 MPI processes.