noaa-ocs-modeling / EnsemblePerturbation

perturbation of coupled model input over a space of input variables
https://ensembleperturbation.readthedocs.io
Creative Commons Zero v1.0 Universal

`combine_results` memory issue for large ensemble #114

Open SorooshMani-NOAA opened 9 months ago

SorooshMani-NOAA commented 9 months ago

From https://github.com/noaa-ocs-modeling/EnsemblePerturbation/issues/113#issuecomment-1769523670

The `combine_results` command failed with a `MemoryError` on a compute node with 40 cores (`srun -N 1 -A coastal -n 40 -t 8:00:00 --pty bash`).

Here is the full message:

[2023-10-18 22:15:49,072] parsing.schism  INFO    : found 601 run directories with all the specified output patterns
Traceback (most recent call last):
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/bin/combine_results", line 8, in <module>
    sys.exit(main())
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/client/combine_results.py", line 106, in main
    combine_results(**parse_combine_results())
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/client/combine_results.py", line 92, in combine_results
    parsed_data = combine_func(
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/parsing/schism.py", line 1308, in convert_schism_output_files_to_adcirc_like
    results = combine_outputs(
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/parsing/schism.py", line 1060, in combine_outputs
    parsed_files = parse_schism_outputs(
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/parsing/schism.py", line 977, in parse_schism_outputs
    dataset = output_class.read_directory(
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/parsing/schism.py", line 744, in read_directory
    dataset = super().read_directory(directory, variables, parallel)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/parsing/schism.py", line 663, in read_directory
    ds = cls._calc_extermum(full_ds)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/ensembleperturbation/parsing/schism.py", line 683, in _calc_extermum
    arg_extrm_var = getattr(to_extrm_ary, cls.extermum_func)(dim='time').compute()
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1137, in compute
    return new.load(**kwargs)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1111, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/xarray/core/dataset.py", line 833, in load
    evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/xarray/core/daskmanager.py", line 70, in compute
    return compute(*data, **kwargs)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/dask/base.py", line 621, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/dask/base.py", line 394, in collections_to_dsk
    dsk = opt(dsk, keys, **kwargs)
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/dask/array/optimization.py", line 51, in optimize
    dsk = dsk.cull(set(keys))
  File "/scratch2/STI/coastal/Fariborz.Daneshvar/miniconda3/envs/nhc_colab/lib/python3.10/site-packages/dask/highlevelgraph.py", line 763, in cull
    ret_key_deps.update(culled_deps)
MemoryError

I cannot request more than 40 cores; asking for 41 fails with: `Unable to allocate resources: Requested node configuration is not available`.
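Note that the traceback dies inside `dsk.cull(...)` while dask is still building/optimizing the task graph, so the graph for the time-extremum over all 601 runs is itself too large for memory, before any data is loaded. One possible workaround (a hedged sketch only, not EnsemblePerturbation's API; `streaming_max` and the commented loader are hypothetical helpers) is to reduce each run eagerly and keep only the running extremum in memory, instead of one giant lazy graph:

```python
import numpy as np

def streaming_max(arrays):
    """Elementwise running maximum over an iterable of arrays.

    Only the running result and the current member are held in memory,
    rather than stacking every ensemble member into a single dask graph
    before reducing over 'time'.
    """
    result = None
    for arr in arrays:
        # np.maximum is elementwise, so the accumulator never grows
        # beyond the shape of one member.
        result = np.array(arr, copy=True) if result is None else np.maximum(result, arr)
    return result

# Hypothetical usage, loading one run's output at a time from disk:
# def iter_run_maxima(run_dirs):
#     for run_dir in run_dirs:
#         with xr.open_dataset(run_dir / 'out2d_1.nc') as ds:
#             yield ds['elevation'].max(dim='time').values
# overall_max = streaming_max(iter_run_maxima(run_dirs))
```

The same pattern works for minima with `np.minimum`; whether it can replace `_calc_extermum` directly depends on how the argmax/argmin variables are assembled there.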