noaa-ocs-modeling / EnsemblePerturbation

perturbation of coupled model input over a space of input variables
https://ensembleperturbation.readthedocs.io
Creative Commons Zero v1.0 Universal

Memory issue for combining results #111

Open FariborzDaneshvar-NOAA opened 1 year ago

FariborzDaneshvar-NOAA commented 1 year ago

I'm using a c5n.18xlarge instance on the NHC_COLAB_2 cluster on PW to run the combine_results --schism --adcirc-like-output ./analyze command and combine SCHISM outputs, but I'm getting a memory error with large files. For example, a test run of Dorian with 20 ensemble members failed when writing files:

[2023-08-24 13:24:41,431] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/perturbations.nc"
[2023-08-24 13:24:41,517] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/fort.63.nc"
[2023-08-24 13:31:44,604] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/maxele.63.nc"
[2023-08-24 13:37:25,194] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/fort.64.nc"
Traceback (most recent call last):
  File "/opt/conda/envs/prep/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/prep/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/scripts/combine_ensemble.py", line 31, in <module>
    main(parser.parse_args())
  File "/scripts/combine_ensemble.py", line 16, in main
    output = combine_results(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/ensembleperturbation/client/combine_results.py", line 92, in combine_results
    parsed_data = combine_func(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/ensembleperturbation/parsing/schism.py", line 1332, in convert_schism_output_files_to_adcirc_like
    file_data.to_netcdf(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/dataset.py", line 2252, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/backends/api.py", line 1255, in to_netcdf
    writes = writer.sync(compute=compute)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/backends/common.py", line 256, in sync
    delayed_store = chunkmanager.store(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/daskmanager.py", line 211, in store
    return store(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 484, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 487, in get_duck_array
    return self.array.get_duck_array()
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 664, in get_duck_array
    return self.array.get_duck_array()
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 557, in get_duck_array
    array = array.get_duck_array()
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/coding/variables.py", line 74, in get_duck_array
    return self.func(self.array.get_duck_array())
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/coding/variables.py", line 215, in _apply_mask
    return np.where(condition, decoded_fill_value, data)
  File "<__array_function__ internals>", line 200, in where
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 3.42 GiB for an array with shape (408, 1126302, 2) and data type float32
ERROR conda.cli.main_run:execute(49): `conda run python -m combine_ensemble --ensemble-dir /lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/ --tracks-dir /lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir//track_files` failed. (See above for error)
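The traceback shows the allocation failing inside xarray's mask decoding (np.where in coding/variables.py), i.e. the full (408, 1126302, 2) float32 array is being materialized at once during to_netcdf. One possible mitigation (a sketch, not the project's actual fix; file names and chunk sizes below are hypothetical placeholders) is to keep the dataset dask-backed with modest chunks and use a delayed write, so each chunk is decoded and written independently and peak memory is bounded by roughly one chunk per worker:

```python
import numpy as np
import xarray as xr

# Stand-in for a large combined ensemble output with a (time, node) layout
# like the (408, 1126302, 2) array in the traceback, but small enough to run.
ds = xr.Dataset(
    {"elevation": (("time", "node"), np.zeros((10, 100), dtype="float32"))}
)

# Chunk along the node dimension; requires dask. Each chunk can then be
# processed without loading the whole variable into memory.
ds = ds.chunk({"node": 25})

# compute=False returns a dask delayed object; calling .compute() streams
# the write chunk by chunk instead of allocating the full decoded array.
delayed = ds.to_netcdf("combined_sketch.nc", compute=False)
delayed.compute()

print(xr.open_dataset("combined_sketch.nc")["elevation"].shape)
```

If the datasets are opened with open_dataset/open_mfdataset upstream, passing a chunks= argument there would have a similar effect; the key point is that the data must still be dask-backed (not yet loaded) when to_netcdf is called.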