I'm using a c5n.18xlarge instance on the NHC_COLAB_2 cluster on PW to run the combine_results --schism --adcirc-like-output ./analyze command and combine SCHISM outputs, but I'm getting a memory error with large files.
For example, a test run of Dorian with 20 ensemble members failed while writing the output files:
[2023-08-24 13:24:41,431] parsing.schism INFO : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/perturbations.nc"
[2023-08-24 13:24:41,517] parsing.schism INFO : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/fort.63.nc"
[2023-08-24 13:31:44,604] parsing.schism INFO : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/maxele.63.nc"
[2023-08-24 13:37:25,194] parsing.schism INFO : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/fort.64.nc"
Traceback (most recent call last):
File "/opt/conda/envs/prep/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/prep/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/scripts/combine_ensemble.py", line 31, in <module>
main(parser.parse_args())
File "/scripts/combine_ensemble.py", line 16, in main
output = combine_results(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/ensembleperturbation/client/combine_results.py", line 92, in combine_results
parsed_data = combine_func(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/ensembleperturbation/parsing/schism.py", line 1332, in convert_schism_output_files_to_adcirc_like
file_data.to_netcdf(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/dataset.py", line 2252, in to_netcdf
return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/backends/api.py", line 1255, in to_netcdf
writes = writer.sync(compute=compute)
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/backends/common.py", line 256, in sync
delayed_store = chunkmanager.store(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/daskmanager.py", line 211, in store
return store(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/threaded.py", line 89, in get
results = get_async(
File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 511, in get_async
raise_exception(exc, tb)
File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 319, in reraise
raise exc
File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 224, in execute_task
result = _execute_task(task, data)
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 484, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype)
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 487, in get_duck_array
return self.array.get_duck_array()
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 664, in get_duck_array
return self.array.get_duck_array()
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 557, in get_duck_array
array = array.get_duck_array()
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/coding/variables.py", line 74, in get_duck_array
return self.func(self.array.get_duck_array())
File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/coding/variables.py", line 215, in _apply_mask
return np.where(condition, decoded_fill_value, data)
File "<__array_function__ internals>", line 200, in where
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 3.42 GiB for an array with shape (408, 1126302, 2) and data type float32
ERROR conda.cli.main_run:execute(49): `conda run python -m combine_ensemble --ensemble-dir /lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/ --tracks-dir /lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir//track_files` failed. (See above for error)
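The allocation that fails (408 x 1126302 x 2 float32 ≈ 3.42 GiB) looks like the full fort.64 velocity array being materialized while xarray applies the fill-value mask during the write. For reference, here is a minimal sketch of the kind of chunked open/write I would expect to avoid that allocation; the file path and chunk size are placeholders of my own, not the actual combine_results internals:

```python
import xarray as xr

# Sketch of a chunked open/write (placeholder path and chunk size, not the
# actual convert_schism_output_files_to_adcirc_like internals).
ds = xr.open_dataset(
    "outputs/schism_velocity.nc",  # hypothetical input file
    chunks={"time": 24},  # small time chunks so CF mask decoding runs per chunk
)

# Writing a dask-backed dataset streams one chunk at a time, so no single step
# has to materialize the full (time, node, 2) float32 array in memory.
ds.to_netcdf("analyze/fort.64.nc")
```

Is there a way to control the chunking (or otherwise reduce peak memory) when running combine_results on outputs of this size?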