ivirshup opened 2 weeks ago
Code:
```python
%load_ext memory_profiler
import h5py
from anndata.experimental import write_elem
import numpy as np

f = h5py.File("tmp.h5", "w")
X = np.ones((10_000, 10_000))

%memit write_elem(f, "X", X)
# peak memory: 940.14 MiB, increment: 0.00 MiB

%memit write_elem(f, "X2", f["X"])
# peak memory: 1702.89 MiB, increment: 762.75 MiB
```
The second write roughly doubles peak memory, since the source dataset appears to be read into memory in full before being written. We could move to a chunked approach to writing fairly easily, following the solution suggested here:
```python
dst_ds = f.create_dataset_like('dst', src_ds, dtype=np.int64)
for chunk in src_ds.iter_chunks():
    dst_ds[chunk] = src_ds[chunk]
```
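As a self-contained illustration of why this keeps memory flat (pure NumPy, no HDF5 file needed; `iter_slices` is a hypothetical helper standing in for what h5py's `Dataset.iter_chunks()` yields), only one chunk-sized block is touched at a time:

```python
import itertools
import numpy as np

def iter_slices(shape, chunks):
    """Yield tuples of slices covering `shape` in blocks of size `chunks`,
    mimicking the slice tuples produced by h5py's Dataset.iter_chunks()."""
    ranges = [range(0, dim, step) for dim, step in zip(shape, chunks)]
    for starts in itertools.product(*ranges):
        yield tuple(
            slice(start, min(start + step, dim))
            for start, step, dim in zip(starts, chunks, shape)
        )

src = np.arange(100).reshape(10, 10)
dst = np.empty_like(src)

# Copy one block at a time instead of materializing the whole source.
for chunk in iter_slices(src.shape, (4, 4)):
    dst[chunk] = src[chunk]

assert np.array_equal(src, dst)
```

With a real dataset each iteration would read and write one chunk, so peak memory stays at roughly one chunk's size rather than the full array.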
Versions

```
-----
IPython              8.26.0
anndata              0.11.0.dev168+g8cc5a18
h5py                 3.11.0
numpy                1.26.4
session_info         1.0.0
-----
asciitree            NA
asttokens            NA
bottleneck           1.4.0
cloudpickle          3.0.0
cython_runtime       NA
dask                 2024.8.1
dateutil             2.9.0.post0
decorator            5.1.1
executing            2.0.1
importlib_metadata   NA
jedi                 0.19.1
jinja2               3.1.4
markupsafe           2.1.5
memory_profiler      0.61.0
msgpack              1.0.8
natsort              8.4.0
numcodecs            0.13.0
numexpr              2.10.1
packaging            24.1
pandas               2.2.1
parso                0.8.4
prompt_toolkit       3.0.47
psutil               5.9.8
pure_eval            0.2.2
pyarrow              15.0.2
pygments             2.18.0
pytz                 2024.1
scipy                1.12.0
setuptools           70.3.0
six                  1.16.0
stack_data           0.6.3
tblib                3.0.0
tlz                  0.12.1
toolz                0.12.1
traitlets            5.14.3
typing_extensions    NA
wcwidth              0.2.13
yaml                 6.0.1
zarr                 2.18.2
zipp                 NA
-----
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Linux-6.8.0-1010-aws-x86_64-with-glibc2.39
-----
Session information updated at 2024-08-28 22:36
```
Some complications:
- `iter_chunks`
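The issue doesn't spell the complication out; one plausible reading (an assumption on my part) is that `iter_chunks()` is only defined on chunked h5py datasets and has no zarr counterpart, so contiguous datasets would need a fallback path. A minimal sketch, using a plain NumPy array as a stand-in for a dataset without `iter_chunks` and a hypothetical `iter_row_blocks` helper:

```python
import numpy as np

def iter_row_blocks(ds, rows_per_block=1000):
    """Fallback when ds.iter_chunks() is unavailable (e.g. a contiguous
    dataset): slice along the first axis in fixed-size row blocks."""
    n_rows = ds.shape[0]
    for start in range(0, n_rows, rows_per_block):
        yield slice(start, min(start + rows_per_block, n_rows))

src = np.arange(12.0).reshape(6, 2)  # stand-in for a contiguous dataset
dst = np.empty_like(src)
for block in iter_row_blocks(src, rows_per_block=4):
    dst[block] = src[block]

assert np.array_equal(src, dst)
```

Row blocks are coarser than true chunks, but they bound peak memory the same way while working for any array-like source.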