scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

_inplace_subset_var/_inplace_subset_obs are not actually in place #170

Open joshua-gould opened 5 years ago

joshua-gould commented 5 years ago

The first line of both methods, adata_subset = self[:, index].copy(), makes a full copy of the matrix.

ivirshup commented 5 years ago

There's some ambiguity in the term inplace in scanpy and anndata. In general, I'd take it to mean the AnnData object will be modified without making a copy, and if possible (or we've figured out how) the elements being operated on will be modified inplace.

It might be possible to increase how much of this is done inplace. Relevant functions:

sparse.spmatrix.resize
np.ndarray.resize
pd.DataFrame.drop

For numpy arrays and sparse matrices, the documentation for each of these notes that the operation may involve making a copy, in which case more memory would be allocated. We'd also have to be careful about array contiguity in numpy.
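As a rough illustration of the np.ndarray.resize idea for the obs (row) axis, here is a minimal sketch. It is not anndata's implementation: the helper name subset_rows_inplace is hypothetical, and it assumes a C-contiguous array that owns its data. Kept rows are moved to the front of the buffer and the array is then shrunk with resize; whether the shrink avoids an internal copy is up to the allocator.

```python
import numpy as np


def subset_rows_inplace(x, keep):
    """Keep only the rows of `x` selected by `keep`, reusing x's buffer.

    Hypothetical helper for illustration only. `keep` is a boolean mask
    or a collection of integer row indices.
    """
    if not (x.flags.c_contiguous and x.flags.owndata):
        raise ValueError("needs a C-contiguous array that owns its data")
    keep = np.asarray(keep)
    if keep.dtype == bool:
        keep = np.flatnonzero(keep)  # mask -> ascending row indices
    else:
        keep = np.unique(keep)  # sort ascending and drop duplicates
    # With ascending indices, dst <= src always holds, so a row is never
    # overwritten before it has been moved.
    for dst, src in enumerate(keep):
        if dst != src:
            x[dst] = x[src]
    # Shrink in place. refcheck=False is required because the caller still
    # holds a reference to `x`; any views of `x` become invalid afterwards.
    x.resize((len(keep),) + x.shape[1:], refcheck=False)


x = np.arange(12, dtype=float).reshape(4, 3).copy()  # .copy() so x owns its data
subset_rows_inplace(x, [False, True, False, True])
print(x)
```

Subsetting the var (column) axis this way is harder, since the kept values are not contiguous blocks in a C-ordered array; that is where the contiguity concern comes in.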

A related option would be to see if we could modify the object by parts, minimizing peak memory usage by not having full and subset copies of everything at once.
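A minimal sketch of the by-parts idea, using a plain dict of numpy arrays as a stand-in for an AnnData object (the function name subset_obs_by_parts and the dict layout are assumptions for illustration, not anndata API). Each element is subset and rebound before the next one is touched, so at most one full element and its subset need to coexist at any time.

```python
import numpy as np


def subset_obs_by_parts(parts, keep):
    """Subset every array in `parts` along axis 0, one element at a time.

    `parts` is a hypothetical stand-in for an AnnData object: a dict that
    maps names to arrays sharing the same first dimension.
    """
    keep = np.asarray(keep)
    for name in list(parts):
        # Mask or integer-array indexing copies only this element.
        # Rebinding the key drops the dict's reference to the full array,
        # so it can be freed before the next element is processed
        # (provided nothing else still references it).
        parts[name] = parts[name][keep]
    return parts


parts = {
    "X": np.arange(12).reshape(4, 3),
    "obs_label": np.array(["a", "b", "c", "d"]),
}
subset_obs_by_parts(parts, [False, True, False, True])
print(parts["X"].shape, parts["obs_label"])
```

This still allocates a subset copy of each element, so it lowers peak memory rather than avoiding copies altogether.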

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!