scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

scanpy.pp.log1p with backed h5ad produces copy error #1153

Open adkinsrs opened 4 years ago

adkinsrs commented 4 years ago

I am fairly new to scanpy, so I may be doing this incorrectly. I hit an error when I create a backed AnnData object from an h5ad file and then logarithmize its data matrix with scanpy.pp.log1p. The error is raised inside the AnnData code because log1p in preprocessing/_simple.py calls copy() without passing a filename.

My current workaround is to load the AnnData object as non-backed, run log1p, and then set the object's "filename" property afterwards to make it backed for other scanpy functions.

Example

import scanpy as sc

dataset_path = "/path/to/test/data.h5ad"   # Subbing out actual filenames for data
adata = sc.read_h5ad(dataset_path, backed='r')
print(adata)   # To ensure there is a backed filepath

adata.raw = sc.pp.log1p(adata, copy=True)    # Error is here

Error output

# I printed the AnnData object to ensure it was backed
AnnData object with n_obs × n_vars = 4166 × 16852 backed at '/tmp/1b12dde9-1762-7564-8fbd-1b07b750505f.h5ad'
    obs: 'cell_type', 'barcode', 'tSNE_1', 'tSNE_2', 'replicate', 'louvain', 'n_genes', 'percent_mito', 'n_counts'
    var: 'gene_symbol', 'n_cells'
    obsm: 'X_tsne'

# Actual error after calling log1p
Traceback (most recent call last):
  File "log1p_test.cgi", line 129, in <module>
    main()
  File "log1p_test.cgi", line 81, in main
    adata.raw = sc.pp.log1p(adata, copy=True)
  File "/opt/Python-3.7.3/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py", line 292, in log1p
    data = data.copy()
  File "/opt/Python-3.7.3/lib/python3.7/site-packages/anndata/_core/anndata.py", line 1457, in copy
    "To copy an AnnData object in backed mode, "
ValueError: To copy an AnnData object in backed mode, pass a filename: `.copy(filename='myfilename.h5ad')`.

Versions:

scanpy==1.4.6 anndata==0.7.1 umap==0.3.10 numpy==1.16.3 scipy==1.4.1 pandas==0.24.2 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1

ivirshup commented 4 years ago

Thanks for the report! I think I see the underlying issue, but can't promise a quick fix.

As a heads up, at the moment, backed mode works best for read-only workflows like plotting.