scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

log1p warns adata.X is logged when it may not be (when other layers are logged) #1333

Open gheimberg opened 4 years ago

gheimberg commented 4 years ago

When I call sc.pp.log1p(adata) and then sc.pp.log1p(adata, layer='other'), it warns me that the data has already been logged, even though I am logging a layer rather than adata.X.

It would be nice to track logging per layer instead of flagging whenever anything has been logged.

import scanpy as sc

adata = sc.datasets.pbmc3k_processed()
adata.layers['other'] = adata.X
sc.pp.log1p(adata, layer='other')
sc.pp.log1p(adata)
WARNING: adata.X seems to be already log-transformed.

Versions:

scanpy==1.5.2.dev5+ge5d246aa anndata==0.7.3 umap==0.3.10 numpy==1.18.5 scipy==1.5.0 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.7.1 louvain==0.6.1 leidenalg==0.7.0

flying-sheep commented 4 years ago

Happens here:

https://github.com/theislab/scanpy/blob/3558a42e747856cbf55c4d118566a155c6717178/scanpy/preprocessing/_simple.py#L286-L287

Where does .uns['log1p'] get set other than there?

LuckyMD commented 4 years ago

Hi @gheimberg,

In your example you are not using a deepcopy when assigning adata.X to adata.layers['other']. You only passed a reference, so log-transforming the data in the layer also log-transforms the data in adata.X. That being said, this is still a bug, because the warning is given even with an adata.X.copy().
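The reference-versus-copy point can be illustrated without scanpy. The following is a minimal sketch using plain Python lists as a stand-in for the matrix; `log1p_inplace` is a hypothetical helper, and whether `sc.pp.log1p` mutates its input in place for a given matrix type is an assumption here, not something confirmed in this thread.

```python
import copy
import math


def log1p_inplace(rows):
    """Overwrite each value with log(1 + value), mutating `rows` itself."""
    for row in rows:
        for j, value in enumerate(row):
            row[j] = math.log1p(value)


# Stand-ins for adata.X and adata.layers (plain lists and a dict, not AnnData).
X = [[0.0, 1.0], [2.0, 3.0]]
layers = {"alias": X, "copied": copy.deepcopy(X)}

# Transforming the aliased layer also changes X, because both names
# refer to the same object.
log1p_inplace(layers["alias"])

alias_shares_x = layers["alias"] is X
x_changed = X[0][1] == math.log1p(1.0)
copy_untouched = layers["copied"][0][1] == 1.0
```

Only the deep-copied layer keeps its original values; the aliased one and `X` change together.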

gokceneraslan commented 4 years ago

Guys, we should just keep the layer info here in log1p. Instead of

data.uns['log1p'] = {'base': base}

store something like

data.uns['log1p'][layer] = {'base': base}
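A rough sketch of how such per-layer bookkeeping might behave is shown below. It uses plain dicts in place of `adata.uns` and of `adata.X` plus `adata.layers`; the function name `log1p_tracked` and the `"X"` key for the main matrix are assumptions made for illustration and are not scanpy's actual implementation.

```python
import math
import warnings

X_KEY = "X"  # hypothetical key standing in for adata.X


def log1p_tracked(uns, matrices, layer=None, base=None):
    """Log-transform one matrix and record the transform per layer.

    `uns` and `matrices` are plain dicts standing in for `adata.uns`
    and for `adata.X` plus `adata.layers`.
    """
    key = X_KEY if layer is None else layer
    record = uns.setdefault("log1p", {})
    # Warn only if this particular matrix was logged before.
    if key in record:
        warnings.warn(f"{key} seems to be already log-transformed.")
    scale = 1.0 if base is None else math.log(base)
    matrices[key] = [
        [math.log1p(v) / scale for v in row] for row in matrices[key]
    ]
    record[key] = {"base": base}


uns = {}
matrices = {"X": [[0.0, 1.0]], "other": [[0.0, 1.0]]}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    log1p_tracked(uns, matrices, layer="other")  # first transform of 'other'
    log1p_tracked(uns, matrices)                 # first transform of X
    log1p_tracked(uns, matrices)                 # X again: this one warns
```

With the record keyed by layer, logging `other` no longer marks `X` as logged, and the warning fires only on a repeated transform of the same matrix.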

Benfeitas commented 2 years ago

I've come across a strange behavior related to this issue. Depending on whether or not I save the object, I get the same warning as OP.

This works as it should:

import scanpy as sc

adata=sc.read_h5ad(data_dir+'scanpy_QC_sexchrom.h5ad')
adata.raw=adata.copy() #data to save
sc.pp.log1p(adata) # logarithmize

### Test 1, no saving, works as it should
adata=adata.raw.to_adata()
sc.pp.log1p(adata)
##>>> no warning

If I save the object midway and read it back, the warning appears, even when I restart the kernel before reading the data:

import scanpy as sc

## same as above
adata=sc.read_h5ad(data_dir+'scanpy_QC_sexchrom.h5ad')
adata.raw=adata.copy() #data to save
sc.pp.log1p(adata) # logarithmize

### Test 2, saving and re-assigning from raw
### saving object, reading, testing again
### Doesn't work
adata.write_h5ad(tmp+'scanpy_test.h5ad')
adata=sc.read_h5ad(tmp+'scanpy_test.h5ad')
adata=adata.raw.to_adata()
sc.pp.log1p(adata)
###>>>WARNING: adata.X seems to be already log-transformed.

I'm on scanpy 1.9.1 if it matters

Benfeitas commented 2 years ago

I must also mention that upon reading in the data: