scverse / muon

muon is a multimodal omics Python framework
https://muon.scverse.org/
BSD 3-Clause "New" or "Revised" License

raw is also normalized with clr #126

Closed: GirayEryilmaz closed this issue 1 year ago

GirayEryilmaz commented 1 year ago

When I save the raw protein counts in .raw and then apply clr, the raw counts are normalized as well.

Snippet:

adt = pbmc.mod['adt'].copy()
adt.raw = adt

mu.prot.pp.clr(adt)

After this, I expect adt.X to be clr-normalized and adt.raw.X to stay unchanged. However, adt.raw.X is normalized as well.

gtca commented 1 year ago

This seems to behave as expected for me:

import numpy as np
from anndata import AnnData
from muon import prot as pt

adata = AnnData(np.arange(1000).reshape(-1, 10))
adata.raw = adata

pt.pp.clr(adata)
print(adata.X[:2,:2])
# [[0.         0.00276899]
#  [0.02767339 0.03004515]]
print(adata.raw.X[:2,:2])
# [[ 0.  1.]
#  [10. 11.]]

Any idea how your observation can be reproduced?

GirayEryilmaz commented 1 year ago

Very interesting! Here is my attempt:

import muon as mu
import anndata
import scanpy as sc  # needed for the version printout below
import numpy as np

x = np.array([[10, 10],
              [10, 20]], dtype=float)
adata = anndata.AnnData(x)
adata.raw = adata

mu.prot.pp.clr(adata)
print(adata.X)
# [[0.64662716 0.50558292]
#  [0.64662716 0.83979984]]
print(adata.raw.X)
# [[0.64662716 0.50558292]
#  [0.64662716 0.83979984]]
print(mu.__version__) # 0.1.5
print(anndata.__version__) # 0.9.2
print(sc.__version__) # 1.9.3

I also noticed that passing inplace=False solves the problem for me:

x = np.array([[10, 10],
              [10, 20]], dtype=float)
adata = anndata.AnnData(x)
adata.raw = adata

adata = mu.prot.pp.clr(adata, inplace=False)
print(adata.X)
# [[0.64662716 0.50558292]
#  [0.64662716 0.83979984]]
print(adata.raw.X)
# [[10. 10.]
#  [10. 20.]]

Still, I would like to understand what is going on and avoid an unnecessary copy. Any suggestions are welcome!

GirayEryilmaz commented 1 year ago

I found the source of the problem.

x = np.array([[10, 20]], dtype=float)
adata = anndata.AnnData(x)

adata.raw = adata

print(adata.X is adata.raw.X) # Prints True

Apparently, for me adata.X is the very same object as adata.raw.X. That is why raw.X is also normalized when clr normalizes X in place. I don't know whether this is the intended behavior of raw, or why I am seeing this while @gtca is not.

@gtca would you mind sharing which version of Anndata you are using?
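
The aliasing described in this comment can be reproduced with plain NumPy, independent of anndata or muon. This is only a minimal sketch: the names counts, alias and snapshot are made up for illustration, and np.log1p with out= merely stands in for any normalization that writes into the existing array.

```python
import numpy as np

counts = np.array([[10.0, 20.0]])

alias = counts            # second name for the same array, no data is copied
snapshot = counts.copy()  # independent buffer holding the original values

# In-place transform, standing in for any in-place normalization step.
np.log1p(counts, out=counts)

print(alias is counts)                     # True: both names see the new values
print(np.shares_memory(counts, snapshot))  # False
print(snapshot)                            # [[10. 20.]]
```

If raw.X and X behave like alias and counts here, any in-place update of X is visible through raw.X too; only a separate copy, like snapshot, keeps the original counts.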

gtca commented 1 year ago

Indeed, in anndata v0.8:

id(adata.raw.X) == id(adata.X)
# => False

whereas in anndata v0.9.2 as well as in anndata v0.10:

id(adata.raw.X) == id(adata.X)
# => True

A new issue in the anndata repo is probably a better place to track this: https://github.com/scverse/anndata/issues/1139