Closed by GirayEryilmaz 1 year ago
This seems to behave as expected:
import numpy as np
from anndata import AnnData
from muon import prot as pt
adata = AnnData(np.arange(1000).reshape(-1, 10))
adata.raw = adata
pt.pp.clr(adata)
print(adata.X[:2,:2])
# [[0. 0.00276899]
# [0.02767339 0.03004515]]
print(adata.raw.X[:2,:2])
# [[ 0. 1.]
# [10. 11.]]
Any idea how your observation can be reproduced?
Very interesting! Here is my attempt:
import muon as mu
import anndata
import numpy as np
import scanpy as sc  # imported only to report its version below
x = np.array([[10, 10],
              [10, 20]], dtype=float)
adata = anndata.AnnData(x)
adata.raw = adata
mu.prot.pp.clr(adata)
print(adata.X)
# [[0.64662716 0.50558292]
# [0.64662716 0.83979984]]
print(adata.raw.X)
# [[0.64662716 0.50558292]
# [0.64662716 0.83979984]]
print(mu.__version__) # 0.1.5
print(anndata.__version__) # 0.9.2
print(sc.__version__) # 1.9.3
Also, I noticed that passing inplace=False solves the problem for me:
x = np.array([[10, 10],
              [10, 20]], dtype=float)
adata = anndata.AnnData(x)
adata.raw = adata
adata = mu.prot.pp.clr(adata, inplace=False)
print(adata.X)
# [[0.64662716 0.50558292]
# [0.64662716 0.83979984]]
print(adata.raw.X)
# [[10. 10.]
# [10. 20.]]
Still, I would like to understand what is going on and avoid an unnecessary copy. Any suggestions are welcome!
I found the source of the problem.
x = np.array([[10, 20]], dtype=float)
adata = anndata.AnnData(x)
adata.raw = adata
print(adata.X is adata.raw.X) # Prints True
Apparently, for me adata.X is the very same object as adata.raw.X, which is why raw.X is also normalized when clr normalizes X in place. I don't know whether this is the intended behavior of raw, or why I am seeing this issue while @gtca is not.
@gtca, would you mind sharing which version of anndata you are using?
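The aliasing described above can be reproduced without anndata. The following is a minimal NumPy-only sketch, not the actual anndata or muon code: `Holder` and `log_normalize_inplace` are hypothetical stand-ins, and `np.log1p` substitutes for the real CLR transform.

```python
import numpy as np


class Holder:
    """Hypothetical stand-in for a container that keeps a reference to an array."""

    def __init__(self, x):
        self.X = x  # stores a reference, not a copy


def log_normalize_inplace(x):
    """Hypothetical in-place transform; log1p stands in for CLR here."""
    np.log1p(x, out=x)  # writes the result back into the same buffer


x = np.array([[10.0, 10.0], [10.0, 20.0]])

aliased = Holder(x)          # shares the buffer with x
snapshot = Holder(x.copy())  # owns an independent copy

log_normalize_inplace(x)

print(aliased.X is x)                   # True: same object, so it was transformed too
print(np.shares_memory(snapshot.X, x))  # False: independent buffer
print(snapshot.X)                       # still holds the original values
```

Under these assumptions, only the holder that received an explicit copy keeps the untransformed values, which matches the behavior seen with inplace=False.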
Indeed, in anndata v0.8:
id(adata.raw.X) == id(adata.X)
# => False
whereas in anndata v0.9.2 as well as in anndata v0.10:
id(adata.raw.X) == id(adata.X)
# => True
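Comparing id() values only detects the case where both attributes are the very same object. As a hedged aside, NumPy's np.shares_memory also catches views that are distinct objects over the same buffer. The helper name `is_aliased` below is hypothetical, and the sketch assumes dense NumPy arrays rather than sparse matrices.

```python
import numpy as np


def is_aliased(a, b):
    """Hypothetical helper: True if two dense arrays overlap in memory."""
    return np.shares_memory(a, b)


x = np.arange(6, dtype=float).reshape(2, 3)
view = x[:, :2]      # a different object that still points into x's buffer
independent = x.copy()  # separate buffer

print(id(view) == id(x))           # False, although writes to view change x
print(is_aliased(view, x))         # True
print(is_aliased(independent, x))  # False
```

A check like this could be run on adata.X and adata.raw.X before an in-place normalization to decide whether a copy is needed.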
A new issue in the anndata repo is probably a better place to track this: https://github.com/scverse/anndata/issues/1139
When I save the raw protein counts and then apply clr, the raw counts are also normalized.
Snippet:
After this, I expect adt.X to be CLR-normalized and adt.raw.X to stay the same. Yet adt.raw.X is also normalized.