scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

pp.normalize_geometric(protein) #1208

Closed shendong124 closed 4 years ago

shendong124 commented 4 years ago

Hi Scanpy team, I am trying to analyse CITE-seq data. At the normalization step for the protein data, the attribute normalize_geometric is not recognized. Could this be a version issue? Thank you for your help!

sc.pp.normalize_geometric(protein)

```pytb
...AttributeError                            Traceback (most recent call last)
<ipython-input-80-db93ca6d0f1d> in <module>
----> 1 sc.pp.normalize_geometric(protein)

AttributeError: module 'scanpy.preprocessing' has no attribute 'normalize_geometric'
```

Versions:

scanpy==1.4.7.dev30+g668b6776 anndata==0.7.1 umap==0.3.10 numpy==1.16.2 scipy==1.3.0 pandas==0.24.2 scikit-learn==0.22.2.post1 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1

LuckyMD commented 4 years ago

Hi, I'm not sure where you found the function normalize_geometric(), but Scanpy's inbuilt normalization is called sc.pp.normalize_total(). You can find the documentation here: https://scanpy.readthedocs.io/en/stable/api/scanpy.pp.normalize_total.html
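To tell a version issue apart from a function that was never released, one option is to test whether the installed module exposes the name at all. The helper below is only a minimal sketch: `has_attribute` is a hypothetical name introduced here, not part of Scanpy, and it relies solely on the standard library's `importlib`.

```python
import importlib


def has_attribute(module_name: str, attr_name: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed in this environment.
        return False
    return hasattr(module, attr_name)


# Names taken from this thread; the printed values depend on the installed Scanpy.
for name in ("normalize_geometric", "normalize_total"):
    print(name, has_attribute("scanpy.preprocessing", name))
```

If the first name prints `False` while the second prints `True`, the function is simply absent from that install rather than hidden behind a typo.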

shendong124 commented 4 years ago

I found it in the CITEseq tutorial

https://scanpy-tutorials.readthedocs.io/en/multiomics/cite-seq/pbmc5k.html

On Tue, May 12, 2020 at 03:59, MalteDLuecken notifications@github.com wrote:

Closed #1208 https://github.com/theislab/scanpy/issues/1208.


LuckyMD commented 4 years ago

@ivirshup where did you get this function from?

maximz commented 4 years ago

@shendong124 @ivirshup I assume normalize_geometric was intended to be similar to Seurat's centered log ratio transformation, which is implemented as follows in R: log1p(x = x / (exp(x = sum(log1p(x = x[x > 0]), na.rm = TRUE) / length(x = x)))). This is CLR with some safeguards for 0 counts.
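To make the "safeguards for 0 counts" concrete, here is a small worked sketch of that expression on a single vector, next to the textbook CLR, log(x / geometric mean). It uses only the standard library; the function names and the toy counts are made up for illustration and are not part of Seurat or Scanpy.

```python
import math


def seurat_clr_vector(counts):
    """Seurat-style CLR for one vector of non-negative counts."""
    # Sum log1p over the positive entries only, as in the R expression.
    log_sum = sum(math.log1p(c) for c in counts if c > 0)
    # Divide by the full length (zeros included), then exponentiate.
    scale = math.exp(log_sum / len(counts))
    return [math.log1p(c / scale) for c in counts]


def plain_clr_vector(counts):
    """Textbook CLR: log(x / geometric mean); undefined if any count is 0."""
    log_mean = sum(math.log(c) for c in counts) / len(counts)
    return [math.log(c) - log_mean for c in counts]


toy = [0, 3, 10, 1]  # made-up counts for illustration
print(seurat_clr_vector(toy))  # finite everywhere; the zero count maps to 0.0

try:
    plain_clr_vector(toy)
except ValueError as err:
    # math.log(0) raises ValueError, so the plain version cannot handle zeros.
    print("plain CLR fails on zeros:", err)
```

Using log1p in both places is what keeps the result defined when a count is zero.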

Here's a reimplementation of the Seurat CLR transformation for scanpy. Call this with clr_normalize_each_cell(adata):

def clr_normalize_each_cell(adata, inplace=True):
    """Normalize count vector for each cell, i.e. for each row of .X"""

    import numpy as np
    import scipy.sparse  # import the submodule explicitly; a bare `import scipy` may not load it

    def seurat_clr(x):
        # TODO: support sparseness
        s = np.sum(np.log1p(x[x > 0]))
        exp = np.exp(s / len(x))
        return np.log1p(x / exp)

    if not inplace:
        adata = adata.copy()

    # Densify sparse input with .toarray(), then apply the CLR to each row (axis 1).
    # The result stored in adata.X is always a dense array.
    X = adata.X.toarray() if scipy.sparse.issparse(adata.X) else adata.X
    adata.X = np.apply_along_axis(seurat_clr, 1, X)
    return adata

maximz commented 4 years ago

Actually there's a nice ongoing thread about this at #1117
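In the meantime, the per-row function above can also be written without np.apply_along_axis. The following is a sketch under the assumption that the input is a dense array of non-negative counts; `seurat_clr_rows` is a hypothetical name and the toy matrix is made up. Because log1p(0) = 0, summing log1p over every entry gives the same value as summing over the positive entries only, so no mask is needed.

```python
import numpy as np


def seurat_clr_rows(X):
    """Apply the Seurat-style CLR to each row of a dense, non-negative 2-D array."""
    X = np.asarray(X, dtype=float)
    # log1p(0) == 0, so this sum equals the sum over entries with x > 0.
    log_sum = np.log1p(X).sum(axis=1, keepdims=True)
    # One scale factor per row (cell): exp of the mean log1p over all columns.
    scale = np.exp(log_sum / X.shape[1])
    return np.log1p(X / scale)


toy = np.array([[0, 3, 10, 1], [5, 0, 0, 2]])  # made-up counts: 2 cells x 4 proteins
print(seurat_clr_rows(toy))
```

A sparse matrix would need to be densified first (for example with `.toarray()`) before being passed in.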

andreas-wilm commented 3 years ago

normalize_geometric() is still mentioned in the tutorial at https://scanpy-tutorials.readthedocs.io/en/multiomics/cite-seq/pbmc5k.html