scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

correlation between two adata objects #1760

Open FADHLyemen opened 3 years ago

FADHLyemen commented 3 years ago

How can I use sc.pl.correlation_matrix to compute the correlation between two different AnnData objects? I want to compare urine ("data1") with biopsy cells ("data2").

...

ivirshup commented 3 years ago

Would this be covered by something like:

combined = sc.concat({"urine": adata1, "biopsy": adata2}, label="source")
sc.pl.correlation_matrix(combined, ...)

FADHLyemen commented 3 years ago

Thank you. Because I am interested in cell types, I made a small change: sc.pl.correlation_matrix(combined, "celltypes"). However, data1 has different cell types than data2. How can I put the cell types from data1 in the rows and the cell types from data2 in the columns? Thank you.

ivirshup commented 3 years ago

Ah, I think I see what you're asking now. At the moment, I don't think we have a function for that. But this should be fairly straightforward to work around. Something like this should work:

import scanpy as sc
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances
import seaborn as sns

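def groupby_mean(adata, groupby):
    """Mean expression per group: one row per category in adata.obs[groupby]."""
    # observed=True skips categories that have no cells
    grouped = adata.obs.groupby(groupby, observed=True)
    results = np.zeros((grouped.ngroups, adata.n_vars), dtype=np.float64)
    labels = []

    # grouped.indices maps each group label to the positional indices of its cells
    for idx, (label, indices) in enumerate(grouped.indices.items()):
        # np.ravel flattens the 1 x n_vars result that sparse matrices return
        results[idx] = np.ravel(adata.X[indices].mean(axis=0))
        labels.append(label)

    return pd.DataFrame(results, columns=adata.var_names, index=labels)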

# Loading data
pbmc_full = sc.datasets.pbmc3k_processed().raw.to_adata()
pbmc_small = sc.datasets.pbmc68k_reduced().raw.to_adata()
var_intersect = pbmc_full.var_names.intersection(pbmc_small.var_names)

# Calculate mean expression per cell type
full_means = groupby_mean(pbmc_full[:, var_intersect], "louvain")
small_means = groupby_mean(pbmc_small[:, var_intersect], "louvain")

# Correlation distance (1 - Pearson r) between cell types; 0 means perfectly correlated
corr_mtx = pd.DataFrame(
    pairwise_distances(full_means, small_means, metric="correlation"),
    index=full_means.index,
    columns=small_means.index,
)

Is this more of what you were thinking?
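
A note on the values in corr_mtx: scikit-learn's "correlation" metric follows the SciPy definition, which is a distance equal to 1 minus the Pearson correlation. To read the matrix as correlations, the distance has to be inverted. The sketch below illustrates this on random toy data. The names means_a, means_b, and pearson are hypothetical stand-ins for full_means, small_means, and the converted matrix; they are not part of the snippet above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

# Toy stand-ins (hypothetical): rows are cell types, columns are shared genes.
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(50)]
means_a = pd.DataFrame(rng.random((3, 50)), index=["A1", "A2", "A3"], columns=genes)
means_b = pd.DataFrame(rng.random((4, 50)), index=["B1", "B2", "B3", "B4"], columns=genes)

# The "correlation" metric returns 1 - Pearson r for every pair of rows.
dist = pairwise_distances(means_a, means_b, metric="correlation")

# Invert the distance to recover Pearson r, keeping the cell type labels.
pearson = pd.DataFrame(1.0 - dist, index=means_a.index, columns=means_b.index)

# Cross-check one entry against NumPy's Pearson correlation.
expected = np.corrcoef(means_a.loc["A1"], means_b.loc["B1"])[0, 1]
assert np.isclose(pearson.loc["A1", "B1"], expected)
```

Applied to the thread's data, the same conversion would be 1 - corr_mtx, giving values between -1 and 1 with data1 cell types in rows and data2 cell types in columns.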

FADHLyemen commented 3 years ago

Exactly. Could you add how to plot a heatmap of corr_mtx? Thank you.
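
One possible way to plot it, as a minimal sketch rather than a reply from the maintainers: since seaborn is already imported in the snippet above, sns.heatmap can draw a labelled DataFrame directly. To keep the example self-contained, corr_mtx here is a random placeholder with the same layout (rows for one dataset's cell types, columns for the other's); the cell type names and axis labels are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder matrix (hypothetical values and labels) shaped like corr_mtx.
rng = np.random.default_rng(0)
corr_mtx = pd.DataFrame(
    rng.random((3, 4)),
    index=["T cell", "B cell", "Monocyte"],
    columns=["CD4 T", "CD8 T", "CD19 B", "CD14 Mono"],
)

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(
    corr_mtx,
    cmap="viridis",
    annot=True,   # write the value in each cell
    fmt=".2f",
    cbar_kws={"label": "correlation distance (1 - r)"},
    ax=ax,
)
ax.set_ylabel("data1 cell types")
ax.set_xlabel("data2 cell types")
fig.tight_layout()
# In an interactive session, call plt.show() or fig.savefig(...) here.
```

If grouping similar cell types together matters more than keeping the original order, sns.clustermap(corr_mtx) is an alternative that reorders rows and columns by hierarchical clustering.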