correlation between two adata objects

FADHLyemen commented 3 years ago

[x] Additional function parameters / changed functionality / changed defaults?
[x] New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
[ ] New plotting function: A kind of plot you would like to see in sc.pl?
[ ] External tools: Do you know an existing package that should go into sc.external.*?
[ ] Other?

How can I use sc.pl.correlation_matrix to compute the correlation between two different AnnData objects? I want to compare urine cells ("data1") with biopsy cells ("data2").

...

ivirshup commented 3 years ago

Would this be covered by something like:

combined = sc.concat({"urine": adata1, "biopsy": adata2}, label="source")
sc.pl.correlation_matrix(combined, ...)

FADHLyemen commented 3 years ago

Thank you. Because I am interested in cell types, I made a small change: sc.pl.correlation_matrix(combined, "celltypes"). However, data1 has different cell types than data2, so how can I put the cell types from data1 in the rows and the cell types from data2 in the columns? Thank you.

ivirshup commented 3 years ago

Ah, I think I see what you're asking now. At the moment, I don't think we have a function for that. But this should be fairly straightforward to work around. Something like this should work:

import scanpy as sc
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances
import seaborn as sns

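def groupby_mean(adata, groupby):
    """Mean expression of each gene within each group of adata.obs[groupby]."""
    # observed=True keeps only categories that actually occur, so ngroups matches the loop below
    grouped = adata.obs.groupby(groupby, observed=True)
    results = np.zeros((grouped.ngroups, adata.n_vars), dtype=np.float64)

    # grouped.indices maps each group label to the integer positions of its cells
    for idx, indices in enumerate(grouped.indices.values()):
        # np.ravel flattens the 1 x n_vars matrix returned when X is sparse
        results[idx] = np.ravel(adata.X[indices].mean(axis=0))

    return pd.DataFrame(results, columns=adata.var_names, index=list(grouped.indices.keys()))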

# Loading data
pbmc_full = sc.datasets.pbmc3k_processed().raw.to_adata()
pbmc_small = sc.datasets.pbmc68k_reduced().raw.to_adata()
# Keep only genes measured in both datasets
var_intersect = pbmc_full.var_names.intersection(pbmc_small.var_names)

# Calculate mean expression per cell type
full_means = groupby_mean(pbmc_full[:, var_intersect], "louvain")
small_means = groupby_mean(pbmc_small[:, var_intersect], "louvain")

# Correlation distance (1 - Pearson r) between cell types:
# rows are pbmc_full cell types, columns are pbmc_small cell types
corr_mtx = pd.DataFrame(
    pairwise_distances(full_means, small_means, metric="correlation"),
    index=full_means.index,
    columns=small_means.index,
)

Is this more of what you were thinking?
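Note that pairwise_distances(..., metric="correlation") returns correlation distance, i.e. 1 - Pearson r, so 1 - corr_mtx gives the correlations themselves. As a cross-check that does not need scikit-learn, here is a minimal numpy/pandas sketch. cross_correlation is a hypothetical helper name, and it assumes both inputs are "cell type x gene" mean tables like full_means and small_means (it reorders the second table's columns to match the first by gene name). The toy numbers are random and only illustrate the shape of the result.

```python
import numpy as np
import pandas as pd


def cross_correlation(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Pearson r between every row of `a` (result rows) and every row of `b` (result columns)."""
    b = b[a.columns]  # put the genes of `b` in the same order as `a`

    def zscore_rows(df):
        # Centre each row and scale it to unit population standard deviation
        centred = df.sub(df.mean(axis=1), axis=0)
        return centred.div(df.std(axis=1, ddof=0), axis=0)

    # The mean of the elementwise product of two z-scored rows is their Pearson r
    r = zscore_rows(a).to_numpy() @ zscore_rows(b).to_numpy().T / a.shape[1]
    return pd.DataFrame(r, index=a.index, columns=b.index)


# Toy example with random numbers: 3 "cell types" x 5 genes vs 4 "cell types" x 5 genes
rng = np.random.default_rng(0)
genes = ["g1", "g2", "g3", "g4", "g5"]
a = pd.DataFrame(rng.random((3, 5)), index=["A", "B", "C"], columns=genes)
b = pd.DataFrame(rng.random((4, 5)), index=["0", "1", "2", "3"], columns=genes[::-1])
corr = cross_correlation(a, b)
print(corr.round(2))
```

Rows with zero variance give NaN. On the real tables, cross_correlation(full_means, small_means) should agree with 1 - corr_mtx up to floating-point error.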

FADHLyemen commented 3 years ago

Exactly. Could you also show how to plot corr_mtx as a heatmap? Thank you.
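A minimal sketch of one way to plot corr_mtx, using plain matplotlib so that rows stay as the first object's cell types and columns as the second's. plot_corr_heatmap is a hypothetical helper, and the small table below is made-up data standing in for corr_mtx. Since seaborn is already imported in the earlier snippet, sns.heatmap(corr_mtx), or sns.clustermap(corr_mtx) to also cluster rows and columns, is a shorter alternative that reads the labels from the DataFrame index and columns.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_corr_heatmap(corr_mtx: pd.DataFrame, cmap: str = "viridis", label: str = "correlation distance"):
    """Heatmap of a DataFrame: rows = corr_mtx.index, columns = corr_mtx.columns."""
    n_rows, n_cols = corr_mtx.shape
    fig, ax = plt.subplots(figsize=(0.5 * n_cols + 3, 0.5 * n_rows + 2))
    im = ax.imshow(corr_mtx.to_numpy(), cmap=cmap, aspect="auto")
    # Label every row and column with its cell-type name
    ax.set_xticks(np.arange(n_cols))
    ax.set_xticklabels(corr_mtx.columns, rotation=90)
    ax.set_yticks(np.arange(n_rows))
    ax.set_yticklabels(corr_mtx.index)
    fig.colorbar(im, ax=ax, label=label)
    fig.tight_layout()
    return fig, ax


# Made-up 3 x 2 table standing in for corr_mtx
toy = pd.DataFrame(
    [[0.1, 0.8], [0.9, 0.2], [0.5, 0.4]],
    index=["CD4 T cells", "B cells", "NK cells"],
    columns=["0", "1"],
)
fig, ax = plot_corr_heatmap(toy)
```

With the real data, call plot_corr_heatmap(corr_mtx), or plot_corr_heatmap(1 - corr_mtx, label="Pearson r") to show correlations rather than distances, then save the figure with fig.savefig(...) or display it with plt.show().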

scverse / scanpy, issue #1760