tanaylab / metacell

Metacell - Single-cell mRNA Analysis
https://tanaylab.github.io/metacell

How to extract list of genes in a metacell #53

Closed AAleotti closed 3 years ago

AAleotti commented 3 years ago

Hello, I was wondering: is there an easy way to obtain a list of genes expressed in a specific metacell of interest? Any help would be much appreciated! Thanks Alessandra

akhiadber commented 3 years ago

Say metacell 'mc_ind' is the one you are interested in, and assuming your metacell object is called 'mc', you can do something like the following to get the top 20 genes according to their fold change:

    lfp = log2(mc@mc_fp)
    tail(sort(lfp[, mc_ind]), 20)

If you wish to look at the genes with the highest fractions of UMIs in a metacell, you can do:

    egc = log2(1e-5 + mc@e_gc)
    tail(sort(egc[, mc_ind]), 20)

We will soon be releasing a new metacell pipeline, which will include an interactive viewer with these capabilities.

AAleotti commented 3 years ago

Hi, thank you for your reply - that was very useful! Related to this, is there a way to also filter by mean UMIs per metacell or get a gene list of those that meet a threshold mean UMI? Thanks again, Alessandra

akhiadber commented 3 years ago

The egc above is a proxy for the fraction of UMIs per gene in a metacell (on a log2 scale). You could threshold it at some fraction if you want.
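As a rough sketch of that thresholding, the snippet below uses a small made-up matrix in place of mc@e_gc (genes in rows, metacells in columns). The gene names, the values, and the cutoff `min_frac` are all assumptions for illustration only. Since egc is on a log2 scale, the cutoff is converted the same way before comparing.

```r
# Toy stand-in for mc@e_gc: genes in rows, metacells in columns (made-up values).
e_gc <- matrix(
  c(1e-3, 2e-5, 0,
    5e-4, 1e-4, 3e-3),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("1", "2"))
)

mc_ind   <- 2      # metacell of interest
min_frac <- 2e-4   # hypothetical cutoff on the UMI fraction

# Same transformation as in the thread, applied to the toy matrix.
egc <- log2(1e-5 + e_gc)

# Keep genes whose log2 fraction in this metacell reaches the (transformed) cutoff.
keep <- egc[, mc_ind] >= log2(1e-5 + min_frac)
genes_above <- names(which(keep))
print(genes_above)
```

With a real object you would replace the toy `e_gc` with mc@e_gc and pick a cutoff that makes sense for your data.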

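If you want the actual mean UMIs, you can use the following (here 'mat' is assumed to be the single-cell UMI matrix object the metacells were computed from):

    mat_mcs = mat@mat[, names(mc@mc)]
    mean_by_mc = tgs_matrix_tapply(mat_mcs, mc@mc, mean)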

The columns of mean_by_mc will be genes and the rows metacells, so for each metacell you can filter the genes by some mean UMI threshold.
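Here is a minimal sketch of that filtering step on toy data. It does not call tgs_matrix_tapply (from the tgstat package); a base R `apply`/`tapply` combination stands in for it and is assumed to give the same layout (metacells in rows, genes in columns). The UMI counts, the cell-to-metacell assignment `mc_assign`, and the cutoff `min_mean_umi` are all made up.

```r
# Toy UMI matrix: genes in rows, cells in columns (made-up counts).
umis <- matrix(
  c(4, 0, 1,
    6, 0, 0,
    0, 2, 5,
    0, 4, 3),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("c1", "c2", "c3", "c4"))
)

# Toy stand-in for mc@mc: metacell ID per cell, named by cell.
mc_assign <- c(c1 = 1, c2 = 1, c3 = 2, c4 = 2)

# Keep only cells assigned to a metacell, in the same order as mc_assign.
umis_mcs <- umis[, names(mc_assign)]

# Base R stand-in for tgs_matrix_tapply(umis_mcs, mc_assign, mean):
# rows of the result are metacells, columns are genes.
mean_by_mc <- apply(umis_mcs, 1, function(x) tapply(x, mc_assign, mean))

mc_ind       <- 2   # metacell of interest
min_mean_umi <- 1   # hypothetical cutoff on mean UMIs per cell

# Genes whose mean UMI count in this metacell reaches the cutoff.
genes_pass <- colnames(mean_by_mc)[mean_by_mc[mc_ind, ] >= min_mean_umi]
print(genes_pass)
```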

A note of caution: unless you focus on "feature_genes" or some other measure of interesting genes, the genes with the highest mean UMIs or fractions of UMIs will always include housekeeping genes and other highly expressed genes that might not be differentially expressed between metacells.
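One way to act on this caution, sketched on toy data, is to keep only those highly expressed genes that are also enriched in the metacell by fold change. The matrix standing in for mc@mc_fp, the vector `high_umi_genes`, and the cutoff `min_lfp` are all hypothetical; this is just an illustration of the idea, not a recommended setting.

```r
# Toy stand-in for mc@mc_fp: fold change per gene (rows) and metacell (columns).
mc_fp <- matrix(
  c(1.0, 4.0, 0.5,
    1.1, 0.25, 2.0),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("1", "2"))
)
lfp <- log2(mc_fp)

mc_ind  <- 1   # metacell of interest
min_lfp <- 1   # hypothetical cutoff: at least 2-fold enrichment

# Hypothetical result of a mean-UMI or UMI-fraction filter for this metacell.
# geneA plays the role of a housekeeping gene: highly expressed everywhere.
high_umi_genes <- c("geneA", "geneB")

# Genes enriched in this metacell relative to the other metacells.
enriched_genes <- names(which(lfp[, mc_ind] >= min_lfp))

# Highly expressed AND enriched: drops the housekeeping-like geneA.
genes_of_interest <- intersect(high_umi_genes, enriched_genes)
print(genes_of_interest)
```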

AAleotti commented 3 years ago

Ok, I see. Thanks a lot for your help!