scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

mean expression and percentage #336

Open wangjiawen2013 opened 5 years ago

wangjiawen2013 commented 5 years ago

Hi, is there a function that returns the mean expression and the percentage of expressing cells for each gene in a cluster? scanpy.api.pl.dotplot() already computes this information implicitly, so perhaps the easiest way would be for it to return a table, not only the plot.

By the way, can the plots generated by scanpy be saved as vector graphics? Currently the cell points in the plot are not in vector format and become pixelated when zoomed in, although the text and axes are in vector format.

falexwolf commented 5 years ago

@fidelram are you calling an implicit function summarize_categorical or something that could be exposed to the user as a tool?

@wangjiawen2013 sc.set_figure_params(vector_friendly=False) does what you want: https://scanpy.readthedocs.io/en/latest/api/index.html#settings
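
To illustrate what this setting changes, here is a minimal matplotlib-only sketch. It assumes (this is my understanding, not something stated in this thread) that scanpy's vector_friendly option controls whether scatter points are rasterized when a figure is exported. The helper scatter_svg is hypothetical and exists only for this illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_svg(rasterized):
    """Render a small scatter plot to SVG text (hypothetical helper)."""
    rng = np.random.default_rng(0)
    xy = rng.normal(size=(500, 2))
    fig, ax = plt.subplots(figsize=(3, 3))
    # rasterized=True embeds the points as a bitmap inside the vector file;
    # rasterized=False keeps every point as a vector path.
    ax.scatter(xy[:, 0], xy[:, 1], s=4, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg", dpi=150)
    plt.close(fig)
    return buf.getvalue().decode("utf-8")


raster_svg = scatter_svg(rasterized=True)
vector_svg = scatter_svg(rasterized=False)

# Only the rasterized export contains an embedded bitmap element.
print("<image" in raster_svg, "<image" in vector_svg)
```

Fully vector output stays sharp at any zoom level, but files can become large and slow to render when a plot contains very many cells.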

wangjiawen2013 commented 5 years ago

I have got what I want with the following code adapted from dotplot():

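    import pandas as pd

    # All genes stored in adata.raw, and the cluster labels
    gene_ids = adata.raw.var.index.values
    clusters = adata.obs['louvain'].cat.categories

    # Dense cells x genes table, indexed by each cell's cluster label
    obs = adata.raw[:, gene_ids].X.toarray()
    obs = pd.DataFrame(obs, columns=gene_ids, index=adata.obs['louvain'])

    # Mean expression per cluster
    average_obs = obs.groupby(level=0).mean()

    # Fraction of cells per cluster with non-zero expression
    obs_bool = obs.astype(bool)
    fraction_obs = obs_bool.groupby(level=0).sum() / obs_bool.groupby(level=0).count()

    # Write genes x clusters tables
    average_obs.T.to_csv("average.csv")
    fraction_obs.T.to_csv("fraction.csv")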

fidelram commented 5 years ago

I could modify dotplot to return this information. Initially, I thought that the data used by the dot plot was too ad hoc because the percentage (the size of each dot) is based on dropouts, which is only meaningful on the raw matrix. However, I keep finding this information useful for eyeballing potential markers expressed in only a single cluster.

lisbeth-dot-95 commented 4 years ago

I would also be interested in a version that returns the information shown in the dotplot! It would be extremely useful for automatic cluster annotation.

alevax commented 4 years ago

Yes, absolutely. Getting back the summarized information from the dotplot would be great!

QiangShiPKU commented 2 years ago

Agreed. Adding a specialized function that returns the mean expression and percentage of given genes in each cluster would be very useful.
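
As a rough idea of what such a function could look like, here is a minimal pandas-only sketch. The name summarize_by_group and its signature are hypothetical and not part of scanpy. It assumes a dense cells x genes DataFrame and counts a cell as expressing a gene when its value is greater than zero, which only makes sense on raw or otherwise non-negative data.

```python
import numpy as np
import pandas as pd


def summarize_by_group(expr, groups, genes=None):
    """Hypothetical helper, not a scanpy function.

    expr:   cells x genes DataFrame of expression values
    groups: one cluster label per cell, in the same order as expr's rows
    genes:  optional subset of columns to summarize
    Returns (mean expression, fraction of expressing cells), both groups x genes.
    """
    if genes is not None:
        expr = expr.loc[:, list(genes)]
    labels = pd.Series(np.asarray(groups), index=expr.index, name="group")
    mean_expr = expr.groupby(labels).mean()
    # Assumption: "expressed" means a value strictly greater than zero.
    frac_expr = (expr > 0).groupby(labels).mean()
    return mean_expr, frac_expr


# Toy data: four cells, two genes, two clusters
expr = pd.DataFrame(
    {"GeneA": [0.0, 2.0, 4.0, 0.0], "GeneB": [1.0, 1.0, 0.0, 0.0]},
    index=["c1", "c2", "c3", "c4"],
)
groups = ["0", "0", "1", "1"]

mean_expr, frac_expr = summarize_by_group(expr, groups)
print(mean_expr)
print(frac_expr)
```

With an AnnData object, the input table could be built the same way as in the snippet earlier in this thread, from adata.raw[:, gene_ids].X and adata.obs['louvain'].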

Qtasnim commented 1 year ago

> I have got what I want with the following code adapted from dotplot():
>
>     gene_ids = adata.raw.var.index.values
>     clusters = adata.obs['louvain'].cat.categories
>     obs = adata.raw[:, gene_ids].X.toarray()
>     obs = pd.DataFrame(obs, columns=gene_ids, index=adata.obs['louvain'])
>     average_obs = obs.groupby(level=0).mean()
>     obs_bool = obs.astype(bool)
>     fraction_obs = obs_bool.groupby(level=0).sum() / obs_bool.groupby(level=0).count()
>     average_obs.T.to_csv("average.csv")
>     fraction_obs.T.to_csv("fraction.csv")

Love this! Thanks a lot! Just one question: is there a way to get the average expression per cell type (cluster label 1) in each sample (cluster label 2) from an integrated object? I would like to get something roughly like this:

              Gene 1                           Gene 2                           ...
              sample1   sample2   sample3      sample1   sample2   sample3      ...
    T-cell    ...       ...       ...          ...       ...       ...
    B-cell    ...       ...       ...          ...       ...       ...
    ...

I am not sure if this makes sense, but I have been trying to do this for a while and nothing worked!
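
One possible way to get that layout is to group by both labels and then unstack the sample level into the columns. This is a minimal pandas-only sketch, not scanpy functionality: the function name mean_by_two_labels and the toy data are hypothetical, and it assumes a dense cells x genes DataFrame plus one cell-type label and one sample label per cell.

```python
import pandas as pd


def mean_by_two_labels(expr, cell_type, sample):
    """Hypothetical helper: mean expression per (cell type, sample).

    Returns a table with cell types as rows and a two-level column
    index (gene, sample), matching the layout sketched above.
    """
    keys = pd.DataFrame(
        {"cell_type": list(cell_type), "sample": list(sample)},
        index=expr.index,
    )
    # Rows of `long` are (cell_type, sample) pairs, columns are genes.
    long = expr.groupby([keys["cell_type"], keys["sample"]]).mean()
    # Move the sample level into the columns: (gene, sample).
    return long.unstack("sample")


# Toy data: six cells, two genes
expr = pd.DataFrame(
    {"Gene1": [1.0, 3.0, 2.0, 4.0, 0.0, 6.0],
     "Gene2": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]},
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)
cell_type = ["T-cell", "T-cell", "T-cell", "B-cell", "B-cell", "B-cell"]
sample = ["sample1", "sample1", "sample2", "sample1", "sample2", "sample2"]

wide = mean_by_two_labels(expr, cell_type, sample)
print(wide)
```

Combinations of cell type and sample that contain no cells show up as NaN in the result. The same grouping applied to (expr > 0) would give the fraction of expressing cells instead of the mean.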