Open wangjiawen2013 opened 6 years ago
@fidelram are you calling an implicit function summarize_categorical
or something that could be exposed to the user as a tool?
@wangjiawen2013 sc.set_figure_params(vector_friendly=False)
does what you want: https://scanpy.readthedocs.io/en/latest/api/index.html#settings
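For anyone wondering what that setting changes: as far as I understand, `vector_friendly` makes scanpy draw the scatter points as an embedded raster image even when exporting to PDF/SVG, which keeps files small but pixelates the points. The plain matplotlib sketch below is not scanpy's implementation; it only illustrates the same effect with matplotlib's `rasterized` argument. The helper name `scatter_svg` and the random data are made up for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def scatter_svg(rasterized):
    """Render a small scatter plot and return the SVG source as a string.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(0)
    xy = rng.normal(size=(200, 2))
    fig, ax = plt.subplots()
    # rasterized=True embeds the points as a bitmap inside the vector file
    ax.scatter(xy[:, 0], xy[:, 1], s=5, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    plt.close(fig)
    return buf.getvalue().decode("utf-8")


raster_svg = scatter_svg(True)   # points stored as an embedded image
vector_svg = scatter_svg(False)  # points stored as vector paths
```

With `rasterized=False` the points stay sharp at any zoom level, at the cost of a larger file when there are many cells.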
I have got what I want with the following code adapted from dotplot():
    import pandas as pd

    gene_ids = adata.raw.var.index.values
    clusters = adata.obs['louvain'].cat.categories
    # Dense cells x genes matrix from the raw data (assumes .X is sparse)
    obs = adata.raw[:, gene_ids].X.toarray()
    obs = pd.DataFrame(obs, columns=gene_ids, index=adata.obs['louvain'])
    # Mean expression of each gene per cluster
    average_obs = obs.groupby(level=0).mean()
    # Fraction of cells per cluster with non-zero expression
    obs_bool = obs.astype(bool)
    fraction_obs = obs_bool.groupby(level=0).sum() / obs_bool.groupby(level=0).count()
    average_obs.T.to_csv("average.csv")
    fraction_obs.T.to_csv("fraction.csv")
I could modify dotplot to return this information. Initially, I thought that the data used by the dot plot was too ad hoc, because the percentage (size of the dot) is based on the dropouts, which is only meaningful on the raw matrix. However, I keep finding this information useful for eyeballing potential markers expressed in only a single cluster.
I would also be interested in a version which delivers the information shown in the dotplot! Would be extremely useful for automatic cluster annotation.
Yes, absolutely. Getting back the summarized dotplot information would be great!
Agreed. Adding a specialized function that returns the mean expression and percentage of given genes in each cluster would be very useful.
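Until something like that exists, here is a minimal sketch of what such a function could look like, written against a plain cells x genes DataFrame rather than an AnnData object. `summarize_by_group` is a hypothetical name, not a scanpy function, and the toy data is invented; the DataFrame could be built from `adata.raw` as in the snippet above.

```python
import numpy as np
import pandas as pd


def summarize_by_group(expr, groups):
    """Return (mean, fraction) tables with genes as rows and groups as columns.

    Hypothetical helper, not part of scanpy.
    expr:   DataFrame of expression values, cells x genes
    groups: one label per cell, in the same order as the rows of expr
    """
    keys = pd.Series(np.asarray(groups), index=expr.index, name="group")
    # Mean expression of each gene within each group
    mean_expr = expr.groupby(keys).mean()
    # Fraction of cells in each group with expression above zero
    frac_expr = (expr > 0).groupby(keys).mean()
    return mean_expr.T, frac_expr.T


# Toy example: four cells, two genes, two clusters
expr = pd.DataFrame(
    [[0.0, 2.0], [4.0, 0.0], [1.0, 1.0], [0.0, 3.0]],
    columns=["GeneA", "GeneB"],
    index=["c1", "c2", "c3", "c4"],
)
labels = ["0", "0", "1", "1"]
mean_expr, frac_expr = summarize_by_group(expr, labels)
```

As noted earlier in the thread, the fraction is only meaningful on a matrix where zeros are real zeros (e.g. raw counts), not on scaled data.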
Love this! Thanks a lot! Just one question: is there a way to get the average expression in different cell types (cluster label 1) in different samples (cluster label 2) from an integrated object? I would like to get something roughly like this:
            Gene 1                        Gene 2                        .....
            sample1  sample2  sample3     sample1  sample2  sample3     .....
    T-cell  .....
    B-cell  .....
    .....
I am not sure if this makes sense, but I have been trying to do this for a while and nothing worked!
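One way this could work, as a sketch rather than a tested recipe for integrated objects: group by both labels at once and then unstack the sample level, which gives the genes-by-sample columns and cell-type rows laid out above. The column names `cell_type` and `sample` and the toy values are assumptions; with AnnData the two label columns would come from `adata.obs`, and the expression DataFrame must share its index with them.

```python
import pandas as pd

# Toy cells x genes expression table (values invented for the example)
expr = pd.DataFrame(
    {
        "Gene1": [1.0, 3.0, 0.0, 2.0, 4.0, 6.0],
        "Gene2": [0.0, 2.0, 5.0, 1.0, 0.0, 2.0],
    }
)

# Two per-cell labels; the column names are hypothetical
meta = pd.DataFrame(
    {
        "cell_type": ["T-cell", "T-cell", "B-cell", "B-cell", "T-cell", "B-cell"],
        "sample": ["sample1", "sample2", "sample1", "sample2", "sample1", "sample2"],
    }
)

# Mean of every gene for each (cell_type, sample) combination
mean_long = expr.groupby([meta["cell_type"], meta["sample"]]).mean()

# Move the sample level into the columns: rows are cell types,
# columns are (gene, sample) pairs
table = mean_long.unstack("sample")
```

Combinations with no cells (a cell type missing from a sample) show up as NaN after unstacking. `table.to_csv(...)` writes the two header rows as they are.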
Dear all, is there a function that returns the mean expression and percentage of each gene in a cluster? scanpy.api.pl.dotplot() computes this information implicitly, so perhaps the easiest way would be for it to return a table, not only the plot.
By the way, can the plots generated by scanpy be saved as vector graphics? Currently the cell points in the plot are not in vector format and become pixelated when zoomed in, although the text and axes are in vector format.