satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

How to add average expression scale to dotplot of merged gene list (plotted onto single dot plot) #4544

Closed: ksaunders73 closed this issue 3 years ago

ksaunders73 commented 3 years ago

Hello!

From https://github.com/satijalab/seurat/issues/3521 I learned how to plot a list of genes onto one dot plot:

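# A list holding one gene set; AddModuleScore() scores each list element
nklist <- list(c("TFF1", "MB", "ANKRD30B",
                 "LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5"))
# Scores are stored in the metadata as <name><index>, here "NK_List1"
sobj <- AddModuleScore(object = sobj, features = nklist, name = "NK_List")
DotPlot(object = sobj, features = "NK_List1")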

image

However, unlike the plot produced by the usual DotPlot code below, this dot plot does not show the average expression scale:

DotPlot(object = sobj, features = c("TFF1", "MB", "ANKRD30B", "LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5")) + RotatedAxis()

image

Is there a way to add the average expression scale onto the merged dot plot also?

Hopefully this makes sense, and thanks for reading!

samuel-marsh commented 3 years ago

Hi Again,

I'm not a member of the dev team, but hopefully I can be helpful. In my opinion, DotPlot is probably not the best tool for visualizing module scores. Two issues with plotting module scores via DotPlot are of particular note.

So in a basic sense, as described in the manual, AddModuleScore does the following: it calculates the average expression level of each program (cluster) at the single-cell level and subtracts the aggregated expression of control feature sets. All analyzed features are binned based on averaged expression, and the control features are randomly selected from each bin.
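To illustrate that description, here is a minimal base-R sketch on simulated data. It is a simplified illustration of the idea only, not Seurat's actual implementation: the matrix `expr`, the gene names, `n_bin`, `n_ctrl`, and the choice to exclude module genes from the control pool are all assumptions made for this example.

```r
set.seed(1)

# Hypothetical expression matrix: 200 genes x 50 cells
n_genes <- 200
n_cells <- 50
expr <- matrix(
  rexp(n_genes * n_cells),
  nrow = n_genes,
  dimnames = list(paste0("g", seq_len(n_genes)), paste0("c", seq_len(n_cells)))
)

module <- c("g1", "g2", "g3")  # hypothetical gene set to score
n_bin  <- 10                   # number of expression bins (assumption)
n_ctrl <- 5                    # control genes drawn per module gene (assumption)

# Bin all genes by their average expression (roughly equal-sized bins by rank)
gene_means <- rowMeans(expr)
bins <- cut(rank(gene_means, ties.method = "first"), breaks = n_bin, labels = FALSE)
names(bins) <- rownames(expr)

# For each module gene, randomly pick control genes from the same bin
ctrl <- unique(unlist(lapply(module, function(g) {
  pool <- setdiff(names(bins)[bins == bins[[g]]], module)
  sample(pool, n_ctrl)
})))

# Per-cell score: mean module expression minus mean control expression
score <- colMeans(expr[module, , drop = FALSE]) - colMeans(expr[ctrl, , drop = FALSE])
print(summary(score))
```

Because the control genes are matched on average expression, a score near zero means the gene set behaves like a typical set of similarly expressed genes in that cell.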

So each cell gets a score (in a very basic sense, positive scores indicate enrichment relative to the randomly selected control gene set, and negative scores the opposite). Because the code DotPlot uses to determine % expressed has a threshold of 0, it counts only cells with positive module scores as expressing. However, a positive module score doesn't necessarily mean the enrichment is statistically significant, so the % expressed value is a bit deceiving.
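To make the threshold issue concrete, here is a minimal base-R sketch using simulated scores rather than real data. The `cluster` labels, the score distributions, and the `pct_above_zero` helper are all made up for illustration; the helper only mimics the "fraction above 0" idea described above and is not Seurat's code.

```r
set.seed(42)

# Hypothetical data: cluster "A" has scores centred on 0 (no enrichment),
# cluster "B" has scores shifted upward.
scores <- data.frame(
  cluster  = rep(c("A", "B"), each = 500),
  NK_List1 = c(rnorm(500, mean = 0,   sd = 0.2),
               rnorm(500, mean = 0.3, sd = 0.2))
)

# Hypothetical helper: percentage of values above a threshold (0 here)
pct_above_zero <- function(x, threshold = 0) 100 * mean(x > threshold)

# Percentage of cells with a positive score in each cluster
pct_by_cluster <- tapply(scores$NK_List1, scores$cluster, pct_above_zero)
print(round(pct_by_cluster, 1))
```

Cluster A has no enrichment by construction, yet roughly half of its cells land above 0 and would be counted as "expressing", which is why the dot size can mislead for module scores.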

Because of the nature of module scores, I think DotPlot's average expression metric also becomes problematic for them (that metric is normally based on scaled expression data anyway, which is not the case for a module score). Personally, I think the distribution of scores is often more relevant.

Overall, in my opinion, the more relevant plotting functions, other than simply overlaying the score on a tSNE/UMAP, are VlnPlot and/or FeatureScatter. Here is an example from my own data, a score built from a core signature of microglial genes, plotted with VlnPlot: image

Or the same microglia score plotted against an artificial score using FeatureScatter (see Figure 1H)

You can also run stats independently on the module score (see #3719) to compare between different populations or experimental conditions. This can easily be done using the wilcox.test function in base R.
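As a rough sketch of that comparison, the example below runs wilcox.test on simulated scores. The `meta` data frame is a hypothetical stand-in for the per-cell metadata (in Seurat that would be sobj@meta.data), and the `condition` column and score distributions are invented for illustration.

```r
set.seed(1)

# Hypothetical stand-in for per-cell metadata holding the module score
meta <- data.frame(
  condition = rep(c("control", "treated"), each = 100),
  NK_List1  = c(rnorm(100, mean = 0.0, sd = 0.2),
                rnorm(100, mean = 0.5, sd = 0.2))
)

# Two-sample Wilcoxon rank-sum test comparing the score between conditions
res <- wilcox.test(NK_List1 ~ condition, data = meta)
print(res$p.value)
```

The formula interface shown here handles exactly two groups; with more than two populations, pairwise.wilcox.test from the same stats package is one possible option.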

Best, Sam

ksaunders73 commented 3 years ago

Hello again @samuel-marsh! Thanks so much again for your help in getting me to a better understanding of all of this!