satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.25k stars 904 forks

Basic question about Interpreting VlnPlot and DotPlot results #7200

Closed Smeerlap closed 1 year ago

Smeerlap commented 1 year ago

Hi everyone,

I am a medical doctor getting started with scRNAseq analysis and have a probably very basic question about interpreting the results of VlnPlots and DotPlots. My data has four cell types from two conditions (wildtype vs genetic mutation). If I plot the expression of two genes (Insr and Igf1r) split by genotype, I get plots that confuse me a little bit (in the VlnPlots, the left column is from mutant mice, the right one from wildtype animals).

[Two attached images: Test, Test2]

The way I understand it, the VlnPlot tells me that the expression of Igf1r is roughly the same in podocytes from both conditions, probably a little higher in the wildtype animals. The expression level does seem to be higher, though, than in immune cells. But when I plot the DotPlot (default parameters, split.by = "gtype"), the difference in Igf1r expression in podocytes between the two conditions seems to be large.

Could someone help me out by explaining this to me?

Thanks a lot and best regards,

Jasper

nathan-nakatsuka commented 1 year ago

Dear @JasperNies, it would be helpful to include all of the legends in these plots (for the dot plots, the average expression is shown by the intensity of the blue, and there is usually a scale, which is missing from your plots). The dot plots show that, among the cells that express Igf1r, expression is higher in the mutant (nphs2), but there are more cells (bigger dot) among the podocytes that express Igf1r. This means that the overall expression of Igf1r might not differ much between the two conditions, since these two factors balance out, which is why you don't observe a huge difference in the violin plots (I'm assuming the violin plots are labeled with turquoise as WT and red as mutant). You can test these effects more rigorously using FindMarkers (e.g. with test.use = "DESeq2"). It would be ideal to pseudobulk the cells by cell type (leaving only differences between the individuals) and then run differential expression on that data.
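
To make the "two factors balance out" argument concrete, here is a small toy sketch in plain Python. The expression values are made up for illustration and are not taken from the data in this issue, and `dot_summary` is a hypothetical helper, not a Seurat function. It does not reproduce how Seurat computes or scales the values shown in a DotPlot. It only shows that a lower fraction of expressing cells combined with higher expression per expressing cell can give the same overall mean.

```python
from statistics import mean


def dot_summary(values):
    """Summarise one gene in one group of cells.

    Returns a tuple of:
      - fraction of cells with expression > 0 (what dot size conveys)
      - mean expression among expressing cells only
      - mean expression over all cells in the group
    """
    expressing = [v for v in values if v > 0]
    pct = len(expressing) / len(values)
    mean_expressing = mean(expressing) if expressing else 0.0
    return pct, mean_expressing, mean(values)


# Hypothetical normalized expression values for one gene in one cell type.
mutant = [4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # few cells express, at a high level
wildtype = [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]  # more cells express, at a lower level

mut_pct, mut_mean_expressing, mut_mean_all = dot_summary(mutant)
wt_pct, wt_mean_expressing, wt_mean_all = dot_summary(wildtype)

for name, pct, m_expr, m_all in [
    ("mutant", mut_pct, mut_mean_expressing, mut_mean_all),
    ("wildtype", wt_pct, wt_mean_expressing, wt_mean_all),
]:
    print(f"{name}: fraction expressing={pct:.2f}, "
          f"mean among expressing={m_expr:.2f}, mean over all cells={m_all:.2f}")
```

In this toy case the two groups differ in both the fraction of expressing cells and the per-cell level, yet the mean over all cells is identical. That is the kind of situation in which per-group summaries can look quite different while the overall distributions look similar.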