ncborcherding / escape

Easy single cell analysis platform for enrichment
https://www.borch.dev/uploads/screpertoire/articles/running_escape
MIT License
146 stars 20 forks source link

Question regarding enrichIt #38

Closed rythi4 closed 2 years ago

rythi4 commented 2 years ago

I have a question regarding the output of enrichIt(). This type of analysis is new to me, so maybe the question is quite simple: I tried to run enrichIt on a seurat object (13 clusters) with two genesets and then afterwards with only one of the gene sets. When I look at the enrichment scores for the geneset I tried twice, they are not the same (although the overall picture is the same), and I am not sure why? Does some sort of normalization over the genesets take place when running enrichIt() and can it explain the difference I see?

ncborcherding commented 2 years ago

Hey rythi4,

Would you mind sharing an example?

If you are using a single-cell object to run enrichIt(), the function is pulling the raw counts from the "RNA" assay and calculates the enrichment score using a Poisson distribution in GSVA, and performs the ssGSEA normalization described by Barbie et al 2009. This normalization uses the absolute difference between a given gene set - so it is occurring on a gene.set-level.

So if the composition of the seurat object changed, in terms of the cell number, there might be a difference in the output values. But if it is the same Seurat object, it should result in the same values.

Nick

rythi4 commented 2 years ago

Here is an example:

`

Making GeneSetCollections:

set1 <- GeneSetCollection(GeneSet(genes[["X"]][["gene"]], setName = "x"), GeneSet(genes[["Y"]][["gene"]], setName = "y"), GeneSet(genes[["Z"]][["gene"]], setName = "z"), GeneSet(genes[["W"]][["gene"]], setName = "w")) set2 <- GeneSetCollection(GeneSet(genes[["X"]][["gene"]], setName = "x"), GeneSet(genes[["Y"]][["gene"]], setName = "y"))

Here I define the seurat_object: seurat_object <- XXX

DefaultAssay(seurat_object) <- "RNA" seurat_object_1 <- UpdateSeuratObject(seurat_object) ES.seurat_1 <- enrichIt(obj = seurat_object_1, gene.sets = set1, groups = 1000)

Here I define the seurat_object: seurat_object <- XXX

DefaultAssay(seurat_object) <- "RNA" seurat_object_1 <- UpdateSeuratObject(seurat_object) ES.seurat_2 <- enrichIt(obj = seurat_object_1, gene.sets = set2, groups = 1000) ` The two seurat_objects are defined by the exact same line of code.

When I look at ES.seurat_1 and ES.seurat_2 (I order the columns 'x' by highest value for easy comparison), I see that the highest value is not the same: image

ncborcherding commented 2 years ago

Thank you so much for the reproducible example!!

I found the issue - it lies in the GSVA package itself

Screen Shot 2022-01-18 at 12 40 06 PM

I think this might be an error in my reading of the original manuscript for ssGSEA. I will take a look, I might have to implement an intrinsic normalization so the enrichment scores are deterministic. I'll post here when I get something working.

Thanks for bringing this to my attention.

Nick

rythi4 commented 2 years ago

Of course, that was no problem!

ncborcherding commented 2 years ago

The newest dev version (v1.3.5) now has a new parameter ssGSEA.norm which will scale the columns from 0 to 1. However, if you would like to subset the data at a later time, the default setting is now not to normalize the data.

enrichIt <- function(obj, gene.sets = NULL, 
                     method = "ssGSEA", 
                     groups = 1000, 
                     cores = 2,
                     min.size = 5,
                     ssGSEA.norm = FALSE) 

Thanks again for letting me know about this.

Nick