Closed rythi4 closed 2 years ago
Hey rythi4,
Would you mind sharing an example?
If you are using a single-cell object to run enrichIt()
, the function is pulling the raw counts from the "RNA" assay and calculates the enrichment score using a Poisson distribution in GSVA, and performs the ssGSEA normalization described by Barbie et al 2009. This normalization uses the absolute difference between a given gene set - so it is occurring on a gene.set-level.
So if the composition of the seurat object changed, in terms of the cell number, there might be a difference in the output values. But if it is the same Seurat object, it should result in the same values.
Nick
Here is an example:
`
set1 <- GeneSetCollection(GeneSet(genes[["X"]][["gene"]], setName = "x"), GeneSet(genes[["Y"]][["gene"]], setName = "y"), GeneSet(genes[["Z"]][["gene"]], setName = "z"), GeneSet(genes[["W"]][["gene"]], setName = "w")) set2 <- GeneSetCollection(GeneSet(genes[["X"]][["gene"]], setName = "x"), GeneSet(genes[["Y"]][["gene"]], setName = "y"))
DefaultAssay(seurat_object) <- "RNA" seurat_object_1 <- UpdateSeuratObject(seurat_object) ES.seurat_1 <- enrichIt(obj = seurat_object_1, gene.sets = set1, groups = 1000)
DefaultAssay(seurat_object) <- "RNA" seurat_object_1 <- UpdateSeuratObject(seurat_object) ES.seurat_2 <- enrichIt(obj = seurat_object_1, gene.sets = set2, groups = 1000) ` The two seurat_objects are defined by the exact same line of code.
When I look at ES.seurat_1 and ES.seurat_2 (I order the columns 'x' by highest value for easy comparison), I see that the highest value is not the same:
Thank you so much for the reproducible example!!
I found the issue - it lies in the GSVA package itself
I think this might be an error in my reading of the original manuscript for ssGSEA. I will take a look, I might have to implement an intrinsic normalization so the enrichment scores are deterministic. I'll post here when I get something working.
Thanks for bringing this to my attention.
Nick
Of course, that was no problem!
The newest dev version (v1.3.5) now has a new parameter ssGSEA.norm which will scale the columns from 0 to 1. However, if you would like to subset the data at a later time, the default setting is now not to normalize the data.
enrichIt <- function(obj, gene.sets = NULL,
method = "ssGSEA",
groups = 1000,
cores = 2,
min.size = 5,
ssGSEA.norm = FALSE)
Thanks again for letting me know about this.
Nick
I have a question regarding the output of enrichIt(). This type of analysis is new to me, so maybe the question is quite simple: I tried to run enrichIt on a seurat object (13 clusters) with two genesets and then afterwards with only one of the gene sets. When I look at the enrichment scores for the geneset I tried twice, they are not the same (although the overall picture is the same), and I am not sure why? Does some sort of normalization over the genesets take place when running enrichIt() and can it explain the difference I see?