GeneSet and GeneSetCollection

ncborcherding / escape

Easy single cell analysis platform for enrichment

MIT License

GeneSet and GeneSetCollection #18

Closed duocang closed 3 years ago

duocang commented 3 years ago

Hi!

Can you please help me understand why the enrichment scores for gene set A differ between the two approaches below?

gs.A <- GeneSet(c("gene.A1", "gene.A2", "gene.A3", ......), setName = "A")
gs.B <- GeneSet(c("gene.B1", "gene.B2", "gene.B3", ......), setName = "B")

es.A <- enrichIt(obj=Seurat.obj, gene.sets = gs.A)
es.B <- enrichIt(obj=Seurat.obj, gene.sets = gs.B)

head(es.A)
>
               A
cell.1    -0.34535
cell.2     0.88888
cell.3     0.66354
cell.4     0.43214
cell.5     0.42526

gsc <- GeneSetCollection(gs.A, gs.B)
es <- enrichIt(obj=Seurat.obj, gene.sets = gsc)
head(es)
>
               A           B
cell.1     0.85475        xxxx
cell.2     0.04346        xxxx
cell.3    -0.21353        xxxx
cell.4     0.98514        xxxx
cell.5     0.02526        xxxx

And how should I interpret the values above? Does -0.21353 mean less enriched in Set A, while 0.85475 means more enriched in Set A?

ncborcherding commented 3 years ago

Hey Duocang,

The output of enrichIt() is a normalized enrichment score, so yes, you are interpreting it correctly:

-0.21353 means less enriched in Set A, while 0.85475 means more enriched in Set A

That said, I think the real power comes from analyzing groups or clusters of cells, because counts in single-cell data are highly variable from cell to cell, even within a cluster.

Let me know if you have any more questions.

Thanks, Nick

duocang commented 3 years ago

Hi Nick (@ncborcherding),

Thank you for the quick response.

It is true that there is a high degree of variability in single-cell data. Given that enrichIt returns different enrichment scores depending on how it is called, which result do you think I should trust more?

My goal is to check whether a cell is enriched for the two gene sets (gene.A1, gene.A2, gene.A3, ...... and gene.B1, gene.B2, gene.B3, ......). If it is not, I will discard the cell.

es.A <- enrichIt(obj , gene.sets = gs.A)
es.B <- enrichIt(obj , gene.sets = gs.B)

es <- enrichIt(obj,  GeneSetCollection(gs.A, gs.B) )

ncborcherding commented 3 years ago

Hey duocang,

Both should work in terms of getting the enrichment scores:

es.A <- enrichIt(obj , gene.sets = gs.A)
es.B <- enrichIt(obj , gene.sets = gs.B)

es <- enrichIt(obj,  GeneSetCollection(gs.A, gs.B) )

The second option (the GeneSetCollection call) gives you a single output with both gene sets, which you can then attach to the object (Seurat::AddMetaData()) and use as a filter (subset()). That might be more straightforward.
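As a rough sketch of that attach-and-filter workflow: the snippet below assumes `es` is a data frame shaped like the enrichIt() output (one row per cell, one column per gene set). The toy scores, the cutoff of 0, and the helper name `filter_by_enrichment` are all hypothetical choices made for illustration, not something escape prescribes.

```r
# Toy stand-in for enrichIt() output: rows are cells, columns are gene sets.
# These values are invented for illustration only.
es <- data.frame(
  A = c(0.85, 0.04, -0.21, 0.98, 0.03),
  B = c(0.40, -0.10, 0.30, 0.55, -0.60),
  row.names = paste0("cell.", 1:5)
)

# Hypothetical cutoff: treat a cell as "enriched" for a set when its score > 0.
cutoff <- 0
cells_keep <- rownames(es)[es$A > cutoff & es$B > cutoff]

# Hypothetical helper: attach the scores as metadata, then keep the chosen cells.
# Assumes the cell names in `seurat_obj` match rownames(es).
filter_by_enrichment <- function(seurat_obj, es, cells_keep) {
  seurat_obj <- Seurat::AddMetaData(seurat_obj, metadata = es)
  subset(seurat_obj, cells = cells_keep)
}
```

Selecting cells by name keeps the threshold logic in plain R, so it can be inspected before anything is removed from the object.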

The one thing I would check before applying subset() is whether your filter strategy preferentially drops cells with low expression (nFeature). This is not necessarily the case, but if it is, it might introduce bias in downstream analyses.

Hope that makes sense, and let me know if you have any other questions. I am going to close this issue as it is not a problem with the escape code itself, but I think it is a helpful discussion for other users. So please keep asking questions if you have them.
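One way to run that check, sketched with simulated metadata: compare the feature counts of cells that pass the filter with those that fail it. The column name `nFeature_RNA` is an assumption (it is what Seurat typically uses for an assay named RNA), and the cutoff of 0 and the random scores are placeholders.

```r
set.seed(1)

# Simulated per-cell metadata; with real data this would come from the
# Seurat object's meta.data after the enrichment scores are attached.
meta <- data.frame(
  nFeature_RNA = round(rnorm(200, mean = 2000, sd = 500)),
  A = rnorm(200),
  B = rnorm(200)
)

# Hypothetical filter: keep cells with both scores above 0.
meta$keep <- meta$A > 0 & meta$B > 0

# Summarise feature counts for dropped (FALSE) and kept (TRUE) cells.
feature_summary <- tapply(meta$nFeature_RNA, meta$keep, summary)
print(feature_summary)

# Rank-based test for a shift in feature counts between the two groups.
res <- wilcox.test(nFeature_RNA ~ keep, data = meta, exact = FALSE)
print(res$p.value)
```

A clear shift toward lower feature counts among the dropped cells would suggest the filter is partly selecting on sequencing depth rather than on biology.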

Thanks, Nick

duocang commented 3 years ago

Hi Nick.

Sorry for asking the question again.

The enrichment scores I got from the two approaches above were different, and the difference makes me unsure which one to trust. Could the values be normalized differently when more than one gene set is supplied?

ncborcherding commented 3 years ago

Hey duocang,

Can you give me an example of your outputs?

Thanks, Nick
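A minimal sketch of how such an example could be put together, using only the five Set A scores posted at the top of this thread. The variable names are hypothetical; with real data the two vectors would come from es.A and from the A column of the GeneSetCollection output.

```r
# Set A scores as posted earlier in the thread.
single_set <- c(-0.34535, 0.88888, 0.66354, 0.43214, 0.42526)  # gs.A alone
collection <- c(0.85475, 0.04346, -0.21353, 0.98514, 0.02526)  # gs.A within gsc
cells <- paste0("cell.", 1:5)

# Side-by-side table with the per-cell difference.
cmp <- data.frame(
  single_set = single_set,
  collection = collection,
  diff = collection - single_set,
  row.names = cells
)
print(cmp)

# Rank correlation: do the two runs at least order the cells similarly?
rho <- cor(single_set, collection, method = "spearman")
print(rho)
```

If the two runs differed only by a rescaling, the rank correlation would be close to 1, so a low value points to something other than normalization.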