ncborcherding / escape

Easy single cell analysis platform for enrichment
https://www.borch.dev/uploads/screpertoire/articles/running_escape
MIT License

genesets from msigdb #115

Closed. alicekao1118 closed this issue 1 month ago

alicekao1118 commented 1 month ago

Hi Nick,

Thanks for developing this useful tool! I have a question about the gene sets I pulled from MSigDB. I tried to get them with:

gene.set.1 <- getGeneSets(species = "Homo sapiens", library = c("C2"), subcategory = c("CP"))

But the output contains only 29 pathways, which is different from the 3917 listed on the website: https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp

Also, I'm wondering if there's a way to drill down further into the smaller subsets, such as the PID subset of CP or the KEGG_LEGACY subset of CP.

I'm using escape v2.0.0

Thanks, Alice

ncborcherding commented 1 month ago

Hey Alice,

Thanks for reaching out. This is a confusing point, and I need to improve the documentation. escape.matrix() and runEscape() have an argument, min.size, which defaults to 5. This means gene sets that do not have at least 5 genes in the input data are removed automatically. I would not have guessed there would be an order-of-magnitude reduction, though, so I would make sure your gene symbols match between gene.set.1 and the single-cell object.
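To make the min.size point concrete, here is a minimal base-R sketch of that kind of filter, together with a quick check for symbol mismatches. It is only an illustration, not escape's internal code. The gene sets and the gene vector are made-up toy data, and it assumes the gene sets are a named list of character vectors. In a real session, data.genes would presumably be the row names of the single-cell object.

```r
# Toy gene sets (hypothetical): a named list of character vectors.
gene.sets <- list(
  SET_A = c("CD3E", "CD4", "CD8A", "IL7R", "CCR7", "SELL"),
  SET_B = c("MS4A1", "CD79A", "CD19"),
  SET_C = c("cd14", "lyz", "fcgr3a", "s100a8", "s100a9")  # lowercase on purpose
)

# Toy stand-in for the genes present in the expression data.
data.genes <- c("CD3E", "CD4", "CD8A", "IL7R", "CCR7", "MS4A1", "CD79A",
                "CD14", "LYZ", "FCGR3A", "S100A8", "S100A9")

# Number of genes from each set that are found in the data (exact match).
overlap <- vapply(gene.sets, function(gs) sum(gs %in% data.genes), integer(1))

# Keep only the sets with at least min.size matching genes.
min.size <- 5
kept <- names(overlap)[overlap >= min.size]

# Same count, ignoring case. A big jump versus `overlap` hints that the
# symbols are formatted differently in the gene sets and the data.
overlap.ci <- vapply(
  gene.sets,
  function(gs) sum(toupper(gs) %in% toupper(data.genes)),
  integer(1)
)

print(data.frame(set = names(gene.sets), exact = overlap, ignore.case = overlap.ci))
print(kept)
```

If most sets show a low exact count but a high case-insensitive count (or a low count in both), the symbols probably differ in format or identifier type between the gene sets and the data, and that would be worth fixing before lowering min.size.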

For the second question, this is dictated more by how msigdbr stores the data. You can check whether it is possible with that package.
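As a rough sketch of what checking could look like: some msigdbr releases return one row per gene with a gs_subcat column holding labels such as "CP", "CP:PID", or "CP:KEGG". The exact labels and column names depend on the msigdbr version (newer releases rename them), so treat them as assumptions. The data frame below is a hypothetical toy stand-in with placeholder set and gene names. In a real session it would come from something like msigdbr::msigdbr(species = "Homo sapiens", category = "C2"), again assuming a release that accepts those arguments.

```r
# Toy stand-in for an msigdbr result (placeholder names, not real gene sets).
msig <- data.frame(
  gs_subcat   = c("CP", "CP:PID", "CP:PID", "CP:PID", "CP:KEGG", "CP:REACTOME"),
  gs_name     = c("CP_EXAMPLE_SET", "PID_EXAMPLE_1", "PID_EXAMPLE_1",
                  "PID_EXAMPLE_2", "KEGG_EXAMPLE_1", "REACTOME_EXAMPLE_1"),
  gene_symbol = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE6"),
  stringsAsFactors = FALSE
)

# How many distinct gene sets carry each sub-collection label?
sets.per.subcat <- tapply(msig$gs_name, msig$gs_subcat,
                          function(x) length(unique(x)))
print(sets.per.subcat)

# Exact match on "CP" picks up only the sets tagged plain "CP".
cp.only <- unique(msig$gs_name[msig$gs_subcat == "CP"])

# Prefix match picks up "CP" and every "CP:..." sub-collection.
cp.all <- unique(msig$gs_name[grepl("^CP", msig$gs_subcat)])

# One sub-collection, reshaped into a named list of gene vectors.
pid <- msig[msig$gs_subcat == "CP:PID", ]
pid.sets <- split(pid$gene_symbol, pid$gs_name)
print(pid.sets)
```

If the subcategory filter is an exact match on that label (not verified here for escape v2.0.0), "CP" alone would select only the sets tagged plain "CP", while labels like "CP:PID" would have to be requested by their full name.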

Thanks for reaching out, Nick