saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

Number of matched genes in pathway analysis #113

Closed kiwipeel closed 4 months ago

kiwipeel commented 5 months ago

In scRNA-seq pathway analysis, how does the number of expressed genes in each cell cluster that match the pathway-responsive genes in the PROGENy data affect the results? For example, when I select the top 500 genes, only 300 of the genes in cluster 1 match the genes of the TNFa pathway. How does the absence of the remaining genes from my data affect the outcome?

PauBadiaM commented 5 months ago

Hi @kiwipeel,

This is a general problem with using prior knowledge databases such as PROGENy. Indeed, there is a trade-off between gene coverage and confidence in the prior. In PROGENy, we select the top N genes to try to have high-confidence predictions, but of course if the number of overlapping genes is too low, the estimated activities will be unreliable. Based on this, one can play with the N parameter, but there is no perfect number I can tell you. In your case, 300 matching genes out of 500 is already good coverage (more than 50%), so I wouldn't worry. Hope this is helpful!
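
To make the coverage figure concrete, here is a minimal sketch of how one could count the overlap between the genes detected in a cluster and the genes of one pathway. It is plain Python, not decoupleR's API: the function name `pathway_coverage` and the toy gene names are hypothetical, chosen only to reproduce the 300-of-500 case from this thread.

```python
def pathway_coverage(expressed_genes, pathway_genes):
    """Return (n_matched, fraction) of pathway genes found among expressed genes.

    Hypothetical helper for illustration; not part of decoupleR or PROGENy.
    """
    expressed = set(expressed_genes)
    pathway = set(pathway_genes)
    if not pathway:
        raise ValueError("pathway gene set is empty")
    matched = expressed & pathway
    return len(matched), len(matched) / len(pathway)


# Toy data mirroring the numbers in this thread (assumed, not real gene names):
# 500 pathway genes, of which 300 are detected in the cluster.
pathway_genes = [f"GENE{i}" for i in range(500)]
expressed_genes = [f"GENE{i}" for i in range(300)] + [f"OTHER{i}" for i in range(1000)]

n_matched, fraction = pathway_coverage(expressed_genes, pathway_genes)
print(f"{n_matched} of {len(pathway_genes)} pathway genes matched ({fraction:.0%})")
```

Running the same count for several values of N and for each cluster would show how quickly coverage drops as the gene list grows, which is one way to explore the trade-off described above.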