Closed: kiwipeel closed this issue 4 months ago
Hi @kiwipeel,
This is a general problem with using prior knowledge databases such as PROGENy. There is a trade-off between gene coverage and confidence in the prior. In PROGENy, we select the top N genes per pathway to try to get high-confidence predictions, but of course if the number of those genes that overlap with your data is too low, the result will be unreliable. Based on this, you can play with the N parameter, but there is no perfect number I can give you. In your case, 300 matching genes out of 500 is good coverage (more than 50%), so I wouldn't worry. Hope this is helpful!
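To make the coverage point concrete, here is a minimal sketch of how one could check what fraction of a pathway's footprint genes are present in a cluster, and see that missing genes simply drop out of the score. The gene names, weights, and expression values are made up for illustration, and the plain weighted sum is only a stand-in, not necessarily the scoring method your PROGENy workflow uses.

```python
def footprint_coverage(pathway_weights, expressed_genes):
    """Return (matched genes, fraction of footprint genes present in the data)."""
    expressed = set(expressed_genes)
    matched = [g for g in pathway_weights if g in expressed]
    fraction = len(matched) / len(pathway_weights) if pathway_weights else 0.0
    return matched, fraction


def weighted_score(pathway_weights, expression):
    """Weighted sum over footprint genes found in `expression`.

    Footprint genes absent from the data contribute nothing to the sum.
    """
    return sum(w * expression[g] for g, w in pathway_weights.items() if g in expression)


# Hypothetical footprint: gene -> weight (not real PROGENy values).
toy_weights = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": -1.0, "GENE_D": 0.5, "GENE_E": 1.0}
# Hypothetical mean expression for one cluster; GENE_B and GENE_D are missing.
toy_expression = {"GENE_A": 1.0, "GENE_C": 2.0, "GENE_E": 3.0, "GENE_X": 5.0}

matched, fraction = footprint_coverage(toy_weights, toy_expression)
score = weighted_score(toy_weights, toy_expression)
print(f"matched {len(matched)}/{len(toy_weights)} genes ({fraction:.0%}), score = {score}")
```

With 300 of 500 genes matched, the same calculation gives a coverage of 60%. The fewer genes that match, the more the score depends on a handful of genes, which is why very low overlap makes the estimate less trustworthy.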
In scRNA-seq pathway analysis, how does the number of genes expressed in each cell cluster that match the pathway genes in the PROGENy data affect the results? For example, when I select the top 500 genes, only 300 of the genes in cluster 1 match the genes in the TNFa pathway. How does the absence of the remaining genes from my data affect the outcome?