Closed by ismailelshimy 11 months ago
Hi @ismailelshimy,
To infer pathway activity you need two things: an enrichment method and a "gene set" resource. The old `progeny` code implemented the `wsum` method (unnormalized) with the `progeny` pathway-to-gene database. `decoupleR` is a refactor of our tools `progeny` and `dorothea` that allows more flexibility in choosing which enrichment method and which gene set resource to use (the name `decoupleR` comes from the fact that we "decouple" the enrichment method from the gene set database).

In the `decoupleR` paper, we show that the unnormalized `wsum` does not perform that well, and we therefore recommend linear-model-based methods such as `ulm` or `mlm`. Because the enrichment method is different, it is expected that the results will change.
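To illustrate why an unnormalized weighted sum and a univariate linear model can give different scores, here is a minimal pure-Python sketch on made-up data. It is not the decoupleR implementation: the function names `wsum` and `ulm` are reused only as labels, the gene names, weights and expression values are invented, and the `ulm` here is simply the t-statistic of the slope when expression is regressed on the pathway weights (genes outside the set get weight 0).

```python
import math

def wsum(expr, weights):
    """Unnormalized weighted sum over the genes of one gene set."""
    return sum(expr[g] * w for g, w in weights.items() if g in expr)

def ulm(expr, weights):
    """t-statistic of the slope of expression ~ weight, fitted over all genes.

    Genes that are not in the gene set enter the regression with weight 0.
    """
    genes = list(expr)
    x = [weights.get(g, 0.0) for g in genes]
    y = [expr[g] for g in genes]
    n = len(genes)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err

# Toy data (hypothetical values, for illustration only)
expr = {"A": 2.0, "B": 1.0, "C": 0.5, "D": 0.1, "E": 0.3, "F": 1.5}
weights = {"A": 1.0, "B": 0.5, "C": -0.5}

# Same sample with every expression value multiplied by 10
scaled = {g: v * 10 for g, v in expr.items()}

print(wsum(expr, weights), wsum(scaled, weights))  # raw sum follows the scale
print(ulm(expr, weights), ulm(scaled, weights))    # t-statistic does not
```

In this toy case the weighted sum grows with the overall magnitude of the expression values, while the t-statistic is unchanged by that rescaling, so the two kinds of score need not be on comparable scales.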
Another thing to consider is that for the `progeny` database you need to specify how many top genes, ranked by p-value, to use (the `top` parameter). We have observed that different values also change the result, especially when too many genes are selected. This is something we are currently exploring; for now, use the defaults shown in the `decoupleR` vignettes.
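As a rough sketch of how the number of top genes can affect a score, the following pure-Python example keeps the `top` genes with the smallest p-values for one pathway and then computes a plain weighted sum. The table layout, the helper names `top_targets` and `score`, and all values are hypothetical; this does not reproduce the actual progeny table or the decoupleR code.

```python
def top_targets(table, pathway, top):
    """Return {gene: weight} for the `top` genes of a pathway, ranked by p-value."""
    rows = [r for r in table if r["pathway"] == pathway]
    rows.sort(key=lambda r: r["p_value"])  # smallest p-value first
    return {r["gene"]: r["weight"] for r in rows[:top]}

def score(expr, targets):
    """Plain weighted sum of expression over the selected targets."""
    return sum(expr[g] * w for g, w in targets.items() if g in expr)

# Hypothetical pathway-to-gene table (made-up weights and p-values)
table = [
    {"pathway": "P", "gene": "G1", "weight": 2.0, "p_value": 1e-8},
    {"pathway": "P", "gene": "G2", "weight": -1.0, "p_value": 1e-5},
    {"pathway": "P", "gene": "G3", "weight": 0.5, "p_value": 1e-2},
    {"pathway": "P", "gene": "G4", "weight": -0.2, "p_value": 0.3},
    {"pathway": "Q", "gene": "G1", "weight": 1.0, "p_value": 1e-3},
]

# Hypothetical expression values for one sample
expr = {"G1": 1.0, "G2": 2.0, "G3": 4.0, "G4": 5.0}

for top in (2, 4):
    targets = top_targets(table, "P", top)
    print(top, sorted(targets), score(expr, targets))
```

With `top = 2` only the two most significant genes contribute, while `top = 4` also pulls in weakly supported genes, and the score for the same sample changes accordingly.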
Hope this is helpful!
Dear decoupleRs @mschubert @PauBadiaM,
I have a question for you. I scored progeny pathway activities in my scRNA-seq 10X data set (from nasal epithelium) using both the progeny scoring function from the progeny package and the scoring methods in the decoupleR package. I noticed a big discrepancy between the results of the progeny function and those of decoupleR, and I do not know which I should trust. In particular, the progeny function scores are completely uncorrelated with the scores from the decoupleR methods (ULM, MLM, WMEAN, WSUM), as you can see in the following density plot.
Can you tell me which scores are more trustworthy: those from the progeny function in the progeny package or those from the decoupleR scoring functions?
I ask especially because the progeny package vignette now clearly states that one should use decoupleR to infer pathway activities. Does this mean that the progeny function is obsolete, and if so, why?
Thank you very much.
P.S. I am not a statistician by training, so any recommendations and guidance on the most appropriate statistical tool is greatly appreciated.