saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

progeny vs decoupleR packages: which scores are trustworthy? #91

Closed ismailelshimy closed 11 months ago

ismailelshimy commented 1 year ago

Dear decoupleRs @mschubert @PauBadiaM,

I have a question for you. I scored progeny pathway activities in my 10X scRNA-seq data set (from nasal epithelium) using both the progeny scoring function from the progeny package and the scoring methods in the decoupleR package. I noticed a big discrepancy between the progeny function results and the decoupleR results, and I do not know which results I should trust. In particular, the progeny function scores are completely uncorrelated with the scores from the decoupleR methods ULM, MLM, WMEAN, and WSUM, as you can see in the following density plot.

Can you tell me which scores are more trustworthy: those from the progeny function in the progeny package, or those from the decoupleR scoring functions?

[Image: pathway_scoring_comparison_all]

I ask especially because I now see that the progeny package vignette clearly states that one should use decoupleR to infer pathway activities. Does this mean that the progeny function is obsolete? If so, why?

Thank you very much.

P.S. I am not a statistician by training, so any recommendations and guidance on the most appropriate statistical tool are greatly appreciated.

PauBadiaM commented 11 months ago

Hi @ismailelshimy,

To infer pathway activity you need two things: an enrichment method and a "gene set" resource. The old progeny code implemented the wsum (unnormalized) method with the progeny pathway-to-gene database. decoupleR is a refactor of our tools progeny and dorothea that allows more flexibility in which enrichment method and which gene set resource to use (the name decoupleR comes from the fact that we "decouple" the enrichment method from the gene set database).

In the decoupleR paper, we show that the unnormalized wsum does not perform that well, and therefore we recommend linear-model-based methods such as ulm or mlm. Since you are then using a different enrichment method, it is expected that the results will change.
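To make the difference concrete, here is a minimal Python sketch of the two ideas for a single sample and a single pathway. It is not the decoupleR implementation: the function names, gene names, and numbers are made up for illustration, and `ulm` here is assumed to be the t-statistic of the slope when regressing expression on the pathway weights, with genes outside the set given a weight of 0.

```python
import math

def wsum(expr, weights):
    """Unnormalized weighted sum: sum of weight * expression over the gene set."""
    return sum(w * expr[g] for g, w in weights.items() if g in expr)

def ulm(expr, weights):
    """t-statistic of the slope from regressing expression on weights.

    Assumption for this sketch: every measured gene enters the fit, and
    genes outside the gene set get weight 0.
    """
    genes = sorted(expr)
    x = [weights.get(g, 0.0) for g in genes]
    y = [expr[g] for g in genes]
    n = len(genes)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

# Toy data (hypothetical genes and values)
expr = {"A": 2.0, "B": 1.5, "C": -1.0, "D": 0.3, "E": -0.2, "F": 0.1}
weights = {"A": 1.0, "B": 0.8, "C": -0.5}

# The same sample with a constant offset added to every gene
shifted = {g: v + 1.0 for g, v in expr.items()}

print("wsum:", wsum(expr, weights), "->", wsum(shifted, weights))
print("ulm: ", ulm(expr, weights), "->", ulm(shifted, weights))
```

In this toy case the weighted sum moves with the overall expression level, while the slope t-statistic does not, which is one way two methods can give quite different scores from the same gene set and the same data.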

Another thing to consider is that for the progeny database you need to specify how many top genes, ranked by p-value, to use (the top parameter). We have observed that different values also change the result, especially when too many genes are selected. This is something we are currently exploring; for now, use the defaults shown in the decoupleR vignettes.
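As a rough illustration of what the top cutoff does, the Python sketch below keeps the `top` genes with the smallest p-value for each pathway and shows that a simple weighted sum changes with the cutoff. The table layout, pathway and gene names, weights, and p-values are all hypothetical, and `select_top` is a helper written for this example, not a decoupleR function.

```python
def select_top(rows, top):
    """Keep, per pathway, the `top` genes with the smallest p-value.

    `rows` is a list of (pathway, gene, weight, p_value) tuples.
    Returns {pathway: {gene: weight}}.
    """
    by_pathway = {}
    for pathway, gene, weight, p_value in rows:
        by_pathway.setdefault(pathway, []).append((p_value, gene, weight))
    net = {}
    for pathway, entries in by_pathway.items():
        entries.sort()  # smallest p-value first
        net[pathway] = {gene: weight for _, gene, weight in entries[:top]}
    return net

# Hypothetical pathway-to-gene table
rows = [
    ("EGFR", "G1", 2.0, 1e-10),
    ("EGFR", "G2", 1.5, 1e-8),
    ("EGFR", "G3", -0.4, 1e-3),
    ("EGFR", "G4", 0.2, 5e-2),
    ("TNFa", "G2", 0.9, 1e-6),
    ("TNFa", "G5", 1.1, 1e-9),
]

# Hypothetical expression values for one sample
expr = {"G1": 1.0, "G2": 0.5, "G3": 2.0, "G4": -3.0, "G5": 0.0}

for top in (2, 4):
    net = select_top(rows, top)
    score = sum(w * expr[g] for g, w in net["EGFR"].items())
    print(top, sorted(net["EGFR"]), score)
```

With a larger cutoff, genes with weaker statistical support enter the gene set and can shift the score, which is in line with the observation above that selecting too many genes changes the result.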

Hope this is helpful!