hms-dbmi / dseqr

single-cell and bulk RNA-seq analyses from counts → pathways → drug candidates.
https://docs.dseqr.com

derive pval/fdr more efficiently #118

Closed by alexvpickering 5 years ago

alexvpickering commented 5 years ago

PADOG derives pvalues and FDRs through permutation and re-analysis, which is very costly in CPU, RAM, and time (too much so for a web app). Getting just the relative rankings from PADOG is fast, and those rankings are its main advantage over other pathway analysis methods (study1, study2). Other methods are good at deriving accurate pvalues, so it is worth exploring them just for pvalue derivation.

alexvpickering commented 5 years ago

After digging into this one, I realized that the PADOG pathway rankings are themselves derived from permutation-based pvals, so it's not possible to recover the rankings without deriving pvals. As a compromise, I've reduced the number of iterations in the permutation process from 1000 to 24. This substantially speeds things up, and the resulting pathway rankings had a Spearman correlation of >0.95 with an equivalent run using 1000 iterations (based on a single study). For now I think this is a reasonable approach.
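For anyone wanting to repeat this kind of check, here is a rough sketch of the comparison: run the same permutation test with 24 and with 1000 iterations, then take the Spearman correlation of the two sets of pvals. This is not PADOG and not the dseqr code. The set score, the simulated data, and all function names are assumptions made up for illustration, and it uses only the Python standard library.

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = avg(rx), avg(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is entirely tied
    return cov / (var_x * var_y) ** 0.5

def set_scores(expr, labels, gene_sets):
    """Toy set score (an assumption, not PADOG's): mean |case - control| over the set."""
    case = [i for i, l in enumerate(labels) if l == 1]
    ctrl = [i for i, l in enumerate(labels) if l == 0]
    diffs = [abs(avg([g[i] for i in case]) - avg([g[i] for i in ctrl])) for g in expr]
    return [avg([diffs[g] for g in gs]) for gs in gene_sets]

def permutation_pvalues(expr, labels, gene_sets, n_perm, seed=0):
    """Shuffle sample labels n_perm times; p = (1 + exceedances) / (1 + n_perm)."""
    rng = random.Random(seed)
    observed = set_scores(expr, labels, gene_sets)
    exceed = [0] * len(gene_sets)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for k, s in enumerate(set_scores(expr, shuffled, gene_sets)):
            if s >= observed[k]:
                exceed[k] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]

# Simulated data: 60 genes x 12 samples (6 control, 6 case), 12 sets of 5 genes.
# Set k gets a shift of 0.25 * k in the case samples, so effects are graded.
data_rng = random.Random(1)
labels = [0] * 6 + [1] * 6
expr = [[data_rng.gauss(0, 1) for _ in labels] for _ in range(60)]
gene_sets = [list(range(5 * k, 5 * k + 5)) for k in range(12)]
for k, gs in enumerate(gene_sets):
    for g in gs:
        for i, l in enumerate(labels):
            if l == 1:
                expr[g][i] += 0.25 * k

p_fast = permutation_pvalues(expr, labels, gene_sets, n_perm=24)
p_full = permutation_pvalues(expr, labels, gene_sets, n_perm=1000)
rho = spearman(p_fast, p_full)
print("Spearman correlation, 24 vs 1000 permutations:", rho)
```

One thing to keep in mind: with 24 permutations the smallest attainable pval is 1/25 = 0.04, so the top pathways will be tied and the fast run is mainly useful for ranking, not for reporting pvals or FDRs.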

Another future possibility would be to initially show results based on a few iterations for quick feedback, then run a much larger number of iterations in the background and update the results when they are available.
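A minimal sketch of that idea, assuming a generic `analysis(n_perm)` callable: compute the fast result right away, submit the long run to a background worker, and swap in the refined result when it finishes. `ProgressiveResult` and `toy_pvalue` are hypothetical names for illustration. This is not how dseqr is implemented, and the toy permutation test stands in for the real analysis.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def toy_pvalue(case, ctrl, n_perm, seed=0):
    """Stand-in analysis: permutation pval for the absolute difference in means."""
    rng = random.Random(seed)
    pooled = case + ctrl  # new list, safe to shuffle in place
    n = len(case)
    observed = abs(sum(case) / n - sum(ctrl) / len(ctrl))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:n], pooled[n:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

class ProgressiveResult:
    """Holds a quick result now and replaces it once the long run completes."""

    def __init__(self, analysis, fast_perm=24, full_perm=1000):
        # Quick pass runs synchronously so there is something to show at once.
        self.current = {"n_perm": fast_perm, "pvalue": analysis(fast_perm)}
        # Long pass runs on a single background worker.
        self._pool = ThreadPoolExecutor(max_workers=1)
        future = self._pool.submit(analysis, full_perm)
        future.add_done_callback(lambda f: self._update(full_perm, f))

    def _update(self, n_perm, future):
        self.current = {"n_perm": n_perm, "pvalue": future.result()}

    def wait(self):
        """Block until the background run and its callback have finished."""
        self._pool.shutdown(wait=True)
        return self.current

case = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]
ctrl = [4.2, 5.0, 4.8, 5.5, 4.4, 5.2]

result = ProgressiveResult(lambda n: toy_pvalue(case, ctrl, n))
print("shown first:", result.current)   # may already be refined if the run was quick
print("after refinement:", result.wait())
```

Threads are used here only to keep the sketch short. For CPU-heavy permutation work in Python a process pool would probably be a better fit, and in a web app the update step would need to notify the UI rather than just overwrite a field.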