iRNA-COSI / APAeval

Community effort to evaluate computational methods for the detection and quantification of poly(A) sites and estimating their differential usage across RNA-seq samples
MIT License

feat(Execution workflow): LABRAT feature to reuse previous makeTFfasta output #406

Open mrgazzara opened 2 years ago

mrgazzara commented 2 years ago

LABRAT takes a very long time to run in the current implementation of its execution workflow. This is because the makeTFfasta step takes several days (~3.5 days when I ran it), whereas the subsequent Salmon steps take next to no time. It would be useful to let users pass the makeTFfasta result from a previous execution to the execution workflow to speed things up. The makeTFfasta step should only have to be run once per annotation version.

Any thoughts on this possibility @yuukiiwa or @dominikburri ?

dominikburri commented 2 years ago

Hi @mrgazzara, good point. One quick fix could be to run several samples at once, e.g. with a sample table containing all Mayr samples. That way makeTFfasta should only be executed once. Or is this how you have been running it so far? Alternatively, I could imagine restructuring the Nextflow workflow so that the makeTFfasta output is treated as a parameter and is only created when it does not already exist, similar to what @faricazjj did in IsoSCM, see here: https://github.com/iRNA-COSI/APAeval/blob/0152d1dbe0ab8176f75d3a995d1f2a3b80b50019/execution_workflows/IsoSCM/conf/modules.config#L28
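
To make the "only created when it does not already exist" idea concrete, here is a minimal sketch of the decision logic in plain Python. It is not the APAeval or LABRAT implementation: the function name `resolve_tf_fasta`, its parameters, and the `build` callback (which stands in for the actual makeTFfasta call) are all hypothetical, and in the real workflow this logic would live in the Nextflow configuration rather than in Python.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_tf_fasta(
    provided: Optional[str],
    default_output: str,
    build: Callable[[Path], None],
) -> Path:
    """Return a usable makeTFfasta output, running the slow build only if needed.

    All names here are hypothetical; `build` is a stand-in for whatever
    actually runs the makeTFfasta step and writes its result to the given path.
    """
    # Case 1: the user explicitly passed a previous result, so reuse it.
    if provided:
        candidate = Path(provided)
        if candidate.is_file():
            return candidate
        # Fail loudly rather than silently starting a multi-day rebuild.
        raise FileNotFoundError(f"Provided makeTFfasta output not found: {candidate}")

    # Case 2: an earlier run already left a result at the default location.
    target = Path(default_output)
    if target.is_file():
        return target

    # Case 3: nothing to reuse, so run the expensive step once.
    target.parent.mkdir(parents=True, exist_ok=True)
    build(target)
    return target
```

Raising an error when an explicitly provided path is missing is a design choice in this sketch: a typo in the parameter should not quietly trigger a rebuild that takes days.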

mrgazzara commented 2 years ago

Yes, the latter is exactly what I had in mind. We also do something very similar in the QAPA EWF, where you can choose to build a new 3'UTR annotation file or reuse an existing one. Something like that would be a huge time saver for users of our LABRAT EWF.