Community effort to evaluate computational methods for the detection and quantification of poly(A) sites and estimating their differential usage across RNA-seq samples
MIT License
Bug: GETUTR file not found error in postprocessing step when using huge sample file #439
Background
When running GETUTR with a huge sample file, the postprocessing step encounters a "file not found" error from DaPars. An example of the error message is below:
However, this issue never occurs with our test files, which are small.
Problem
After investigating the workflow, the issue appears to be the following: getutr_process.nf publishes the GETUTR output file to a location in the output directory, and the postprocessing step then reads the file from that published location. When the output is large, publishing takes longer to complete, but the workflow has already moved on to the next process, the postprocessing step. The file is therefore not found because it has not finished being written.
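The race described above can be reproduced outside Nextflow with a small Python simulation. This is only a sketch under stated assumptions: `run_task`, `start_publish`, `count_lines` and the file name `getutr_output.txt` are hypothetical stand-ins, not code from the repository. The background thread plays the role of the asynchronous publishing step, and a `threading.Event` holds the copy back so that the "not finished yet" state is deterministic rather than timing-dependent.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def run_task(work_dir: Path) -> Path:
    """Hypothetical stand-in for GETUTR: writes its result into its own work dir."""
    out = work_dir / "getutr_output.txt"  # hypothetical file name
    out.write_text("site\n" * 1000)
    return out


def start_publish(src: Path, publish_dir: Path, copy_allowed: threading.Event) -> threading.Thread:
    """Copy src into publish_dir in the background, like an asynchronous publish."""
    def _copy() -> None:
        copy_allowed.wait()  # simulates a copy of a large file that has not finished yet
        shutil.copy(src, publish_dir / src.name)

    thread = threading.Thread(target=_copy)
    thread.start()
    return thread


def count_lines(path: Path) -> int:
    """Hypothetical stand-in for the postprocessing step; raises if the file is missing."""
    with path.open() as fh:
        return sum(1 for _ in fh)


def demo_race():
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp, "work")
        work_dir.mkdir()
        publish_dir = Path(tmp, "results")
        publish_dir.mkdir()

        out = run_task(work_dir)
        copy_allowed = threading.Event()
        publisher = start_publish(out, publish_dir, copy_allowed)

        # Downstream step reads the published copy before publishing has finished.
        try:
            count_lines(publish_dir / out.name)
            early = "ok"
        except FileNotFoundError:
            early = "file not found"

        # Let the copy complete; only now does the published file exist.
        copy_allowed.set()
        publisher.join()
        late = count_lines(publish_dir / out.name)
        return early, late


if __name__ == "__main__":
    print(demo_race())
```

Reading the published path before the copy completes fails, while the same read succeeds once publishing is done, which mirrors why only large GETUTR outputs trigger the error.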
Solution
One way to solve this is to avoid reading from the published file, since publishing can lag behind the end of the process. Instead, the postprocessing step can read the file from the output channel of the previous process.
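The fix can be sketched with the same kind of Python simulation, where a `queue.Queue` stands in for a Nextflow output channel. This is an illustration under assumptions, not the actual change to getutr_process.nf: `getutr_task`, `postprocess` and `getutr_output.txt` are hypothetical names. The downstream step receives the path emitted by the upstream task and reads the file from the task's own work directory, so it no longer depends on when (or whether) the background publish finishes. This is also in line with Nextflow's documented guidance that `publishDir` copies files asynchronously and that downstream processes should consume outputs through channels rather than the publish directory.

```python
import queue
import shutil
import tempfile
import threading
from pathlib import Path


def getutr_task(work_dir: Path, channel: queue.Queue) -> Path:
    """Hypothetical stand-in for GETUTR: writes output and emits its path on a channel."""
    out = work_dir / "getutr_output.txt"  # hypothetical file name
    out.write_text("site\n" * 1000)
    channel.put(out)  # emit the output path, analogous to a process output channel
    return out


def postprocess(channel: queue.Queue) -> int:
    """Hypothetical postprocessing step: reads the file it receives from the channel."""
    path = channel.get()  # blocks until the upstream task has emitted its output
    with path.open() as fh:
        return sum(1 for _ in fh)


def demo_channel() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp, "work")
        work_dir.mkdir()
        publish_dir = Path(tmp, "results")
        publish_dir.mkdir()

        channel: queue.Queue = queue.Queue()
        out = getutr_task(work_dir, channel)

        # Publishing still happens in the background, but nothing downstream waits on it.
        publisher = threading.Thread(target=shutil.copy, args=(out, publish_dir / out.name))
        publisher.start()

        n_lines = postprocess(channel)
        publisher.join()
        return n_lines


if __name__ == "__main__":
    print(demo_channel())
```

The published copy is still produced for users, but it becomes a side output only; the data dependency between the two steps runs entirely through the channel.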
This solution works and the workflow finishes without error:
Checklist
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have updated corresponding READMEs (if applicable)
[x] My code follows the templates/style guidelines of the repository
[x] In- and output formats comply with APAeval specifications
[x] No parameters or file names are hardcoded
[x] Results, logs or other output is not committed to the repository
Fixes #438