Open FernandoDuarteF opened 1 month ago
The nf-core module checks one assembly at a time. The problem with that approach is that most of the run time is spent loading the database (500 GB) into memory, and that cost is paid again for every assembly, so it does not scale well.
I have solved the above issue in assemblyqc with a custom module: https://github.com/Plant-Food-Research-Open/assemblyqc/blob/cec2728d0785d97e0e9493558c83d83d0b48240e/modules/local/ncbi_fcs_gx_screen_samples.nf#L1
In assemblyqc, instead of processing one assembly per task, I create a batch and then process them in a single task. I am keen to discuss this idea further and see how we can make the module reusable across pipelines.
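To make the scaling argument concrete, here is a rough sketch of why batching helps. It is not the assemblyqc module itself, only a toy cost model in Python: `make_batches` and `total_minutes` are hypothetical helpers, and the load and screening times used below are made-up numbers for illustration, not measurements.

```python
from typing import List, Sequence


def make_batches(assemblies: Sequence[str], batch_size: int) -> List[List[str]]:
    """Group assemblies into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(assemblies[i:i + batch_size])
        for i in range(0, len(assemblies), batch_size)
    ]


def total_minutes(
    n_assemblies: int,
    batch_size: int,
    load_minutes: float,
    screen_minutes: float,
) -> float:
    """Toy cost model: the database is loaded once per task (batch),
    and each assembly adds its own screening time on top.
    Both timing parameters are assumptions, not measured values."""
    n_tasks = -(-n_assemblies // batch_size)  # ceiling division
    return n_tasks * load_minutes + n_assemblies * screen_minutes


if __name__ == "__main__":
    names = [f"assembly_{i}" for i in range(10)]
    # One assembly per task: the database is loaded 10 times.
    per_assembly = total_minutes(len(names), 1, load_minutes=30, screen_minutes=5)
    # One batch holding everything: the database is loaded once.
    single_batch = total_minutes(len(names), len(names), load_minutes=30, screen_minutes=5)
    print(len(make_batches(names, 4)), per_assembly, single_batch)
```

Under these assumed numbers the load cost dominates the one-assembly-per-task layout, while in the single-batch layout it is paid only once. The actual gain depends on real load and screening times.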
Add nf-core module to both genome_only and genome_and_annotation subworkflows.