nf-core / genomeqc

Compare the quality of multiple genomes, along with their annotations.
https://nf-co.re/genomeqc
MIT License

Add NCBI FCS GX module #41

Status: open (opened by FernandoDuarteF 1 month ago)

FernandoDuarteF commented 1 month ago

Add the nf-core module to both the genome only and genome_and_annotation subworkflows.

GallVp commented 3 weeks ago

The nf-core module screens one assembly at a time. The problem with that approach is that most of the run time is spent loading the database (500 GB) into memory, and that load is repeated for every assembly. This does not scale well as the number of assemblies grows.
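A rough cost model makes the scaling problem concrete. The sketch below assumes that each task pays a fixed database-load cost once, plus a per-assembly screening cost; `total_hours`, `load_hours` and `screen_hours` are illustrative names, and the numbers in the usage note are made up, not measurements of FCS-GX.

```python
import math


def total_hours(n_assemblies: int, load_hours: float, screen_hours: float,
                batch_size: int = 1) -> float:
    """Estimate total compute hours summed over all screening tasks.

    Hypothetical cost model: every task loads the database once
    (load_hours), then spends screen_hours on each assembly it screens.
    batch_size=1 models one assembly per task.
    """
    if n_assemblies < 0 or batch_size < 1:
        raise ValueError("n_assemblies must be >= 0 and batch_size >= 1")
    n_tasks = math.ceil(n_assemblies / batch_size)  # number of database loads
    return n_tasks * load_hours + n_assemblies * screen_hours
```

With hypothetical values of 2 hours per load and 0.25 hours per assembly, ten assemblies cost `total_hours(10, 2.0, 0.25)`, i.e. 22.5 hours, when screened one per task, but only 4.5 hours with `batch_size=10`, because the load term drops from ten loads to one.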

I solved this issue in assemblyqc with a custom module: https://github.com/Plant-Food-Research-Open/assemblyqc/blob/cec2728d0785d97e0e9493558c83d83d0b48240e/modules/local/ncbi_fcs_gx_screen_samples.nf#L1

In assemblyqc, instead of processing one assembly per task, I group the assemblies into a batch and screen them all in a single task, so the database is loaded only once. I am keen to discuss this idea further and see how we can make the module reusable across pipelines.
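The batching idea can be sketched in Python as below. This is not the assemblyqc module (which is a Nextflow process); `make_batches`, `screen_batch` and `FakeGxDatabase` are hypothetical names, and the fake database only counts how many times it is "loaded" so the saving is visible.

```python
from typing import Iterator, Sequence


class FakeGxDatabase:
    """Hypothetical stand-in for the GX database; counts how often it is loaded."""

    loads = 0

    def __init__(self) -> None:
        # Simulates the expensive in-memory load of the ~500 GB database.
        FakeGxDatabase.loads += 1

    def screen(self, assembly: str) -> str:
        # Placeholder for screening one assembly against the loaded database.
        return f"{assembly}: screened"


def make_batches(assemblies: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Split assembly paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(assemblies), batch_size):
        yield list(assemblies[start:start + batch_size])


def screen_batch(batch: list[str]) -> dict[str, str]:
    """One task: load the database once, then screen every assembly in the batch."""
    db = FakeGxDatabase()
    return {assembly: db.screen(assembly) for assembly in batch}
```

Screening five assemblies in batches of two, e.g. `[screen_batch(b) for b in make_batches(paths, 2)]`, loads the database three times instead of five; putting all assemblies in one batch reduces that to a single load, at the cost of less parallelism across tasks.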