nf-core / pangenome

Renders a collection of sequences into a pangenome graph.
https://nf-co.re/pangenome
MIT License

Perform sequence partitioning for massive parallelization #97

Closed subwaystation closed 1 year ago

subwaystation commented 1 year ago

Is your feature request related to a problem? Please describe

We don't yet exploit the parallelization that becomes possible once the input FASTA is partitioned into groups of sequences. A bash implementation can be found at https://github.com/pangenome/pggb/pull/243/files.

Describe the solution you'd like

I want to partition the input sequences up front so that all of the current graph building steps can run for each partition in parallel. This means we want to generate one report per partition!
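To make the idea concrete, here is a minimal Python sketch of "partition up front, then build per partition in parallel". It is not how pggb or the pipeline does it: the partition key (the last `#`-separated field of the sequence name, assuming `sample#haplotype#contig` style names) and the `build_partition` step are hypothetical stand-ins chosen purely for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def partition_key(name):
    # ASSUMPTION: names look like sample#haplotype#contig and the
    # contig field decides the partition. Purely illustrative.
    return name.split("#")[-1]


def partition_records(records, key=partition_key):
    """Group (name, sequence) records by partition key."""
    partitions = defaultdict(list)
    for name, seq in records:
        partitions[key(name)].append((name, seq))
    return dict(partitions)


def build_partition(item):
    # Hypothetical placeholder for the per-partition graph building
    # steps; here it only summarizes the partition.
    key, records = item
    return key, {
        "sequences": len(records),
        "bases": sum(len(seq) for _, seq in records),
    }


def run_partitions(partitions, workers=4):
    """Run the placeholder build step for every partition concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(build_partition, sorted(partitions.items())))
```

Each partition is independent of the others, so in a workflow manager every partition would become its own task and produce its own report.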

Describe alternatives you've considered

PGGB can't be run in parallel across the partitions, at least not in general on all HPCs.

Additional context

MOVE!

subwaystation commented 1 year ago

This has actually been in for quite some time already.