pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

wfmash to speed up #403

Closed by OZTaekOppa 2 weeks ago

OZTaekOppa commented 3 months ago

Dear PGGB team,

I have a few questions about your great program.

As far as I can tell from the program, and from my own trials, wfmash performs the all-vs-all alignment on a single node.

I am trying to speed up the wfmash step by running parallel jobs across multiple nodes (PBS Pro). My idea is to run a one-vs-all alignment on each node, using the full input dataset (120 human pangenomes), and then merge the results into a single PAF file for further analysis.

  1. Do you have any recommendations for tweaking the wfmash code to achieve this?
  2. If I run a one-vs-all alignment on each node, will the merged PAF file be equivalent to the output of an all-vs-all alignment? In theory, I would expect the final outcome to be the same.

Looking forward to your insights.

Kind regards,

Taek

subwaystation commented 3 months ago

Dear @OZTaekOppa,

We have a short description of how to parallelize wfmash across several nodes. Please take a look at https://github.com/waveygang/wfmash?tab=readme-ov-file#running-wfmash-on-a-cluster.
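
As a rough illustration of the one-vs-all idea raised in the question (this is not the procedure from the linked README, and it does not call wfmash at all), the Python sketch below shows two things: that one job per target genome covers every ordered pair of distinct genomes, and that per-job PAF files can be merged by plain concatenation because PAF has no header lines. The function names `one_vs_all_jobs`, `covered_pairs` and `merge_paf`, and the genome names, are hypothetical. Whether the merged wfmash output is actually identical to a single all-vs-all run also depends on how wfmash filters mappings across targets, which this sketch does not model.

```python
from itertools import permutations


def one_vs_all_jobs(names):
    """Build one job per genome: (target, [all other genomes as queries])."""
    return [(target, [q for q in names if q != target]) for target in names]


def covered_pairs(jobs):
    """Collect the ordered (query, target) pairs that the jobs would align."""
    return {(query, target) for target, queries in jobs for query in queries}


def merge_paf(parts, merged):
    """Concatenate per-job PAF files into one file, skipping blank lines.

    PAF is a headerless, tab-separated format, so no header handling is needed.
    Returns the number of records written.
    """
    records = 0
    with open(merged, "w") as out:
        for part in parts:
            with open(part) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if line:
                        out.write(line + "\n")
                        records += 1
    return records


if __name__ == "__main__":
    # Hypothetical genome names, for illustration only.
    names = ["genomeA", "genomeB", "genomeC"]
    jobs = one_vs_all_jobs(names)
    # Compare job coverage with the full set of ordered pairs of distinct genomes.
    print(covered_pairs(jobs) == set(permutations(names, 2)))
```

On a scheduler such as PBS Pro, each `(target, queries)` tuple would correspond to one submitted job, with `merge_paf` run once after all jobs finish.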