maickrau / GraphAligner

Unable to reproduce alignment speed in the GraphAligner publication #104

Open jzhang-dev opened 3 months ago

jzhang-dev commented 3 months ago

Hi,

I am trying to align ~40x ONT WGS reads for HG002 to a DBG built from ~35x MGISEQ reads from the same sample. Based on published data, I expected this to require <500 h CPU time (the read correction experiment in the GraphAligner publication reports 175 h for 15x coverage PacBio reads for HG00733). However, the actual alignment took 1404 h CPU time when using -C 10000 and 5653 h CPU time when using -C 50000. Please see below for details:

[image: alignment results]

The aligned fraction was 87.6% in the GraphAligner publication, which is higher than the results above.
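
For reference, here is a minimal sketch of the extrapolation behind the <500 h estimate. It assumes CPU time scales linearly with read coverage, which is only a rough approximation: it ignores differences between ONT and PacBio reads, between the two samples, and between the graphs. The function name `extrapolate_cpu_hours` is made up for illustration; the numbers are the ones quoted in this issue.

```python
def extrapolate_cpu_hours(ref_hours: float, ref_coverage: float, target_coverage: float) -> float:
    """Scale a reference CPU time linearly with coverage (a simplifying assumption)."""
    return ref_hours * target_coverage / ref_coverage


# Reference point from the publication as quoted above: 175 h for 15x PacBio reads.
expected_hours = extrapolate_cpu_hours(ref_hours=175, ref_coverage=15, target_coverage=40)

# Observed CPU hours from this issue, keyed by the -C value used.
observed_hours = {10000: 1404, 50000: 5653}

print(f"expected: {expected_hours:.0f} h")
for tangle_effort, hours in observed_hours.items():
    # Ratio of observed to extrapolated CPU time for each run.
    print(f"-C {tangle_effort}: {hours} h, {hours / expected_hours:.1f}x the estimate")
```

Under this assumption the estimate is about 467 h, so the two runs took roughly 3x and 12x longer than expected.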

I am running GraphAligner v1.0.19 installed from bioconda on an Elastic Virtual Server with 232 vCPUs (2.45 GHz each) from Huawei Cloud.

Could you please share more information about the read correction experiment in the GraphAligner publication? In particular, what was the tangle effort parameter used in the experiment? What parameters were used to create the DBG from Illumina reads? These would be very helpful for understanding the observed difference in performance. If you have other ideas about this, please do let me know. Thank you very much.

@maickrau