philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore reads (standard and ultra-long) to a reference genome, with a focus on reads that span structural variations.

NGMLR very slow on bovine nanopore reads #70

Open sdjebali opened 4 years ago

sdjebali commented 4 years ago

Dear all,

First of all, thanks for this very nice development.

I just wanted to report that on some fairly large bovine ONT runs, NGMLR followed by samtools sort was very slow (about 4 days for 4 million reads).

I was wondering whether I am using the tool correctly (with the right parameters).

I tried with the first 1 million reads like this:

zcat $fastq | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 6 -o $output

It took 5h23 to complete.

I then tried with the second 1 million reads like this:

zcat $fastq | tail -n+4000000 | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 4 -o $output

It took 24h10 to complete.
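One detail worth checking when slicing a FASTQ file this way: `tail -n +K` starts printing at line K, so the second million reads of a four-line-per-record file begin at line 4000001, not 4000000. The sketch below wraps that arithmetic in a small helper. The name `fastq_batch` and its arguments are hypothetical (not part of NGMLR or samtools), and it assumes strict four-line FASTQ records with no wrapped sequence lines.

```shell
# fastq_batch FILE BATCH_INDEX [READS_PER_BATCH]
# Print the BATCH_INDEX-th batch (1-based) of reads from a gzipped FASTQ file.
# Hypothetical helper; assumes exactly 4 lines per read.
fastq_batch() {
    fb_file=$1
    fb_batch=$2
    fb_reads=${3:-1000000}                            # default: 1 million reads per batch
    fb_lines=$((fb_reads * 4))                        # 4 lines per FASTQ record
    fb_start=$(( (fb_batch - 1) * fb_lines + 1 ))     # first line of this batch
    gzip -dc "$fb_file" | tail -n "+$fb_start" | head -n "$fb_lines"
}
```

With this helper, the second batch could be piped into the same command as above, for example: fastq_batch "$fastq" 2 | ngmlr --presets ont -t 22 -r "$genome" | samtools sort -@ 4 -o "$output"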

I am using NGMLR version 0.2.8 and samtools version 1.9. Here are the details of my machine (24 processors):

Linux tatum 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz

Any advice would be warmly welcome.

Best, Sarah

fritzsedlazeck commented 4 years ago

Thanks Sarah, do you know the average read length? It's likely, but unfortunate, that some of the reads in your 2nd batch are very long. Thanks, Fritz

sdjebali commented 4 years ago

Indeed there seems to be a big read length difference between the two batches.

I ran NanoPlot on them and here are the results:

so 13kb vs 4kb
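For a quick cross-check of read lengths without a full NanoPlot run, a rough per-batch mean can be computed directly from the FASTQ stream. This is only a sketch: the function name `fastq_mean_length` is hypothetical, it assumes strict four-line FASTQ records, and it reports a plain arithmetic mean, which may differ from the summary statistics NanoPlot reports.

```shell
# fastq_mean_length
# Read uncompressed FASTQ on stdin and print the read count and mean read length.
# Hypothetical helper; assumes exactly 4 lines per read (sequence on line 2 of each record).
fastq_mean_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) printf "%d reads, mean length %.1f\n", n, total / n }'
}
```

For example, the first million reads could be summarized with: zcat $fastq | head -n 4000000 | fastq_mean_length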

If we still want to use NGMLR on these data, is there any option that can speed the process up?

Best, Sarah