philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

PacBio: NGMLR slows from 110 reads-mapped/s to ~1 reads-mapped/s on last million reads #69

Open matthale16 opened 4 years ago

matthale16 commented 4 years ago

We did high-coverage PacBio sequencing of a human donor (about 8.5 million reads) and are trying to use NGMLR to map the reads to GRCh38.p13. The run is on a system with an i9-9900K (8 cores) and 32 GB of RAM, with the files on the M.2 disk. Everything starts out great, with ~110 reads mapped per second for the first ~7.9 million reads (~20 hours in), and then it suddenly slows to about 1 read mapped per second. All 8 cores are still running at 100%, there is 17 GB of RAM free, and 436 GB of disk space is available, so I don't think it is due to a lack of system resources. This is the second time I have tried this run, with the exact same result: it slows to the point where it will probably not complete.
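To put "will probably not complete" in numbers, here is a rough back-of-the-envelope estimate using only the figures reported above. The remaining-read count is derived (about 8.5 million total minus about 7.9 million already mapped), and both rates are assumed to stay constant, so treat the result as approximate.

```python
# Figures taken from the report above. The remaining-read count is derived
# (total minus already mapped), and constant rates are an assumption.
TOTAL_READS = 8_500_000
FAST_READS = 7_900_000
FAST_RATE = 110.0  # reads mapped per second before the slowdown
SLOW_RATE = 1.0    # reads mapped per second after the slowdown


def hours_to_map(reads: int, rate_per_s: float) -> float:
    """Return the wall-clock hours needed to map `reads` at a constant rate."""
    return reads / rate_per_s / 3600.0


remaining_reads = TOTAL_READS - FAST_READS
fast_hours = hours_to_map(FAST_READS, FAST_RATE)
slow_hours = hours_to_map(remaining_reads, SLOW_RATE)

print(f"first {FAST_READS:,} reads: {fast_hours:.1f} h")
print(f"last {remaining_reads:,} reads: {slow_hours:.1f} h ({slow_hours / 24:.1f} days)")
```

At the slow rate, the last ~600,000 reads would take roughly a week, compared with about 20 hours for the first ~7.9 million.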

Any idea what the problem might be? If we want to cut our losses, is there a way to have NGMLR stop trying to align new reads and spit out a usable BAM file? Any help would be greatly appreciated. Thanks, -Matt
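One possible way to cut losses, sketched here under stated assumptions: if the run is writing plain-text SAM and is killed partway through, the output may end in a line that was cut off mid-write. The hypothetical helper below (`complete_sam_lines` is not part of NGMLR) keeps header lines and any record with the 11 mandatory tab-separated SAM fields, and drops anything that looks truncated. Whether NGMLR has flushed all finished alignments to disk at the moment it is stopped is an assumption this sketch does not check.

```python
from typing import Iterable, Iterator

SAM_MANDATORY_FIELDS = 11  # QNAME through QUAL, per the SAM format specification


def complete_sam_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and alignment records that look complete.

    A line is kept only if it ends with a newline and is either a header
    line (starts with '@') or has at least the 11 mandatory tab-separated
    fields. Anything else is assumed to be cut off mid-write and dropped.
    """
    for line in lines:
        if not line.endswith("\n"):
            continue  # truncated final line
        if line.startswith("@"):
            yield line  # header line, passed through unchanged
        elif len(line.rstrip("\n").split("\t")) >= SAM_MANDATORY_FIELDS:
            yield line


# Small in-memory demonstration with made-up records.
demo = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0",  # cut off mid-record
]
kept = list(complete_sam_lines(demo))
print(len(kept))  # 2
```

The cleaned SAM could then be sorted and converted to BAM with a separate tool such as samtools. This only salvages what was already written; it does not explain the slowdown or tell NGMLR to stop cleanly.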