philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

ngmlr stops using cpu and runs infinitely #109

Open snm32 opened 3 months ago



Hi, ngmlr is not working properly for me. It keeps running, but the number of threads it uses drops to zero within minutes of starting the command (checked manually with htop). The reads-per-second rate also declines. Once ngmlr no longer uses any threads, the number of processed reads stays the same. I left one run going for more than eight days and nothing changed. I first noticed this when running read2tree, which itself uses ngmlr. I have not been able to narrow down the cause of this error, so maybe you can help me.

run details

- input reads (one of these was used per run):
  ```
  PacBio: SRR25517175 or SRR25517176
  ONT: SRR14118388
  ```
- input reference: FASTA file with 200 reference genes (nucleotide sequences), generated by read2tree from data exported manually from the OMA browser. Stats (in bp):
  ```
  average sequence length: 1348
  minimal sequence length: 207
  maximal sequence length: 4683
  N25: 2310
  N50: 1722
  N75: 1116
  N90: 759
  ```
- hardware:
  - two VMs provided by [de.NBI cloud](https://www.denbi.de/cloud):
    - virtual machine with 28 vCPUs, 64 GB RAM and Ubuntu 22.04 LTS de.NBI (2023-09-29)
    - virtual machine with 28 vCPUs, 240 GB RAM and Ubuntu 20.04 LTS de.NBI (2022-10-28)
  - private desktop PC with 12 cores, 16 GB RAM and Ubuntu 22.04.4 LTS
- software:
  - ngmlr 0.2.8 (built from source)
  - ngmlr 0.2.7 (via conda, from your prebuilt binary, and from the read2tree Docker image, although that one was only used by read2tree)
  - ngmlr 0.2.6 (via conda and from your Docker image)
- flags: I used the following flags for every run (adjusting the ONT/PacBio setting and the file paths accordingly):
  ```console
  ngmlr -t 27 -R 0.25 --subread-length 256 -x ont -r /path-to/MANES_OGs_1.fa -q /path-to/SRR14118388.fastq -o /path-to/MANES_OGs_1.fa.sam
  # sometimes I did not use the "-x" flag (when working with PacBio data)
  # sometimes I appended the following to save stdout (but everything ended up in err_file, which should be for stderr, not stdout?)
  >> /path-to/log_file_1 2>> /path-to/err_file_1
  # I also tried the --verbose flag, but the output did not contain anything I found suspicious (and the log file holding it was far bigger than the output .sam file)
  ```
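
The length statistics of the reference FASTA listed above (average, minimum, maximum, N25 to N90) can be recomputed with a short shell function built on awk. This is a minimal sketch, assuming an uncompressed FASTA file; the function name `fasta_stats` is hypothetical, and it is not necessarily how the numbers above were produced (in particular, it rounds the average to the nearest integer).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): length statistics for an uncompressed FASTA
# file, printed in the same layout as the stats above. Sequence lines may be
# wrapped over several lines.
fasta_stats() {
  # First awk: print the length of every record, one per line.
  # sort: order lengths longest first.
  # Second awk: average, min, max and N25/N50/N75/N90 from the sorted lengths.
  awk '/^>/ { if (n) print len; n = 1; len = 0; next }
       { len += length($0) }
       END { if (n) print len }' "$1" |
    sort -n -r |
    awk '{ l[NR] = $1; total += $1 }
         END {
           if (NR == 0) exit 1
           printf "average sequence length: %.0f\n", total / NR
           printf "minimal sequence length: %d\n", l[NR]
           printf "maximal sequence length: %d\n", l[1]
           # Nxx: length of the sequence at which the running sum of lengths
           # (longest first) first reaches xx% of the total length.
           split("25 50 75 90", pct, " ")
           for (i = 1; i <= 4; i++) {
             cum = 0
             for (j = 1; j <= NR; j++) {
               cum += l[j]
               if (cum >= total * pct[i] / 100) {
                 printf "N%s: %d\n", pct[i], l[j]
                 break
               }
             }
           }
         }'
}
```

For example, `fasta_stats /path-to/MANES_OGs_1.fa` prints seven lines in the same layout as the stats above.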

Questions:

  1. Is it a known issue with an easy solution?
  2. Should I preprocess my input?
  3. Am I missing dependencies?
  4. Is something else preventing the program from running correctly? (e.g. for OrthoFinder I sometimes have to raise the limit on simultaneously open files)
  5. Can I help narrow down the cause or provide steps to reproduce, to help you improve future versions of the software?
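
To narrow down the cause (question 5), one option is to map the reads in smaller chunks, each under a time limit, and note which chunk never finishes. The function below is a minimal sketch, assuming an uncompressed FASTQ file with exactly 4 lines per record and GNU coreutils `split` and `timeout`. The name `map_in_chunks`, the chunk size and the time limit are hypothetical choices, while the ngmlr flags are the ones from the run details.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run ngmlr on fixed-size chunks of a FASTQ
# file, each under a time limit, and report which chunks finish, fail or
# time out.
# Assumptions: uncompressed FASTQ with exactly 4 lines per record; GNU
# coreutils split/timeout; chunk size and time limit are hypothetical.
# Usage: map_in_chunks REF READS READS_PER_CHUNK LIMIT WORKDIR
map_in_chunks() {
  local ref=$1 reads=$2 reads_per_chunk=$3 limit=$4 workdir=$5
  local chunk name status
  mkdir -p "$workdir/chunks" "$workdir/out"
  # Split on a multiple of 4 lines so no FASTQ record is cut in half.
  split -l $((reads_per_chunk * 4)) -d "$reads" "$workdir/chunks/chunk_"
  for chunk in "$workdir"/chunks/chunk_*; do
    name=$(basename "$chunk")
    status=0
    # Same flags as in the run details (drop "-x ont" for PacBio reads).
    # timeout stops ngmlr and returns 124 once LIMIT has passed.
    timeout "$limit" ngmlr -t 27 -R 0.25 --subread-length 256 -x ont \
      -r "$ref" -q "$chunk" -o "$workdir/out/$name.sam" \
      2> "$workdir/out/$name.log" || status=$?
    if [ "$status" -eq 124 ]; then
      echo "$name: timed out after $limit"
    elif [ "$status" -ne 0 ]; then
      echo "$name: failed with exit status $status"
    else
      echo "$name: finished"
    fi
  done
}
```

For example, `map_in_chunks /path-to/MANES_OGs_1.fa /path-to/SRR14118388.fastq 10000 2h /path-to/chunks` maps 10,000 reads per chunk and gives each chunk two hours. If the stall depends on specific reads, a chunk that times out can be split further until the problematic reads are isolated.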