knights-lab / SHOGUN

SHallow shOtGUN profiler
GNU Affero General Public License v3.0

SHOGUN fails silently when bowtie2 runs out of memory #42

Open ElDeveloper opened 2 years ago

ElDeveloper commented 2 years ago

Working through a dataset, I found that most of the resulting alignments only included 100K-200K sequence identifiers from the input, even though most of my samples have more than 1M sequences. Unsure of what was going on, I tried running bowtie2 manually (using the same command that SHOGUN builds). That's when I noticed my OS was killing bowtie2 with signal 9 (SIGKILL):

```
bowtie2 --no-unal -x /[redacted]/shogun-db/bt2/rep82 -S [redacted].sam --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min '"L,0,-0.02"' -f [redacted].fna --very-sensitive -k 16 -p 16 --reorder --no-hd
(ERR): bowtie2-align died with signal 9 (KILL)
```
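A process killed by a signal has no ordinary exit status. When Python's `subprocess` module runs such a process directly, it reports the death as a negative return code (`-signal_number`). The minimal sketch below simulates an out-of-memory kill with a shell that sends SIGKILL to itself; `describe_exit` is a hypothetical helper, not part of SHOGUN. Note that here the `bowtie2` command is a wrapper around `bowtie2-align` (as the `(ERR)` line shows), so a caller sees the wrapper's own exit status rather than a negative code. Either way, any non-zero value means the run failed.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description (hypothetical helper)."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        return f"killed by signal {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "success"
    return f"exited with status {returncode}"


# Simulate an OOM kill: the shell sends SIGKILL (9) to its own PID.
proc = subprocess.run(["sh", "-c", "kill -9 $$"])
print(describe_exit(proc.returncode))  # prints: killed by signal SIGKILL
```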

After this happened, I checked the exit code (using `echo $?`) and saw exit code 1. As far as I can tell, nothing in the SHOGUN code checks bowtie2's exit code. It is returned here:

https://github.com/knights-lab/SHOGUN/blob/24109b719463e7797af116b819e1adf89e38815f/shogun/aligners/bowtie2_aligner.py#L32-L38

However, none of the `align` method calls check it:

https://github.com/knights-lab/SHOGUN/blob/24109b719463e7797af116b819e1adf89e38815f/shogun/__main__.py#L75

https://github.com/knights-lab/SHOGUN/blob/24109b719463e7797af116b819e1adf89e38815f/shogun/__main__.py#L78
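The linked lines are not reproduced here, so the following is only a hypothetical sketch of one way to close the gap, assuming the aligner wrapper runs bowtie2 through Python's `subprocess` module. Instead of returning the exit code for callers to inspect (or not), the wrapper raises. `run_aligner` and `AlignerError` are illustrative names, not part of SHOGUN.

```python
import subprocess
from typing import Sequence


class AlignerError(RuntimeError):
    """Raised when an external aligner exits unsuccessfully (hypothetical name)."""


def run_aligner(cmd: Sequence[str]) -> None:
    """Run an aligner command and raise if it does not exit cleanly."""
    # Capture stderr so the aligner's own error message can be included in the exception.
    # Trade-off: progress/summary output on stderr is no longer shown live.
    proc = subprocess.run(list(cmd), stderr=subprocess.PIPE, text=True)
    # Any non-zero code is a failure: positive for normal errors,
    # negative if the process itself was killed by a signal.
    if proc.returncode != 0:
        raise AlignerError(
            f"{cmd[0]} failed with exit code {proc.returncode}: {proc.stderr.strip()}"
        )
```

Because the exception propagates out of the align call, the command stops with an error instead of leaving a truncated SAM file for the next step. `subprocess.run(cmd, check=True)` is a shorter alternative that raises `subprocess.CalledProcessError` on any non-zero exit.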

The worst thing about this error is that, since SHOGUN neither fails nor catches it, you can successfully process a dataset and end up with incomplete contingency tables. The resulting SAM file is written to disk but is incomplete; unfortunately, `shogun assign_taxonomy` has no way of knowing this, so it processes the dataset as if nothing were wrong.
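Checking exit codes addresses the root cause. A complementary safeguard is to never leave a partial output file where a downstream step can find it. The following is a minimal sketch, assuming a hypothetical helper `run_with_atomic_output` that hands the aligner a temporary path (for bowtie2, the value of `-S`) and renames it to the final name only if the command succeeds.

```python
import os
import subprocess
import tempfile
from typing import Callable, List


def run_with_atomic_output(build_cmd: Callable[[str], List[str]], out_path: str) -> None:
    """Run a command that writes to a file, publishing the file only on success.

    build_cmd receives a temporary path and returns the argument list to execute.
    """
    out_dir = os.path.dirname(os.path.abspath(out_path))
    # Create the temp file next to the target so os.replace is a same-filesystem rename.
    # Note: mkstemp creates the file with mode 0o600; chmod afterwards if needed.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".partial")
    os.close(fd)
    try:
        # check=True raises CalledProcessError on any non-zero exit, including signal kills.
        subprocess.run(build_cmd(tmp_path), check=True)
        os.replace(tmp_path, out_path)  # atomic rename on POSIX
    except BaseException:
        # Remove the partial output so downstream steps cannot pick it up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For bowtie2 this could be called as `run_with_atomic_output(lambda tmp: ["bowtie2", ..., "-S", tmp, ...], "sample.sam")`, with the remaining arguments left as in the command above. If bowtie2 is killed, no `sample.sam` appears at all, so a later taxonomy step fails loudly on the missing file instead of silently using a truncated one.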


In my case, running on a 32 GB system, my samples were missing around 60-80% of their reads.

bhillmann commented 2 years ago

Good call, and a thorough investigation. This is indeed a nightmare scenario: a silent bug. We should open a PR that handles exit codes from the aligners.