tanaes opened this issue 6 years ago
Hi, thanks for the report!
Could you tell me a bit more about the target computer(s) this fails on? What is the memory (RAM) and CPU?
I'll look into this! Gabe
On Thu, Apr 19, 2018, 8:45 PM Jon Sanders notifications@github.com wrote:
Hey guys,
We've been trying to track down a problem while adapting SHOGUN to Qiita, the symptom of which was finding this message when running integration tests in Travis:
File "/home/travis/build/qiita-spots/qp-shotgun/miniconda3/envs/qp-shotgun/lib/python3.5/site-packages/pandas/core/groupby.py", line 2934, in _get_grouper
    raise KeyError(gpr)
KeyError: 'summary'
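For context, pandas raises exactly this KeyError when groupby is asked to group on a column that doesn't exist, which is what happens when an upstream step hands it an empty table. A minimal reproduction (the empty DataFrame here is a hypothetical stand-in for SHOGUN's parsed alignment, not SHOGUN's actual code path):

```python
import pandas as pd

# Hypothetical stand-in for SHOGUN's parsed alignment: the table came
# back empty, so a 'summary' column never exists.
profile = pd.DataFrame()

try:
    profile.groupby('summary')
except KeyError as err:
    print('KeyError:', err)  # prints: KeyError: 'summary'
```

So the exception is consistent with the aligner having produced an empty output file rather than with a bug in the pandas call itself.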
@antgonza also was having the same error on his OS X install, but neither I (on Barnacle) nor @semarpetrus (on his Linux box) were encountering it.
Running SHOGUN directly using the following commands yielded a good alignment + downstream files on Barnacle:
aln_out=foo.align
database=/home/jgsanders/git_sw/qp-shotgun/qp_shotgun/shogun/databases/shogun
level=species
aligner=burst
threads=8
profile=profile.tsv
aln_out_fp=foo.align/alignment.burst.b6
redistributed="profile.${level}.tsv"
fun_output=functional

shogun align \
  --aligner ${aligner} \
  --threads ${threads} \
  --database ${database} \
  --input combined.fna \
  --output ${aln_out}

shogun assign_taxonomy \
  --aligner ${aligner} \
  --database ${database} \
  --input ${aln_out_fp} \
  --output ${profile}

shogun redistribute \
  --database ${database} \
  --level ${level} \
  --input ${profile} \
  --output ${redistributed}

fun_level=$level
shogun functional \
  --database ${database} \
  --input ${profile} \
  --output ${fun_output} \
  --level ${fun_level}
where the test database is here https://github.com/antgonza/qp-shotgun/blob/shogun/qp_shotgun/shogun/databases/shogun.tar.bz2 and the input data are here https://www.dropbox.com/s/ocu4c0ft8vhbjwx/combined.fna?dl=0
Running the same align command on an OS X box (using Gabe's supplied burst15 binary) ran for a bit and then produced an empty .b6 output file.
Running BURST directly on the OS X box produced the following output:
burst15 --references qp_shotgun/shogun/databases/shogun/burst/5min.edx \
  --queries combined.fna \
  --output test.b6 \
  --accelerator qp_shotgun/shogun/databases/shogun/burst/5min.acx
This is BURST [v0.99.7LL]
--> Using accelerator file qp_shotgun/shogun/databases/shogun/burst/5min.acx
Using up to AVX-128 with 8 threads.
--> [Accel] Accelerator found. Parsing...
--> [Accel] Total accelerants: 805949 [bytes = 2106932]
--> [Accel] Reading 0 ambiguous entries
EDB database provided. Parsing...
--> EDB: Fingerprints are DISABLED
--> EDB: Parsing compressed headers
--> EDB: Sheared database (shear size = 515)
--> EDB: 970 refs [970 orig], 61 clumps, 1030 maxR
Parsed 400000 queries (0.071752). Calculating minMax...
Found min 150, max 150 (0.000109).
Converting queries... Converted (0.007549)
Copying queries... Copied (0.002561)
Sorting queries... Sorted (0.088294)
Copying indices... Copied (0.001531)
Determining uniqueness... Done (0.007544). Number unique: 397338
Collecting unique sequences... Done (0.001721)
Creating data structures... Done (0.004528) [maxED: 4]
Determining query ambiguity... Determined (0.023589)
Creating bins... Created (0.011927); Unambig: 391663, ambig: 5675, super-ambig: 0 [5675,397338,397338]
Re-sorting... Re-sorted (0.194431)
Calculating divergence... Calculated (0.009815) [10.120026 avg div; 150 max]
Fingerprints not enabled
Setting QBUNCH to 16
Using ACCELERATOR to align 397338 unique queries...
Search Progress: [100.00%]
Search complete. Consolidating results...
Segmentation fault: 11
What do you think?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/knights-lab/SHOGUN/issues/18, or mute the thread https://github.com/notifications/unsubscribe-auth/AHrXBvdct9NKb_Ie48fOmdPloFzcherFks5tqS-8gaJpZM4Tcs1z .
In Travis, we get between 4 GB and 7.5 GB. Note that we are using sudo-enabled builds (more info).
Locally, I have a MacBookPro14,3 with 16 GB of RAM.
@tanaes SHOGUN doesn't pick up the failed signal from BURST? Python's subprocess call should log it.
Under default parameters, it gave no output to STDOUT or STDERR, just produced an empty alignment file.
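On the silent-failure point: on POSIX systems a child process killed by a signal surfaces in Python as a negative return code (e.g. -11 for SIGSEGV), so a wrapper can report the crash even when the child leaves STDOUT/STDERR empty. A minimal sketch of that check (the `sh -c 'kill -SEGV $$'` child is an illustrative stand-in for the real burst15 invocation, not SHOGUN's actual call):

```python
import signal
import subprocess

# Stand-in for the aligner call: this child deliberately dies of SIGSEGV,
# the same signal as the crash above (the command is illustrative only).
proc = subprocess.run(['sh', '-c', 'kill -SEGV $$'])

# On POSIX, a child killed by a signal gets a *negative* returncode,
# so the wrapper can surface the crash even when stderr stays empty.
if proc.returncode < 0:
    print('aligner killed by', signal.Signals(-proc.returncode).name)
elif proc.returncode != 0:
    print('aligner exited with status', proc.returncode)
```

A check like this in the SHOGUN wrapper would have turned the empty alignment file into an immediate, descriptive error.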
What command was used to build the database? Also, does the attached linux binary (compiled from the same code used to compile the Mac binary) work on your high-RAM linux systems? Trying to rule out database creation commands as well as differences in code since the older existing linux version.
I ran
burst15 -r 5min.fna -a 5min.acx -o 5min.edx -d DNA -s
Then aligned with
burst15 -r 5min.edx -a 5min.acx -q combined.fna -o test.b6
According to my run with /usr/bin/time -v, this took 12 GB of RAM to run. Insufficient RAM might then explain the Travis failure, but it's unclear what's causing the Mac failure (unless you had over 4 GB consumed by other programs at runtime, leaving less than 12 GB for burst15).
BURST15 will always reserve ~8GB (the size of the index table in the "database15" mode, adjusted for number of threads) plus the size of the database itself (minimum 4GB), so it'll yank 12GB to run (burst12 can run in under 128MB so that's the one recommended for laptops!).
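As a rough sanity check on those numbers (my arithmetic from the figures in this thread, not an official BURST formula): the floor is the ~8 GB index table plus the database itself, with a 4 GB minimum:

```python
# Back-of-the-envelope RAM floor for burst15, from the figures quoted above.
# The constants are my reading of the comment, not BURST documentation.
INDEX_TABLE_GB = 8   # approximate size of the "database15"-mode index table
DB_MINIMUM_GB = 4    # minimum reservation for the database itself

def burst15_ram_floor_gb(db_size_gb):
    """Approximate minimum RAM burst15 will reserve, in GB."""
    return INDEX_TABLE_GB + max(db_size_gb, DB_MINIMUM_GB)

# Even the tiny 5min test database hits the 12 GB floor measured
# with /usr/bin/time -v above:
print(burst15_ram_floor_gb(0.002))  # prints: 12
```

Under this estimate any machine with 8 GB (or a Travis worker with 4-7.5 GB) is below the floor regardless of how small the database is, which fits the observed failures.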
Thanks! I'll let @tanaes answer those specific questions. Just out of curiosity, will 15/12 yield the same results? Either way, what are the differences?
@GabeAl The attached binary does indeed segfault on our high memory linux machine. Here's the output (here, the ./burst15 is the one attached above):
☕ barnacle:qp-shotgun $ ./burst15 \
> --references qp_shotgun/shogun/databases/shogun/burst/5min.edx \
> --queries combined.fna \
> --output test.b6 \
> --accelerator qp_shotgun/shogun/databases/shogun/burst/5min.acx
This is BURST [v0.99.7LL]
--> Using accelerator file qp_shotgun/shogun/databases/shogun/burst/5min.acx
Using up to AVX-128 with 24 threads.
--> [Accel] Accelerator found. Parsing...
--> [Accel] Total accelerants: 805949 [bytes = 2106932]
--> [Accel] Reading 0 ambiguous entries
EDB database provided. Parsing...
--> EDB: Fingerprints are DISABLED
--> EDB: Parsing compressed headers
--> EDB: Sheared database (shear size = 515)
--> EDB: 970 refs [970 orig], 61 clumps, 1030 maxR
Parsed 400000 queries (0.089528). Calculating minMax...
Found min 150, max 150 (0.000125).
Converting queries... Converted (0.007726)
Copying queries... Copied (0.004054)
Sorting queries... Sorted (0.125254)
Copying indices... Copied (0.000616)
Determining uniqueness... Done (0.004894). Number unique: 397338
Collecting unique sequences... Done (0.001327)
Creating data structures... Done (0.006473) [maxED: 4]
Determining query ambiguity... Determined (0.012322)
Creating bins... Created (0.012095); Unambig: 391663, ambig: 5675, super-ambig: 0 [5675,397338,397338]
Re-sorting... Re-sorted (0.322825)
Calculating divergence... Calculated (0.007467) [10.120026 avg div; 150 max]
Fingerprints not enabled
Setting QBUNCH to 16
Using ACCELERATOR to align 397338 unique queries...
Search Progress: [100.00%]
Search complete. Consolidating results...
Segmentation fault (core dumped)
☕ barnacle:qp-shotgun $ ls
burst15 combined.fna LICENSE qp_shotgun README.rst scripts setup.py support_files test test.b6
☕ barnacle:qp-shotgun $ ~/miniconda/envs/oecophylla-shogun/bin/burst15 \
> --references qp_shotgun/shogun/databases/shogun/burst/5min.edx \
> --queries combined.fna \
> --output test.b6 \
> --accelerator qp_shotgun/shogun/databases/shogun/burst/5min.acx
This is BURST [v0.99.7f]
--> Using accelerator file qp_shotgun/shogun/databases/shogun/burst/5min.acx
Using up to AVX-128 with 24 threads.
--> [Accel] Accelerator found. Parsing...
--> [Accel] Total accelerants: 805949 [bytes = 2106932]
--> [Accel] Reading 0 ambiguous entries
EDB database provided. Parsing...
--> EDB: Fingerprints are DISABLED
--> EDB: Parsing compressed headers
--> EDB: Sheared database (shear size = 515)
--> EDB: 970 refs [970 orig], 61 clumps, 1030 maxR
Parsed 400000 queries (0.085349). Calculating minMax...
Found min 150, max 150 (0.000108).
Converting queries... Converted (0.007505)
Copying queries... Copied (0.004179)
Sorting queries... Sorted (0.131057)
Copying indices... Copied (0.006557)
Determining uniqueness... Done (0.006628). Number unique: 397338
Collecting unique sequences... Done (0.005024)
Creating data structures... Done (0.007195) [maxED: 4]
Determining query ambiguity... Determined (0.018151)
Creating bins... Created (0.016560); Unambig: 391663, ambig: 5675, super-ambig: 0 [5675,397338,397338]
Re-sorting... Re-sorted (0.340644)
Calculating divergence... Calculated (0.007354) [10.120026 avg div; 150 max]
Fingerprints not enabled
Setting QBUNCH to 16
Using ACCELERATOR to align 397338 unique queries...
Search Progress: [100.00%]
Search complete. Consolidating results...
CAPITALIST: Processed 329 investments
Alignment time: 42.566155 seconds
What's the difference, again, between burst12 and burst15? Does the database need to be reindexed for one vs the other?
This is indeed interesting. Could you share the commandline that was used to make the burst database? It seems to differ from what I used here: burst15 -r 5min.fna -a 5min.acx -o 5min.edx -d DNA -s
In any case, there may be a combination bug that arises from some mix of DB commandline and the most recent changes to CAPITALIST (and/or tallying reads in general).
A couple of questions to help me home in:
As for the difference between burst12 and burst15: burst12 is primarily intended for amplicon databases. It uses a much more RAM-friendly indexing scheme for small databases. For large (>4 GB) databases, burst15 is recommended for speed.
As such, while the "edx" will work fine between the two versions, the "acx" is specific to one or the other (whichever version was used to make it).
Awesome, thanks for the clarification. I’ll try remaking the database and see how it goes.
Were you able to solve your problem by rebuilding the database?
I ran into a similar issue. I wasn't able to get SHOGUN working with burst, since the latest official release of burst, v0.99.8, didn't even compile on my Linux machine (the source release contains syntax errors!).
So I installed bowtie2 and I ran SHOGUN with --aligner bowtie2. It kept crunching for about 18 minutes (htop was showing that the bowtie2 process was running), then I got the KeyError: 'summary' exception from Python. I don't know if bowtie2 segfaulted though.
The source likely doesn't contain syntax errors, it just requires the Intel compiler and architecture-specific optimization flags because of the assembly instructions included.
It is highly, highly recommended to grab the prebuilt binary for BURST from the Releases section of the repo.
Thanks, Gabe
[edit] D'oh, I found the test files in the very first post! I'm assuming you're using the same ones. I don't have a Mac, but maybe I can spin up a VM to test this.
What's the memory on the machine you're running it on?
Thanks a bunch, Gabe
Also, what were the commands run to produce the database itself? Databases aren't compatible across major BURST releases.
What's the difference, again, between burst12 and burst15? Does the database need to be reindexed for one vs the other?
Yes. DB15 and DB12 have fundamentally different database structures. Major releases of BURST (lettered releases are minor, numbered ones are major) may also have incompatibilities. I think this should be detected if an older database, or a database made with a different DB version of BURST, is used. I believe later versions of BURST (i.e. newer than the 0.97 series) will do this detection automatically, but perhaps SHOGUN should implement this check in the wrapper first, or warn when pointing to a DB it knows it shipped with an earlier version.
DB12 is for low-RAM alignment. It is slower, and primarily intended for amplicons. Burst15 is for higher-RAM alignment and intended for shotgun. This is vaguely similar to the difference between bowtie2-align-s and bowtie2-align-l, which are also non-interchangeable, but the python wrapper "bowtie2" sorts out which should be called with which.
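The wrapper-side check suggested above could be as simple as recording which BURST build wrote each index and refusing to mix them. A hypothetical sketch (the metadata.json file and its field names are invented for illustration; SHOGUN does not ship such a file):

```python
import json
from pathlib import Path

def check_db_compat(db_dir, aligner_version):
    """Fail early if the index was built by an incompatible BURST.

    Assumes a hypothetical metadata.json written at database-build time,
    e.g. {"burst_version": "0.99.7", "flavor": "burst15"}.
    """
    meta_path = Path(db_dir) / 'metadata.json'
    if not meta_path.exists():
        return  # nothing recorded; fall back to BURST's own detection
    meta = json.loads(meta_path.read_text())
    # Numbered releases are major, letters are minor: compare "0.99" parts.
    built_major = '.'.join(meta['burst_version'].split('.')[:2])
    run_major = '.'.join(aligner_version.split('.')[:2])
    if built_major != run_major or meta.get('flavor') != 'burst15':
        raise RuntimeError(
            f"database built with {meta.get('flavor')} "
            f"{meta['burst_version']}, but running burst15 "
            f"{aligner_version}; please rebuild the index")
```

This mirrors the bowtie2 wrapper-script analogy: the user-facing entry point, not the aligner binary, decides whether the index and binary match.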
@GabeAl Hey, no, thank you for getting back to this!
Just to put this in context, I'm familiar with building C code from source. It's not an unsupported assembly extension: the particular syntax error I noticed was a missing closing curly brace here. After I added the closing curly on the next line, the compiler went ahead and complained about a type error here, which is an assignment of a QPod to a value of type QPod *; judging from the surrounding code, it's probably a missing dereference. Then there are the redeclarations of numBins, RefCache and StCache here. I could imagine that the latter is something the Intel compiler accepts. After I removed those, the code compiled just fine using -march=native with GCC 7.1. (I must admit, it might not do what it does under the Intel cc, though.)
I have since tried SHOGUN with the Linux binary downloadable from the same release (which advertises itself as burst15), with no success, unfortunately. Based on what several others suggested above, it might very well be that I simply don't have enough RAM; I'll be able to check this possibility soon, once I have access to a beefier machine. I have 8 GB in my Linux box, which seems to be close but no cigar.
I didn't build the databases myself; I simply downloaded the pre-built ones, as suggested by the very last paragraph of this part of the README.
Cheers, Árpád
Thanks H2CO3!
Oh I see -- the current source indeed looks like it's for a WIP version and updates stopped after that. Later versions (completing the WIP, going into the 0.99.8 series, etc) must have never gotten pushed. I will push my local copy up.
Done. Let me know.
Cheerio, Gabe
Awesome, thanks for that!