marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.

Segfault with k >= 38 #7

Status: Closed (mrvollger closed this issue 5 years ago)

mrvollger commented 5 years ago

Hello,

I am unable to run meryl with k-mer sizes of 38 or greater. Whenever I try, I get an error like this:

meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
....
Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault (core dumped)

Below I have included how I installed meryl, along with my example commands and FASTA input.

Any help would be greatly appreciated!

Thanks! Mitchell

Install script:

rm -rf meryl/
module load gcc/8.1.0
git clone https://github.com/marbl/meryl.git
cd meryl/src
make -j 24
cd ../../

My test fasta sequence:

>1
AAAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGG

My test run with k=37:

meryl/Linux-amd64/bin/meryl count k=37 threads=32 output test test.fasta
Enabling 32 threads.

Counting 127 (estimated) canonical 37-mers from 1 input file:
    sequence-file: test.fasta

SIMPLE MODE
-----------

  Disabled for mers larger than 20.

COMPLEX MODE
------------

prefix     # of   struct   kmers/    segs/      min     data    total
  bits   prefix   memory   prefix   prefix   memory   memory   memory
------  -------  -------  -------  -------  -------  -------  -------
     1     2  P   240  B    64  M     1  S   128 kB   128 kB   128 kB
     2     4  P   480  B    32  M     1  S   256 kB   256 kB   256 kB
     3     8  P   960  B    16  M     1  S   512 kB   512 kB   512 kB
     4    16  P  1920  B     8  M     1  S  1024 kB  1024 kB  1025 kB
     5    32  P  3840  B     4  M     1  S  2048 kB  2048 kB  2051 kB
     6    64  P  7680  B     2  M     1  S  4096 kB  4096 kB  4103 kB
     7   128  P    15 kB     1  M     1  S  8192 kB  8192 kB  8207 kB
     8   256  P    30 kB     1  M     1  S    16 MB    16 MB    16 MB
     9   512  P    60 kB     1  M     1  S    32 MB    32 MB    32 MB
    10  1024  P   120 kB     1  M     1  S    64 MB    64 MB    64 MB  Best Value!
    11  2048  P   240 kB     1  M     1  S   128 MB   128 MB   128 MB
    12  4096  P   480 kB     1  M     1  S   256 MB   256 MB   256 MB
    13  8192  P   960 kB     1  M     1  S   512 MB   512 MB   512 MB
    14    16 kP  1920 kB     1  M     1  S  1024 MB  1024 MB  1025 MB
    15    32 kP  3840 kB     1  M     1  S  2048 MB  2048 MB  2051 MB

FINAL CONFIGURATION
-------------------

Configured complex mode for 0.063 GB memory per batch, and up to 1 batch.

kmerCountFileWriter()-- Creating 'test' for 37-mers, with prefixSize 10 suffixSize 64 numFiles 64
Loading kmers from 'test.fasta' into buckets.
Used 0.277 GB out of 2015.055 GB to store           87 kmers.

Writing results to 'test', using 32 threads.
finishIteration()--

Finished counting.
Bye.

My test run with k=38:

meryl/Linux-amd64/bin/meryl count k=38 threads=32 output test test.fasta
Enabling 32 threads.

Counting 127 (estimated) canonical 38-mers from 1 input file:
    sequence-file: test.fasta

SIMPLE MODE
-----------

  Disabled for mers larger than 20.

COMPLEX MODE
------------

prefix     # of   struct   kmers/    segs/      min     data    total
  bits   prefix   memory   prefix   prefix   memory   memory   memory
------  -------  -------  -------  -------  -------  -------  -------
     1     2  P   240  B    64  M     1  S   128 kB   128 kB   128 kB
     2     4  P   480  B    32  M     1  S   256 kB   256 kB   256 kB
     3     8  P   960  B    16  M     1  S   512 kB   512 kB   512 kB
     4    16  P  1920  B     8  M     1  S  1024 kB  1024 kB  1025 kB
     5    32  P  3840  B     4  M     1  S  2048 kB  2048 kB  2051 kB
     6    64  P  7680  B     2  M     1  S  4096 kB  4096 kB  4103 kB
     7   128  P    15 kB     1  M     1  S  8192 kB  8192 kB  8207 kB
     8   256  P    30 kB     1  M     1  S    16 MB    16 MB    16 MB
     9   512  P    60 kB     1  M     1  S    32 MB    32 MB    32 MB
    10  1024  P   120 kB     1  M     1  S    64 MB    64 MB    64 MB  Best Value!
    11  2048  P   240 kB     1  M     1  S   128 MB   128 MB   128 MB
    12  4096  P   480 kB     1  M     1  S   256 MB   256 MB   256 MB
    13  8192  P   960 kB     1  M     1  S   512 MB   512 MB   512 MB
    14    16 kP  1920 kB     1  M     1  S  1024 MB  1024 MB  1025 MB
    15    32 kP  3840 kB     1  M     1  S  2048 MB  2048 MB  2051 MB

FINAL CONFIGURATION
-------------------

Configured complex mode for 0.063 GB memory per batch, and up to 1 batch.

kmerCountFileWriter()-- Creating 'test' for 38-mers, with prefixSize 10 suffixSize 66 numFiles 64
Loading kmers from 'test.fasta' into buckets.
Used 0.277 GB out of 2015.055 GB to store           86 kmers.

Writing results to 'test', using 32 threads.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.

Failed with '
Failed with '
Failed with 'Aborted'; backtrace (libbacktrace):

Failed with 'AbortedAborted'; backtrace (libbacktrace):
'; backtrace (libbacktrace):
Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()

Failed with 'Segmentation fault'; backtrace (libbacktrace):
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.
meryl: utility/kmers.H:526: void kmerCountFileIndex::set(uint64, FILE*, uint64): Assertion `_blockPosition <= AS_UTL_ftell(F)' failed.

Failed with 'Aborted'; backtrace (libbacktrace):

Failed with 'Aborted'; backtrace (libbacktrace):
meryl: utility/bits.C:711: uint32 stuffedBits::setBinary(uint32, uint64): Assertion `width < 65' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()

Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()

Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
Segmentation fault (core dumped)

arangrhie commented 5 years ago

Hi Mitchell, unfortunately, meryl does not support kmers with k>31 at the moment.

mrvollger commented 5 years ago

Hi Arang,

Thanks for the fast response!

I was confused because Wenger et al. (https://www.nature.com/articles/s41587-019-0217-9) were able to do trio binning with k sizes of 51 and 91. But I now see Sergey has some custom scripts for that here: https://github.com/skoren/triobinningScripts

Thanks, Mitchell

brianwalenz commented 5 years ago

The older version of meryl in that repo supports (arbitrarily) large K (via a compile time option).

Meryl was reimplemented last summer, but it lost support for k>31 (32 might work though). We want to add that support back, but it's not a high priority right now.
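
A note on why the crash first appears at k=38 rather than at k=32: the logs above report prefixSize 10 with suffixSize 64 for 37-mers and suffixSize 66 for 38-mers, and the failing assertion is `width < 65`. Those numbers are consistent with 2 bits per base and a suffix of 2*k minus the prefix bits. The sketch below only reproduces that arithmetic. The formula is inferred from the log output, not taken from meryl's source, and the function names are hypothetical.

```python
# Assumption: each base is packed into 2 bits, so a k-mer occupies 2*k bits.
BITS_PER_BASE = 2
# The failing assertion is `width < 65`, i.e. at most 64 bits per write.
MAX_WIDTH = 64


def suffix_bits(k: int, prefix_bits: int = 10) -> int:
    """Bits left for the suffix after the prefix is split off (inferred formula)."""
    return BITS_PER_BASE * k - prefix_bits


def passes_width_check(k: int, prefix_bits: int = 10) -> bool:
    """True if the suffix would satisfy `width < 65`."""
    return suffix_bits(k, prefix_bits) <= MAX_WIDTH


if __name__ == "__main__":
    # prefix_bits=10 is the value reported in both runs above.
    for k in (31, 32, 37, 38):
        print(k, BITS_PER_BASE * k, suffix_bits(k), passes_width_check(k))
```

With a 10-bit prefix, k=37 leaves a 64-bit suffix and k=38 leaves 66 bits, which matches the two runs. Passing this check is not the same as being supported: a k-mer only fits in a single 64-bit word for k up to 32, so runs with k between 32 and 37 may finish without an assertion and still not be valid.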

mrvollger commented 5 years ago

Hi Brian,

Good suggestion, I will pull an older version of the repo.

Thanks, Mitchell