marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.

segmentation fault for k<6 #26

Closed JonEilers closed 2 years ago

JonEilers commented 2 years ago

Hi, I am trying to create a meryl database with a k-mer size of 4 or 5, but I keep getting the error below. It works for k greater than or equal to 6.

meryl count k=5 output s_chlorontus_5mer /home/jon/Working_Files/sea_cuke_species_data/stichopus_chloronotus/SRR8499559_1.fastq

Found 1 command tree.

Counting 38 (estimated) billion canonical 5-mers from 1 input file:
    sequence-file: /home/jon/Working_Files/sea_cuke_species_data/stichopus_chloronotus/SRR8499559_1.fastq

SIMPLE MODE
-----------

  5-mers
    -> 1024 entries for counts up to 65535.
    -> 16 kbits memory used

  41845276868 input bases
    -> expected max count of 167381107, needing 13 extra bits.
    -> 13 kbits memory used

  3712  B memory needed

Failed with 'Floating point exception'; backtrace (libbacktrace):

Failed with 'Segmentation fault'; backtrace (libbacktrace):
Segmentation fault (core dumped)

brianwalenz commented 2 years ago

The last time I looked at this, it was a flaw in the way I process kmers in memory and/or store them on disk. Each kmer is split into three pieces and at small kmer sizes one of the pieces is empty. It didn't look like an easy fix back then -- this was a few years ago -- but I'll give it another try.

For what it's worth (not much) the latest unreleased version of meryl doesn't crash. It gets stuck in an infinite loop instead.
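To illustrate the "one of the pieces is empty" remark above, here is a rough sketch of how splitting a 2-bit-per-base k-mer into three bit fields leaves the last field with zero bits at small k. This is not meryl's code: the function names and the field widths (`first_bits=6`, `second_bits=4`) are hypothetical, chosen only so the arithmetic lines up with the k<6 report, and the real layout may well differ.

```python
def piece_widths(k, first_bits=6, second_bits=4):
    """Return the bit widths of three pieces of a 2-bit-per-base k-mer.

    first_bits and second_bits are hypothetical values for illustration,
    not meryl's actual layout.
    """
    total = 2 * k                             # 2 bits per base (A, C, G, T)
    first = min(first_bits, total)            # high-order piece is filled first
    second = min(second_bits, total - first)  # then the middle piece
    third = total - first - second            # whatever is left; may be zero
    return first, second, third


def split_kmer(kmer, first_bits=6, second_bits=4):
    """Encode a k-mer and split it into three integers (high to low bits)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in kmer:
        value = (value << 2) | code[base]
    _, second, third = piece_widths(len(kmer), first_bits, second_bits)
    low = value & ((1 << third) - 1)          # always 0 when third == 0
    mid = (value >> third) & ((1 << second) - 1)
    high = value >> (third + second)
    return high, mid, low


# Show the widths for a few k-mer sizes under the assumed layout.
for k in range(4, 8):
    print(k, piece_widths(k))
```

Under these assumed widths the third piece has zero bits for k=4 and k=5 and becomes non-empty from k=6 onward. Any code that assumes every piece holds at least one bit would need a special case for the empty one.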

brianwalenz commented 2 years ago

Fixed! Use the unreleased 'tip' version:

% git clone https://github.com/marbl/meryl.git
% cd meryl/src
% make -j 4
% ../build/bin/meryl --version

JonEilers commented 2 years ago

Works, thanks!