muellan / metacache

memory efficient, fast & precise taxonomic classification system for metagenomic read mapping
GNU General Public License v3.0

Segmentation fault #8

Closed. alex-dem closed this issue 4 years ago

alex-dem commented 4 years ago

Hello,

If I use the query option with a FASTQ file containing a read with more than 15 Ts in a row, I get a segmentation fault. If I replace the 16 Ts with 16 (or more) As, there is no segmentation fault.

This fastq works properly: @QHOT2:10403:11886 CTGACAGCATGTTGACTTTTGCTTCCTTTGTCAAGCACAGAAAAACAGGAAGCACAAAATCATTTTTTTTTTTTTTT +

But if I add an extra T at the end, I get a segmentation fault. @QHOT2:10403:11886 CTGACAGCATGTTGACTTTTGCTTCCTTTGTCAAGCACAGAAAAACAGGAAGCACAAAATCATTTTTTTTTTTTTTTT +
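To check whether other reads in a larger file share this pattern, one could scan each sequence for long runs of a single base. The following is a minimal sketch, not part of MetaCache; the function names `longest_run` and `has_long_t_run` are hypothetical, and the threshold of 16 is taken from the report above.

```python
from itertools import groupby
from typing import Tuple


def longest_run(seq: str) -> Tuple[str, int]:
    """Return (base, length) of the longest run of one repeated base in seq."""
    best_base, best_len = "", 0
    for base, group in groupby(seq.upper()):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


def has_long_t_run(seq: str, min_len: int = 16) -> bool:
    """True if seq contains at least min_len consecutive Ts (case-insensitive)."""
    return "T" * min_len in seq.upper()
```

Applying `has_long_t_run` to the sequence line of every record gives a quick list of reads worth testing individually.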

thank you

muellan commented 4 years ago

We did a quick check and could not reproduce your problem, so it seems that I need more information about your setup:

alex-dem commented 4 years ago

I also ran the included tests and everything works fine.

The same fault happens if I trim the read from the beginning:

@QHOT2:10403:11886 AAAAACAGGAAGCACAAAATCATTTTTTTTTTTTTTTT +

muellan commented 4 years ago

Could you give me the output of the commands:

alex-dem commented 4 years ago

./metacache info

MetaCache version    0.6.1 (20190925)
database version     20190916

sequence type        std::__cxx11::basic_string<char, std::char_traits, std::allocator >
target id type       unsigned short int 16 bits
target limit         65535

window id type       unsigned int 32 bits
window limit         4294967295
window length        128
window stride        113

sketcher type        mc::single_function_unique_min_hasher<unsigned int, mc::same_size_hash >
feature type         unsigned int 32 bits
feature hash         mc::same_size_hash
kmer size            16
kmer limit           16
sketch size          16

bucket size type     unsigned char 8 bits
max. locations       254
location limit       254

hit classifier       mc::best_distinct_matches_in_contiguous_window_ranges

./metacache info refseq.db

Reading database from file 'refseq.db' ... done.

MetaCache version    0.6.1 (20190925)
database version     20190916

sequence type        std::__cxx11::basic_string<char, std::char_traits, std::allocator >
target id type       unsigned short int 16 bits
target limit         65535

window id type       unsigned int 32 bits
window limit         4294967295
window length        128
window stride        113

sketcher type        mc::single_function_unique_min_hasher<unsigned int, mc::same_size_hash >
feature type         unsigned int 32 bits
feature hash         mc::same_size_hash
kmer size            16
kmer limit           16
sketch size          16

bucket size type     unsigned char 8 bits
max. locations       254
location limit       254

hit classifier       mc::best_distinct_matches_in_contiguous_window_ranges

I ran the command ./metacache query refseq.db dummy.fq and I also tried the interactive mode:

./metacache query refseq.db
dummy.fq

Both gave me a segmentation fault.

muellan commented 4 years ago

Hm, these are all default values. Does the segfault also happen if you use just the one sequence from above, or only if this sequence is part of a larger fastq file?

Could it be that the database building process didn't finish properly? You could try to check out the latest version from the GitHub repo, compile it again and build the database again. I know this isn't very helpful, but I really don't have any ideas at the moment, since we are unable to reproduce your bug.

alex-dem commented 4 years ago

Ok, I'll recompile everything from scratch and rebuild the database. FYI, the segfault was happening on a larger file, and by repeatedly dividing the file in half I narrowed it down to one read (of possibly many) that causes the segfault.
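The halving approach described above can be automated. The following is a minimal sketch under a few assumptions: the input is a plain four-line-per-record FASTQ file, a single read is enough to trigger the crash on its own, and the crash shows up as the process being killed by SIGSEGV on a POSIX system. The helper names (`read_fastq`, `crashes_metacache`, `find_failing_record`) are hypothetical; only the command line `./metacache query refseq.db <file>` is taken from this thread.

```python
import os
import signal
import subprocess
import tempfile
from typing import Callable, List, Optional

Record = List[str]  # the four lines of one FASTQ record


def read_fastq(path: str) -> List[Record]:
    """Group a FASTQ file into 4-line records (assumes unwrapped sequences)."""
    with open(path) as fh:
        lines = [ln.rstrip("\n") for ln in fh if ln.strip()]
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def crashes_metacache(records: List[Record],
                      db: str = "refseq.db",
                      exe: str = "./metacache") -> bool:
    """Write records to a temporary file, query it, and report a segfault."""
    with tempfile.NamedTemporaryFile("w", suffix=".fq", delete=False) as tmp:
        for rec in records:
            tmp.write("\n".join(rec) + "\n")
    try:
        result = subprocess.run([exe, "query", db, tmp.name],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    finally:
        os.unlink(tmp.name)
    # On POSIX, a negative return code is the number of the killing signal.
    return result.returncode == -signal.SIGSEGV


def find_failing_record(records: List[Record],
                        crashes: Callable[[List[Record]], bool]
                        ) -> Optional[Record]:
    """Halve the record list until one crashing record remains."""
    if not records or not crashes(records):
        return None
    while len(records) > 1:
        mid = len(records) // 2
        first_half = records[:mid]
        # Assumption: if the first half is clean, the culprit is in the second.
        records = first_half if crashes(first_half) else records[mid:]
    return records[0]
```

A possible call would be `find_failing_record(read_fastq("dummy.fq"), crashes_metacache)`. If several reads crash independently, this finds only one of them; removing it and rerunning would find the next.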

Thanks for your time and I'll let you know how it went!

alex-dem commented 4 years ago

Everything works now as expected. I downloaded the source from git, compiled it and rebuilt the database and I don't get a segmentation fault any more. Thank you for your help!