uni-halle / gerbil

A fast and memory-efficient k-mer counter with GPU-support
MIT License

Kmercount equal to 0 #16

Closed by yvanlucas 4 years ago

yvanlucas commented 4 years ago

Dear developers,

First, I want to thank you for the software; it works perfectly fine and is very fast.

I have a question regarding the output of Gerbil's k-mer counts: for some k-mers in some bacterial genome sequences I get a k-mer count of 0.0, whereas for other sequences the same k-mer does not appear in the output at all. (I also tried the "-d" option to disable normalization, but it did not stop this behaviour.)

My question is: what explains this behaviour? How can some k-mers be treated as non-existent for one sequence (not appearing in the output file), while for another sequence where they also seem to be absent they appear in the output file with a count of 0? What is the difference between these two kinds of "non-existence" marker (not appearing vs. appearing with a k-mer count of 0)?

Best regards,

Yvan

merbert commented 4 years ago

Well, that's strange. Can you give me the version and the exact call of the program? A test case would also be helpful.

yvanlucas commented 4 years ago

The version of Gerbil I am using is 1.11, according to the command gerbil --version.

The call of the program is "gerbil -k 7 -l 1 -d pathtofasta pathtotemp pathtooutputbinary", and afterwards I convert the output using the toFasta command: "toFasta pathtooutputbinary 11 pathtooutputfasta".

I manually checked my sequence for some of the misbehaving k-mers and found that they were actually present, sometimes in large numbers.
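For anyone who wants to repeat this kind of manual check, here is a minimal sketch of a naive sliding-window counter in Python. It is independent of Gerbil and says nothing about how Gerbil counts internally; the function name `count_kmers` and the example sequence are made up for illustration, and it only looks at the forward strand of a single sequence string.

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count every overlapping k-mer on the forward strand of one sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

# Hypothetical toy sequence, just to show the call.
counts = count_kmers("ACGTACGTAC", 7)
print(counts["ACGTACG"])  # count for one k-mer of interest
print(counts["AAAAAAA"])  # a Counter reports 0 for k-mers it never saw
```

Comparing a few entries of such a table against the converted output file is usually enough to tell whether a reported count of 0 is plausible.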

I also tried increasing the value of -k: it does not run for k=8, but works perfectly fine (I mean the results are sound and can be checked against the FASTA file) for 9, 10 and 11.

It seems to be an issue with the k-mer size being too small. However, I don't know why it is too small for Gerbil, nor why a k-mer size of 8 does not work at all (no output file is produced).

Thanks for the answer anyway. Checking it in more depth resolved the issue I raised, leaving only questions about the valid parameter range =).

Regards,

Yvan

lalalagartija commented 2 years ago

Hi, I have the same issue, but with k-mers that are actually absent. Their reverse complements were absent as well.
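A check like this, covering both the k-mer and its reverse complement, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names `reverse_complement` and `occurrences_both_strands` are hypothetical, the input is a plain sequence string, and it makes no claim about how Gerbil itself treats the two strands.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.upper().translate(_COMPLEMENT)[::-1]

def occurrences_both_strands(sequence, kmer):
    """Count windows equal to the k-mer or to its reverse complement."""
    seq = sequence.upper()
    k = len(kmer)
    targets = {kmer.upper(), reverse_complement(kmer)}
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)

# Hypothetical toy input: a result of 0 means neither orientation occurs.
print(occurrences_both_strands("AACGTT", "AACG"))
print(occurrences_both_strands("AAAA", "CCCC"))
```

If this returns 0 for a k-mer that still shows up in the output with a count of 0, that at least confirms the k-mer really is absent from the input in both orientations.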