rikuu / Gap2Seq

Gap2Seq is a gap filling and insertion genotyping tool.
GNU Affero General Public License v3.0

k-mer counts were clipped to 255 #4

Open a-kroh opened 5 years ago

a-kroh commented 5 years ago

Hi,

Gap2Seq looks like a great tool and mostly performed well when I tested it, closing most of the smaller (<1000 bp) gaps in my test dataset. However, I always get a warning while the program runs, and I am worried that this might affect the program's ability to close larger gaps. Here is the message, plus adjacent lines from the log:

2018-09-07 17:41:37: Round 62, 0% nodes remaining
2018-09-07 17:41:37: Assigning values
2018-09-07 17:41:39: Setting abundances of 42410107 kmers.
2018-09-07 17:41:52: WARNING: 100379 k-mer counts were clipped to 255
2018-09-07 17:42:11: Saving mphf to disk

I assume it means that the program could not store the k-mer counts correctly because of some memory limitation. Increasing the memory available to the program (to 200 GB) does not seem to make a difference, though.

Any ideas?

All the best,
Andreas

rikuu commented 5 years ago

The maximum memory parameter only affects the data structures in Gap2Seq; it has no effect on the graph construction done by GATB. I would assume there is some way to control the maximum k-mer abundance from the code.

However, I don't think the abundances matter in the context of Gap2Seq. The only place the abundances are used is when infrequent k-mers are filtered out by GATB-core.
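To illustrate why clipping should be harmless for that filtering step, here is a minimal Python sketch. It is not GATB-core or Gap2Seq code: the function names, the `min_abundance` parameter, and the toy sequence are all made up for illustration, and it assumes the cap of 255 comes from storing each count in a single unsigned byte. The point is only that a filter with a small threshold keeps the same k-mers whether counts are exact or saturated at 255.

```python
from collections import Counter

MAX_COUNT = 255  # assumed cap: the largest value an unsigned 8-bit counter can hold


def count_kmers(seq, k):
    """Count every k-mer in seq (exact counts, no cap)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def clip_counts(counts, max_count=MAX_COUNT):
    """Saturate counts at max_count and report how many were clipped."""
    clipped = {kmer: min(c, max_count) for kmer, c in counts.items()}
    n_clipped = sum(1 for c in counts.values() if c > max_count)
    return clipped, n_clipped


def solid_kmers(counts, min_abundance):
    """Keep k-mers seen at least min_abundance times (hypothetical filter)."""
    return {kmer for kmer, c in counts.items() if c >= min_abundance}


# Toy input: a highly repeated unit followed by a unique tail.
reads = "ACGT" * 300 + "TTGACCA"
exact = count_kmers(reads, k=4)
clipped, n_clipped = clip_counts(exact)

print(f"{n_clipped} k-mer counts were clipped to {MAX_COUNT}")
print(solid_kmers(exact, 2) == solid_kmers(clipped, 2))
```

As long as the filtering threshold is at most 255, a clipped count is still above it, so the set of retained k-mers does not change.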

That said, you can probably get rid of the warning by increasing the value of k.
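A rough sketch of why a larger k might help, again in plain Python with a made-up toy sequence rather than anything from Gap2Seq or GATB: a longer k-mer can never occur more often than its own prefix, so the number of k-mers above the cap can only stay the same or shrink as k grows. Here a short motif is repeated many times with random bases in between; the motif itself exceeds the cap at small k, while longer k-mers also cover the random flanks and become rarer.

```python
import random
from collections import Counter

MAX_COUNT = 255  # cap taken from the warning in the log


def kmers_over_cap(seq, k, max_count=MAX_COUNT):
    """Return how many distinct k-mers occur more than max_count times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for c in counts.values() if c > max_count)


# Hypothetical toy sequence: a 6 bp motif repeated 300 times,
# each copy followed by 10 random bases.
rng = random.Random(0)
motif = "ACGTAC"
toy = "".join(
    motif + "".join(rng.choice("ACGT") for _ in range(10))
    for _ in range(300)
)

for k in (6, 12):
    print(k, kmers_over_cap(toy, k))
```

This only helps when the repeats are short relative to k; long exact repeats in real data would still produce counts above 255 at any practical k.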