Keyset causes bottleneck for large data sets?

Hi, I am trying to insert every consecutive k-mer (subsequence of length k) of human chromosome 21 into your trie structure. Since DNA has only a four-character alphabet, {A, C, G, T}, I thought this would be a good use case for tries. With k=32, that is ~55 million "words" to insert.

The bottleneck seems to be marisa::Keyset, which grows too large to fit in RAM, so I never even reach the build step of marisa::Trie. Is there a way to avoid building the full keyset first? Is there any chance of storing data of this size with this library? My code is essentially the example from your README, section "Library->How to use".
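
To make the scale concrete, here is a minimal sketch of the k-mer enumeration and a rough lower bound on the key data alone. It uses only the C++17 standard library, not the marisa API; the helper names `consecutive_kmers` and `raw_key_bytes` are made up for illustration, and the estimate assumes one byte per base and ignores any per-key bookkeeping a keyset adds on top.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: returns every consecutive k-mer of `sequence`.
// Each k-mer is a view into `sequence`, so no characters are copied here.
std::vector<std::string_view> consecutive_kmers(std::string_view sequence,
                                                std::size_t k) {
    std::vector<std::string_view> kmers;
    if (k == 0 || sequence.size() < k) {
        return kmers;  // no k-mer fits
    }
    kmers.reserve(sequence.size() - k + 1);
    for (std::size_t i = 0; i + k <= sequence.size(); ++i) {
        kmers.push_back(sequence.substr(i, k));
    }
    return kmers;
}

// Hypothetical helper: bytes needed just to hold a private copy of every key,
// assuming one byte per base. Real containers need more than this.
std::uint64_t raw_key_bytes(std::uint64_t num_keys, std::uint64_t k) {
    return num_keys * k;
}
```

With the figures above, `raw_key_bytes(55'000'000, 32)` comes to 1,760,000,000 bytes, so roughly 1.76 GB of key characters before any container overhead, if every k-mer is stored as its own copy.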
