pytries / marisa-trie

Static memory-efficient Trie-like structures for Python based on the marisa-trie C++ library.
https://marisa-trie.readthedocs.io/en/latest/
MIT License

MARISA_SIZE_ERROR: buf_.size() > MARISA_UINT32_MAX #26

Closed. caseybrown89 closed this issue 8 years ago.

caseybrown89 commented 8 years ago

Hello,

I recently inherited some code from a developer who had departed. It is safe to say that the amount of data flowing into the trie has increased over time. This bug looks like an overflow.

Stack trace:

File "marisa_trie.pyx", line 422, in marisa_trie.BytesTrie.__init__ (src/marisa_trie.cpp:7670)
File "marisa_trie.pyx", line 127, in marisa_trie._Trie._build (src/marisa_trie.cpp:2768)
RuntimeError: lib/marisa/grimoire/trie/tail.cc:192: MARISA_SIZE_ERROR: buf_.size() > MARISA_UINT32_MAX

superbobry commented 8 years ago

Hi,

This does indeed look like an overflow. I think you should try reporting the issue to the author of the underlying marisa-trie C++ library. Here we're mostly focused on the Python wrapper.

Or you can do a quick and dirty hack and just build multiple tries instead of a single one :)
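A minimal sketch of the "multiple tries" workaround, assuming str keys and bytes values as `marisa_trie.BytesTrie` expects. `ShardedBytesTrie`, `num_shards`, and `trie_factory` are hypothetical names, not part of marisa-trie's API. Each pair goes to one of `num_shards` tries chosen by a CRC32 of the key, so each underlying trie is built from only a fraction of the data. Whether that keeps every shard below the `MARISA_UINT32_MAX` (about 4 GiB) buffer limit depends on the data, so the shard count is a guess to tune.

```python
import zlib
from collections import defaultdict


class ShardedBytesTrie:
    """Hypothetical wrapper that spreads (key, value) pairs over several tries.

    Shard choice uses zlib.crc32 rather than hash(), because str hashes are
    randomized per process and saved shards must map keys the same way
    after being reloaded in another process.
    """

    def __init__(self, pairs, num_shards=4, trie_factory=None):
        if trie_factory is None:
            # Assumption: marisa_trie is installed; BytesTrie takes an
            # iterable of (str key, bytes value) pairs.
            import marisa_trie
            trie_factory = marisa_trie.BytesTrie
        self.num_shards = num_shards
        buckets = defaultdict(list)
        for key, value in pairs:
            buckets[self._shard_of(key)].append((key, value))
        # Build one trie per shard; a shard with no keys gets an empty trie.
        self.shards = [trie_factory(buckets[i]) for i in range(num_shards)]

    def _shard_of(self, key):
        return zlib.crc32(key.encode("utf-8")) % self.num_shards

    def __contains__(self, key):
        return key in self.shards[self._shard_of(key)]

    def __getitem__(self, key):
        # Exact lookups only touch the one shard that can hold the key.
        return self.shards[self._shard_of(key)][key]

    def items(self, prefix=""):
        # Prefix queries must visit every shard, because hashing scatters
        # keys that share a prefix; results are merged and sorted.
        result = []
        for shard in self.shards:
            result.extend(shard.items(prefix))
        return sorted(result)
```

With marisa-trie installed, `ShardedBytesTrie(pairs, num_shards=8)` builds eight `BytesTrie` objects. Exact lookups still cost a single shard probe, while `items(prefix)` has to query every shard and merge the results, which is the main price of this hack.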

caseybrown89 commented 8 years ago

Ah sorry! I should have looked more closely at the library before posting. I'll follow up with the original author, thanks!

superbobry commented 8 years ago

No worries. Please do post the update here once you solve this to let other users know what can be done.