pytries / marisa-trie

Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library.
https://marisa-trie.readthedocs.io/en/latest/
MIT License

User should be able to add strings to a Trie after instantiation #38

Closed by jfinkels 7 years ago

jfinkels commented 7 years ago

As a user, I want to create an instance of Trie and add words to it one at a time, so that I can use a Trie in a streaming environment (in which strings arrive on the fly). For example:

>>> from marisa_trie import Trie
>>> trie = Trie()
>>> trie.add(u'key1')
>>> trie.add(u'key12')
>>> u'key1' in trie
True
>>> u'key12' in trie
True
>>> u'key2' in trie
False

This also makes Trie behave more like a set of strings.

kmike commented 7 years ago

AFAIK this is not possible. It is a trade-off that enables high compression ratios: marisa-trie must know all keys in advance.

The Trie constructor accepts an iterable (or generator) of keys; passing a generator reduces the RAM required for temporary Python objects. That seems to be the best we can do.
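For the streaming use case, one possible workaround is to buffer newly arrived keys and rebuild the static structure from time to time. The following is only a minimal sketch under stated assumptions, not part of marisa-trie: `RebuildingKeySet`, `build`, and `max_pending` are hypothetical names. `build` is any callable that turns an iterable of strings into an object supporting `in` and iteration over its keys. `frozenset` is used as a standard-library stand-in so the sketch runs without the library; `marisa_trie.Trie` is assumed to fit the same role.

```python
from itertools import chain


class RebuildingKeySet:
    """Hypothetical wrapper: buffer new keys, rebuild a static index in batches."""

    def __init__(self, build=frozenset, max_pending=1000):
        # Assumption: build(iterable_of_str) returns an object that supports
        # `in` and iteration over its keys (frozenset here as a stand-in).
        self._build = build
        self._max_pending = max_pending
        self._index = build(())   # static part, built from all keys at once
        self._pending = set()     # keys that arrived since the last rebuild

    def add(self, key):
        if key in self:
            return
        self._pending.add(key)
        if len(self._pending) >= self._max_pending:
            self.rebuild()

    def rebuild(self):
        # The static structure must see every key up front, so feed it the
        # old keys plus the pending ones lazily, without an intermediate list.
        self._index = self._build(chain(self._index, self._pending))
        self._pending = set()

    def __contains__(self, key):
        return key in self._pending or key in self._index


keys = RebuildingKeySet(max_pending=2)
keys.add(u'key1')
keys.add(u'key12')   # second pending key triggers a rebuild
print(u'key1' in keys, u'key12' in keys, u'key2' in keys)
```

The trade-off is that every rebuild reprocesses all keys, so a larger `max_pending` means fewer rebuilds at the cost of more keys held in an ordinary Python set in the meantime.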

jfinkels commented 7 years ago

Okay, thanks for the information. Feel free to close this issue, since it seems out of scope for this project.
