pytries / marisa-trie

Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library.
https://marisa-trie.readthedocs.io/en/latest/
MIT License
1.03k stars 91 forks source link

Starting an iterator in the middle of a values list in a label-ordered RecordTrie #30

Open gmossessian opened 8 years ago

gmossessian commented 8 years ago

Hi Mike,

Thanks for building this wrapper and providing the additional Bytes and RecordTrie classes, they're great and extremely useful and have been largely easy to build additional features into.

I would like to implement a RecordTrie feature as follows: if there are key-value pairs (u'a', (1, N_1)), (u'a', (2, N_2)), (u'a',(3, N_3)), ..., (u'a', (i, N_i)), and I know that I only need the values stored in N_p through N_q. If i is very large (say, a million), calling my_list = my_trie[u'a'] is prohibitively slow, and so I would like to start the loop at the pth value, that is, at b_prefix = <bytes>u'a'.encode('utf8') + self._b_value_separator + <bytes>bytes(struct(">I", p)).

Of course, if I set Agent key to this, the loop will not continue on to (u'a', (p+1, N_{p+1})), and I cannot figure out how to "trick" the predictive search into continuing to loop past that specific prefix and on to anything with prefix simply <bytes>u'a'.encode('utf8') + self._b_value_separator without resetting the entire loop from the top.

Do you have any idea of how this might be accomplished, or if it is a limitation of the marisa-trie base library?

Thanks, -George