pytries / marisa-trie

Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library.
https://marisa-trie.readthedocs.io/en/latest/
MIT License
1.03k stars 91 forks source link

It's inconvenient that the IDs of marisa_trie.Trie are unpredictable and keys can't be retrieved in ID order #28

Closed ExplodingCabbage closed 8 years ago

ExplodingCabbage commented 8 years ago

See below:

>>> import marisa_trie
>>> trie = marisa_trie.Trie(['zeroth', 'first', 'second', 'third', 'fourth', 'fifth'])
>>> # IDs aren't ordered the same as original input list:
... trie.get('zeroth')
2
>>> # IDs aren't ordered the same as iteration order of the trie, either:
... for word, ID in trie.items():
...     print(word, ID)
... 
fifth 4
first 5
fourth 3
second 0
third 1
zeroth 2

This is inconvenient given that one possible use case, actually encouraged in the README, is to

use the returned ID to store a value in a separate data structure (e.g. in a python list

Ideally I'd like to be able to loop over my elements in ID order to construct such a list. I guess I can create a list of the right length and then assign into it, but couldn't this be made easier (either by assigning IDs according to the order the words were passed to Trie() in, or by having iteration over the trie iterate in ID order?

kmike commented 8 years ago

I'm not sure it is possible to do what you want. These ids are internal, they depend on internal graph representation and not stored anywhere. Think of them like hashes, which values happen to be between 0 and len(trie)-1. It may be impossible to assing these IDs explicitly without making tries use much more memory. Creating a list or an array of a required length and then filling it is not too bad :)