salarshad / marisa-trie

Automatically exported from code.google.com/p/marisa-trie
Other
0 stars 1 forks source link

Stepwise walking #9

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm storing values in a trie with this encoding scheme:

<utf8-encoded unicode key> + chr(255) + <binary_value>

This works perfectly, but common prefix search is suboptimal with the current 
marisa-trie API.
Ideally, this should be implemented like this:

* walk to a char in a key;
* if char is not walkable, exit loop;
* test if data separator (chr(255)) is walkable;
* if it is walkable, add the current key to a list of prefixes.

Instead of this, it can be implemented like this now (pseudocode):

        while ind <= key_len:
            prefix = key[:ind]
            ag.set_query(prefix + _VALUE_SEPARATOR)
            if trie.predictive_search(ag):
                result.append(prefix)
            ind += 1

        return res

This is suboptimal because:

* there is no fail-fast if the current prefix is not in a trie;
* trie is walked from the root for each prefix.

Stepwise API would be great for making this more efficient.

Original issue reported on code.google.com by kmik...@gmail.com on 23 Aug 2012 at 1:36

GoogleCodeExporter commented 9 years ago
I've understood the problem and now I also think stepwise API should be 
provided.

However, it is not an interface problem.
marisa-trie may require a major update to provide efficient stepwise API.
So, it is difficult to solve this in a short period.

Original comment by susumu.y...@gmail.com on 30 Aug 2012 at 1:02

GoogleCodeExporter commented 9 years ago

Original comment by susumu.y...@gmail.com on 15 Sep 2012 at 4:22