pytries / marisa-trie

Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library.
https://marisa-trie.readthedocs.io/en/latest/
MIT License
1.03k stars 91 forks source link

Needs find-longest key method #14

Open KCzar opened 9 years ago

KCzar commented 9 years ago

Or we could just call it a "longest" method - or "prefix" method (singular). Is there an efficient way to find the longest key that is a prefix of a string that I'm overlooking? It could be done with the current implementation by simply using prefixes, then finding the longest match, but there should be a much more efficient way possible by taking advantage of the Trie properties.

I can give it a go if you like - unless it's already implemented and I'm simply missing it.

stuaxo commented 8 years ago

Not sure if I want the same thing, but it sounds like it -

given

Trie([u'some.long.name', u'some.short.name']) I'd like a way to get back

[u'some.long', u'some.short'])

superbobry commented 8 years ago

I'm not sure if this is possible using the public API of marisa-trie the C++ library. Need to check and then maybe forward the issue upstream.

KCzar commented 8 years ago

When I originally posted that, I for some reason didn't realize 'prefixes' returned them in order (duh), so what I needed was basically already implemented satisfactorily.

What you stuaxo want is something else. I'm not completely sure what you want though...

stuaxo commented 8 years ago

In the end, I implemented this using a different library, + what I needed turned out to be slightly different - I needed something like this:

t = Trie([u'some.long.name', u'some.short.name'])
t.longest_unique_prefixes()
u['some.']
t = Trie([u'some.long.name', u'some.short.name', 'blah.1', 'blah.2'])
t.longest_unique_prefixes()
u['some.', 'blah.']
stuaxo commented 8 years ago

^ I guess what I wanted needed here is the 'roots' of the Trie.