Open dragoon opened 10 years ago
Have you tried using the RecordTrie instead? (same module)
I don't really understand this structure, it has some keys and values, while I have only values.
Ah, I see what you mean now, disregard my earlier comment. Yeah, as far as I'm aware it only accepts unicode strings.
@dragoon I'm not sure adding support for having any object as a key is a good idea - because I don't know how to implement it efficiently.
We can't store just an id of object (it defeats the purpose of marisa-trie), so we should somehow serialize the key to bytes to use it as a key. For strings the wrapper encodes unicode input to utf8.
In order to support arbitrary objects we may use pickle, but I'm not sure how compressable is the result, and better task-specific serialization methods usually exists. For example, in your case (a tuple with 2 strings) it makes sense to join the strings using some separator before adding to the trie and split by this separator when retreiving. You don't need marisa-trie support to do this.
But that's true that there are some edge cases (separator inside the tuple element?), splitting/joining tuples could be more efficient if implemented in Cython, and storing tuples of strings is quite common. So I think adding a trie subclass that allows tuples of strings as keys is a good idea - ngram storage is a common use case. Pull requests are welcome :)
I tried to construct the following trie:
Which gave me
AttributeError: 'tuple' object has no attribute 'encode'
. So I suppose the library accepts only strings, but sometimes you want other structures.