Closed liori closed 8 years ago
For what it's worth, this would be useful to me as well. Any updates on whether this might be merged?
I'm okay with this change. Will review the code later this week.
@kmike what do you think?
Yeah, I think that's a nice feature. It'd be good to check that there are no serious performance regressions (a few % is fine).
Hi, I've changed NoPrefixGiven
into None
, added the necessary .cpp files and added myself to the authors file. I hope I haven't broken anything in the process.
Everything looks good. Thank you very much, @liori!
@kmike I'll do the benchmarks after the merge.
Thanks @liori and @superbobry!
One more thing missing—documentation update. I'll try to prepare something over the weekend.
I've wanted to port the current docs to Sphinx for quite a while, so if you have the time, don't hesitate :)
BinaryTrie
: aTrie
that does not attempt to encode its keys as UTF8.I am storing lots of SHA hashes in many large Tries (hundreds of megabytes each, potentially hundreds of such tries at any time). I used to store them as a hex digest, but then noticed that my Tries are often significantly smaller when I use a binary encoding. Savings of tens of gigabytes in my use-case. Except…
marisa_trie
forces all keys to be unicode, so it fails with'\0'.encode('utf8')
.This pull request is an attempt to implement a Trie which is binary-safe in keys.
BinaryTrie
basically a stripped-downTrie
class, where I removed every.encode('utf8')
call and fixed some small bugs related to handling\0
. Tests are just a slightly edited copy ofTrie
tests.I haven't tested the code under any python other than 2.7 and 3.4, as only these two are available on my system. I am also not sure yet if the implementation is correct. I will test it more; now I'm just requesting comments to the implementation.