sile / patricia_tree

A memory-efficient patricia tree implementation written in Rust
MIT License
111 stars 17 forks source link

is it possible to make keys unicode safe? #14

Closed seppo0010 closed 1 year ago

seppo0010 commented 2 years ago

Hey there. Thanks for this library. I'm trying to create a JSON from the tree but it fails with unicode characters as they might get splitted on the tree. It would be nice to have unicode support and not breaking down multibytes chars. Thoughts?

sile commented 2 years ago

Hi, this is an interesting question. Since the memory layout of patricia_tree assumes the key type of a node to be u8 to save memory footprint, it's difficult to directly support Unicode characters as the keys. However, it might be possible by restricting new sibling nodes (tree branches) to be inserted to points where the UTF-8 character boundary is preserved (we need to adjust the return value of the skip_common_prefix method here I think). I'd like to consider this in more detail when I have time (maybe this or next weekend).

seppo0010 commented 2 years ago

Sounds good. I was actually thinking that instead of Vec<u8> the type of the keys could be Vec<K> being K Clone + PartialEq + PartialCmp, and maybe something else. That way it could be another numeric type, a char, or anything else. I tried to do it myself but I couldn't.

sile commented 2 years ago

[FYI] I started to implement this feature in #15 (this PR is still very in the early stage though)

sile commented 1 year ago

Resolved by #15