michaelsproul / rust_radix_trie

Fast generic radix trie implemented in Rust
https://docs.rs/radix_trie/
MIT License
184 stars 32 forks source link

get_raw_ancestor #59

Open leshow opened 4 years ago

leshow commented 4 years ago

the example code has this:

    // This is a bit of a hack that relies on knowing the binary representation of
    // strings... "abd" works, but "abz" doesn't...
    let ab_sum = t.get_raw_ancestor(&"abd").children().fold(0, |acc, c| {
        println!("Iterating over child with value: {:?}", c.value());
        acc + *c.value().unwrap_or(&0)
    });

I tried it, indeed, "abz" doesn't work. Why? What can be done to fix this?

michaelsproul commented 4 years ago

In the example the trie contains the keys

z
aba
abb
abc

which represented as bytes (the only thing the trie cares about) is:

z = [0x7a]
aba = [0x61, 0x62, 0x61]
abb = [0x61, 0x62, 0x62]
abc = [0x61, 0x62, 0x63]

The three ab* keys will be stored under a common prefix, and because the trie uses 16 nodes per level (i.e. half a byte worth of key -- a nibble), it can split that prefix on a half-byte boundary. So there'll be an intermediate node in the trie at key [0x61, 0x62, 0x6_], with three children for keys 0x_1, 0x_2, 0x_3. You can fetch this intermediate node using get_raw_ancestor and any key that starts with [0x61, 0x62, 0x6_] (like "abd") but not with a key like "abz" = [0x61, 0x62, 0x7a].

If you had the same data and wanted to iterate over all the ab* values, you could use get_raw_descendant("ab") to fetch the common prefix node. Or if you're happy to modify the data slightly, you could insert a dummy value for the key "ab" and use subtrie("ab") (that seems cleaner in a lot of ways)

Admittedly, the nibble-oriented approach can be counter-intuitive and I'm not certain that this is the best possible API, but it is what it is. Hope that helps

leshow commented 4 years ago

Thanks for commenting, the thing that I'm trying to do is this:

    t.insert("z", vec!["z"]);
    t.insert("a", vec!["a"]);
    t.insert("ab", vec!["b"]);
    t.insert("abc", vec!["c"]);
    t.insert("abcd", vec!["c"]);

    assert!(t.get_all_on_path(&"abce").eq(vec!["a", "b", "c"].iter()));

is something like this possible? If not currently, would you be open to accepting a PR to add it? Any suggestions for implementation?