michaelsproul / rust_radix_trie

Fast generic radix trie implemented in Rust
https://docs.rs/radix_trie/
MIT License
188 stars 33 forks source link

Method of visiting all children of a node? #16

Closed andrewcsmith closed 8 years ago

andrewcsmith commented 8 years ago

I'm looking at your traversal traits, and attempting to write a struct that visits all the immediate children of a node and adds their values. However, given that the children field is private, I can't figure out how to make my struct traverse the child nodes. Is there something in the current interface that allows this, or would I have to change the actual crate code?

My RefTraversal code right now (Trie essentially stored as <&str, u32>):

struct GetChildren;
impl<'a, K: Debug + TrieKey + 'a> RefTraversal<'a, K, u32> for GetChildren {
    type Input = &'a Trie<K, u32>;
    type Output = u32;

    fn default_result() -> Self::Output { 0u32 }

    fn child_match_fn(trie: &'a Trie<K, u32>, input: Self::Input, nv: NibbleVec) -> Self::Output {
        // Children code goes here
        Self::run(&input, &input, nv);
        trie.value().unwrap_or(&0u32).clone()
    }

    fn action_fn(trie: &'a Trie<K, u32>, intermediate: Self::Output, bucket: usize) -> Self::Output {
        intermediate + trie.value().unwrap_or(&0u32)
    }
}
michaelsproul commented 8 years ago

Hi Andrew!

I don't think using a traversal trait is right for your use case. Traversals are a really just a low-level primitive I created to avoid code duplication between get/insert/remove. I think you really want something more like this?

trie.get_node(&key).unwrap()
    .child_iter()
    .fold(0, |acc, child| acc + child.value().unwrap_or(0))

The caveat here is that the child_iter method is not currently public. I'm concerned over the unsoundness of returning subtries with half their key missing (see #7), so the robust solution as I see it is:

  1. Implement the subtrie stuff from #7.
  2. Make a public child_iter method that returns an iterator over the child subtries.

Now, depending on how adventurous crazy you feel, you could have a go at implementing this yourself, or you could use the hacky-child-iter branch, where I just made the child_iter method public.

ONE further insane thing is that due to the use of 4-bit branching at each level, the definition of "child" in this radix trie may not be what you're after. If you take a look at the child_iter.rs example, the computed result is oddly 0, because a bit prefix of 96 (or 0x60) is given its own node above "a" (0x61), "b" (0x62) and "c" (0x63). I am unsure how to "fix" this, given that the data structure is inherently binary-focussed (for better and worse).

One thing we could easily do is define a fold function over the entire trie, but this can already be accomplished using .values().fold(), and I'm guessing isn't what you want.

In a trie with keys "a", "b", "c", the first child is:

Trie {
    key: NibbleVec [6],
    key_value: None,
    length: 3,
    child_count: 3,
    children: [
        None,
        Some(
            Trie {
                key: NibbleVec [1],
                key_value: Some(
                    KeyValue {
                        key: "a",
                        value: 5
                    }
                ),
                length: 1,
                child_count: 0,
                children: [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None
                ]
            }
        ),
        Some(
            Trie {
                key: NibbleVec [2],
                key_value: Some(
                    KeyValue {
                        key: "b",
                        value: 6
                    }
                ),
                length: 1,
                child_count: 0,
                children: [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None
                ]
            }
        ),
        Some(
            Trie {
                key: NibbleVec [3],
                key_value: Some(
                    KeyValue {
                        key: "c",
                        value: 50
                    }
                ),
                length: 1,
                child_count: 0,
                children: [
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None,
                    None
                ]
            }
        ),
        None,
        None,
        None,
        None,
        None,
        None,
        None,
        None,
        None,
        None,
        None,
        None
    ]
}
andrewcsmith commented 8 years ago

Ah, I see what issue #7 is about now. What about instead of calling it SubTrie we just call it TrieRoot, and therefore a "normal" Trie will just have a prefix of NibbleVec []?

Another question–in the 4-bit branching, will all the child nodes of a similar length (that is, all the length-3 children of "ab": "abc", "abd", "abe", etc.) be on the same hierarchical level? I need to non-recursively visit all nodes containing kv pairs, prefixed by some search string "ab". I think I've actually made some progress on the Traversal front. Check this out:

fn child_match_fn(trie: &'a Trie<K, u32>, input: Self::Input, nv: NibbleVec) -> Self::Output {
    println!("children {:?}, {:?}, {:?}", trie.key(), trie.value(), nv);
    // Children code goes here
    for key in trie.keys() {
        // println!("{:?}", key);
        match input.get_node(&key) {
            Some(k) => {
                Self::run(&input, &k, NibbleVec::from_byte_vec(key.encode()));
            }
            None => { }
        }
    }
    trie.value().unwrap_or(&0u32).clone()
}

The problem is that this visits some nodes twice, and I need to not recurse. But perhaps it's getting closer, without making anything extra public?

Greetings from Santa Cruz!

michaelsproul commented 8 years ago

Ah, I see what issue #7 is about now. What about instead of calling it SubTrie we just call it TrieRoot, and therefore a "normal" Trie will just have a prefix of NibbleVec [] ?

Good idea!

I think I've got something of a solution for the 4-bit branching problem. We just need a function (say get_raw_ancestor) that returns the closest ancestor of a given key, regardless of whether or not that ancestor node has a value. I've implemented this on a branch, but I had to change the traversal logic, and it broke most of the unit tests, so I'll have to have a closer look a bit later.

See: https://github.com/michaelsproul/rust_radix_trie/blob/traversal-hacks/src/lib.rs#L385

Don't worry too much about making extra things public - the library hasn't really been used by anyone extensively, so the API is far from complete, and some growing pains are to be expected.

It's nice to hear from someone in Santa Cruz! I really enjoyed living there on exchange during 2014/2015 :smile:

andrewcsmith commented 8 years ago

Perhaps I've just been staring at this too long—but I can't figure out where the recursion happens in the RefTraversal::run method. I only see one invocation of Self::run, but it seems that recursion is happening whenever the run method returns a None value. Where is that called?

michaelsproul commented 8 years ago

Yeah, it's not recursive in the sense that it traverses the whole tree, it just uses recursion to reach the spot where a single key should be located, if that makes sense...?

michaelsproul commented 8 years ago

Ok, I've updated the master branch with a working get_raw_ancestor function and a public child_iter iterator. Let me know if these functions allow you to do what you want to do. There's an example in examples/child_iter.rs that you might find useful.

It's by no means a perfect fix, so I won't do a release yet, but you can depend upon this version by adding a Git dependency to your Cargo manifest:

[dependencies]
radix_trie = { git = "https://github.com/michaelsproul/rust_radix_trie.git" }

(for more info on Cargo stuff like this: http://doc.crates.io/manifest.html#the-dependencies-section)

andrewcsmith commented 8 years ago

Excellent. I'll take a look in a sec. In the meantime, I just finished up a fix of my own that allows for returning a node with the root node intact. Definitely a WIP, but it required minimal changes to the existing code and didn't break any tests.

michaelsproul commented 8 years ago

Ooh! Looks great! I'll dig into it soon (I've got a lot of uni work atm).

michaelsproul commented 8 years ago

Done