sdleffler / qp-trie-rs

An idiomatic and fast QP-trie implementation in pure Rust.
Mozilla Public License 2.0
94 stars 24 forks source link

Types design #17

Open CthulhuDen opened 5 years ago

CthulhuDen commented 5 years ago

Why should the Trie itself be parameterized over key type? Currently both having K as type parameter and needing it passed as owned to insert and (especially) to entry pretty much forces us to make copy of key for each invocation. I think we only need key when first inserting into the trie, which could be better achieved with ToOwned (btw this comment makes sense for me: https://github.com/sdleffler/qp-trie-rs/blob/91507d5cfee53bafdeeb69deb36140f496a3bbaa/src/node.rs#L245). This way, insert and entry would require Borrow + ToOwned and would clone only when really necessary.

Then if we stored [u8] in some form in node and parameterize functions like insert and entry themselves which would allow passing them &'a [u8] with different 'a which is currently impossible.

sdleffler commented 5 years ago

Storing [u8] in a node can only be done by some sort of heap indirection (minus small_vec and similar optimizations.) Using ToOwned/Borrow escapes the need for any such thing when using small or fixed-size keys, for example small byte arrays ([u8; 4].)

sdleffler commented 5 years ago

In addition, Trie is (despite having a byte slice interface) effectively a key-value map. I don't like the idea of allowing users to insert keys of multiple different types into the same Trie; that seems like a good recipe for a type error, and something that should be solved by making an enum of different key types. Do you have a use case for this lack of parametricity over K?

CthulhuDen commented 5 years ago

But isn't it the case that the keys are usually non-fixed-length?

My use case involves storing (small number of) different strings and finding/inserting (if new) them frequently (like 1 mil rps). I really don't want to clone key on every request, and I cannot pass string ownership to entry() either. The fact that Trie (persistent) is parameterized over K and lifetime of key must be at least that of Trie I cannot specify K to reference type.

CthulhuDen commented 5 years ago

Now imagine if Trie wasn't parameterized (I'm new to rust so not sure on exact signature)

struct Trie(Vec<u8>);

impl Trie {
    fn insert<K: ?Sized>(&mut self, key: &K)
    where
        Vec<u8>: Borrow<K>,
        K: ToOwned<Owned = Vec<u8>>,
    {
        self.0 = key.to_owned();
    }
}

This would allow me pass &[u8] as key and it would only be copied when necessary.

sdleffler commented 5 years ago

It does not ring true for me that cloning the string is necessary. It's been a while since I looked at this code but Borrow<K> should allow you to pass in a reference instead and have it not be cloned unless it is not in the trie. Have you looked through the source? I'll need to dig through to check this.

CthulhuDen commented 5 years ago

I could pass reference (instantiate K to reference type), but I would have to fix lifetime of references which I can pass to insert/entry for each Trie instance, which is exactly why I have a problem with parameterizing the Trie data type.

CthulhuDen commented 5 years ago

Ok so maybe example:

let mut trie: Trie<&[u8], u16> = Trie::new();
{
    let keyFromSomewhere = [a, b, c];
    trie.entry(&keyFromSomewhere[..]);
}
{
    let keyFromSomewhereElse = [d, e, f];
    trie.entry(&keyFromSomewhereElse[..]);
}

This currently won't work because lifetimes of two references will not match. It would not be a problem if the function entry was parameterized instead of whole Trie.

sdleffler commented 5 years ago

Mm, I see your problem.

This does not need to be solved by removing the K parameter (and I am immediately against that solution, I think it destroys part of the type safety which is preserved by its presence.) It should be possible to add a new type variable in insert, call L: ToOwned<Owned = K>, for which you can pass in either K or &K (or other types implementing ToOwned<Owned = K>. Calling to_owned on K would be a noop and simply pass the value through, while calling to_owned on &K would end in a clone.

I don't remember if there's a reason I didn't implement it this way. It may have been an oversight, or caused other issues with types. Will try to take a look tonight and see if I can figure it out.

sdleffler commented 5 years ago

If these keys will never be removed, and it is acceptable to keep their memory for the lifetime of your binary, I believe there are libraries which can be used to intentionally leak memory, getting you an &'static [u8] which will not have the lifetime issue. Could be a usable temporary solution.

CthulhuDen commented 5 years ago

Thanks for having a look at the issue. However considering I want to avoid copying the keys because of high intensity of requests, I dont think leaking the keys memory forever is a solution for me =)