Open CthulhuDen opened 5 years ago
Storing [u8]
in a node can only be done by some sort of heap indirection (minus small_vec
and similar optimizations.) Using ToOwned
/Borrow
escapes the need for any such thing when using small or fixed-size keys, for example small byte arrays ([u8; 4]
.)
In addition, Trie
is (despite having a byte slice interface) effectively a key-value map. I don't like the idea of allowing users to insert keys of multiple different types into the same Trie
; that seems like a good recipe for a type error, and something that should be solved by making an enum of different key types. Do you have a use case for this lack of parametricity over K
?
But isn't it the case that the keys are usually non-fixed-length?
My use case involves storing (small number of) different strings and finding/inserting (if new) them frequently (like 1 mil rps). I really don't want to clone key on every request, and I cannot pass string ownership to entry()
either. The fact that Trie
(persistent) is parameterized over K
and lifetime of key must be at least that of Trie
I cannot specify K
to reference type.
Now imagine if Trie
wasn't parameterized (I'm new to rust so not sure on exact signature)
struct Trie(Vec<u8>);
impl Trie {
fn insert<K: ?Sized>(&mut self, key: &K)
where
Vec<u8>: Borrow<K>,
K: ToOwned<Owned = Vec<u8>>,
{
self.0 = key.to_owned();
}
}
This would allow me pass &[u8]
as key and it would only be copied when necessary.
It does not ring true for me that cloning the string is necessary. It's been a while since I looked at this code but Borrow<K>
should allow you to pass in a reference instead and have it not be cloned unless it is not in the trie. Have you looked through the source? I'll need to dig through to check this.
I could pass reference (instantiate K
to reference type), but I would have to fix lifetime of references which I can pass to insert
/entry
for each Trie
instance, which is exactly why I have a problem with parameterizing the Trie
data type.
Ok so maybe example:
let mut trie: Trie<&[u8], u16> = Trie::new();
{
let keyFromSomewhere = [a, b, c];
trie.entry(&keyFromSomewhere[..]);
}
{
let keyFromSomewhereElse = [d, e, f];
trie.entry(&keyFromSomewhereElse[..]);
}
This currently won't work because lifetimes of two references will not match. It would not be a problem if the function entry
was parameterized instead of whole Trie
.
Mm, I see your problem.
This does not need to be solved by removing the K
parameter (and I am immediately against that solution, I think it destroys part of the type safety which is preserved by its presence.) It should be possible to add a new type variable in insert
, call L: ToOwned<Owned = K>
, for which you can pass in either K
or &K
(or other types implementing ToOwned<Owned = K>
. Calling to_owned
on K
would be a noop and simply pass the value through, while calling to_owned
on &K
would end in a clone.
I don't remember if there's a reason I didn't implement it this way. It may have been an oversight, or caused other issues with types. Will try to take a look tonight and see if I can figure it out.
If these keys will never be removed, and it is acceptable to keep their memory for the lifetime of your binary, I believe there are libraries which can be used to intentionally leak memory, getting you an &'static [u8]
which will not have the lifetime issue. Could be a usable temporary solution.
Thanks for having a look at the issue. However considering I want to avoid copying the keys because of high intensity of requests, I dont think leaking the keys memory forever is a solution for me =)
Why should the Trie itself be parameterized over key type? Currently both having
K
as type parameter and needing it passed as owned toinsert
and (especially) toentry
pretty much forces us to make copy of key for each invocation. I think we only need key when first inserting into the trie, which could be better achieved withToOwned
(btw this comment makes sense for me: https://github.com/sdleffler/qp-trie-rs/blob/91507d5cfee53bafdeeb69deb36140f496a3bbaa/src/node.rs#L245). This way,insert
andentry
would requireBorrow
+ToOwned
and would clone only when really necessary.Then if we stored
[u8]
in some form in node and parameterize functions likeinsert
andentry
themselves which would allow passing them&'a [u8]
with different'a
which is currently impossible.