khonsulabs / bonsaidb

A developer-friendly document database that grows with you, written in Rust
https://bonsaidb.io/
Apache License 2.0
998 stars 37 forks source link

Cow tools #277

Closed asonix closed 1 year ago

asonix commented 1 year ago

Swapping &[u8] in from_ord_bytes for Cow<'_, [u8]> leads to some unfortunate performance issues when there's escaping done early. Each nested call needs to allocate to propagate the Owned case.

I have an idea of a way to avoid this overhead, but I need to test some things first

fixes: https://github.com/khonsulabs/bonsaidb/issues/254

asonix commented 1 year ago

How would you feel about a TriCow: https://git.asonix.dog/asonix/cow-tools/src/branch/main/src/lib.rs

ecton commented 1 year ago

I'm not opposed to using a different type than Cow, but I'm not sure how it will help much with recursive from_ord_bytes implementations. Also, thank you for taking a stab at this!

asonix commented 1 year ago

The idea with TriCow is it includes a third borrowed variant TriCow::Ephemeral(&'b T), which is not bound by lifetime 'a like Borrowed(&'a T). This allows forwarding borrows that don't originate from outside the call stack to nested invocations. It still must be turned into an Owned value at the deepest level in order to return it back out, however, it means that each level that does a split or other manipulation on the data no longer needs to re-allocate to forward those slices deeper

Down in the mod tests I have an example similar to Key<'a> called WithCow<'a> that demonstrates how Ephemeral might be used to reduce allocations

ecton commented 1 year ago

I think you're onto something, but one issue is that currently, decoding an &str always works, but with the change to accept a Cow, if the caller provides an owned value, it can't be decoded. This is further complicated by the BonsaiDb only being able to sometimes provide an owned value and sometimes it needing to provide a borrowed version. I'm not necessarily opposed to forcing borrowing implementations to use a Cow-like type to decode so that they can accept ownership -- I just need to think this is a net effect that's worth the change in what's supported.

I think I'll want something a bit more specialized though -- a type that specifically is for Vec. The reasoning being that I would like for T and (T,) and Option<T> and Result<T> to all be able to use owned paths, but it'll require supporting modifying the owned Vec before extracting it. I'll reopen the other issue and look at it again soon.

ecton commented 1 year ago

I apparently don't know how to push changes on top of your PR, but here's the net changes I've made so far. I'm still not quite sure this complexity is actually worth it, but I think this is close to being complete. The reason I question whether it's worth it is that bonsaidb-local doesn't always call the decode functions with owned values, nor does it always call with borrowed values. The total reduction in allocations doesn't seem that high, and the added complexity is quite significant.

asonix commented 1 year ago

I'm not particularly tied to this specific implementation. If you think it's more complicated than it's worth then we should consider other options. You're right that this approach doesn't eliminate many allocations, and it is mostly helpful for deeply-nested Key types which are unlikely to be common.

ecton commented 1 year ago

Thank you again for your help on this. I've merged the changes in today.

In the end, I decided that most users will not implement Key, and will usually just derive it or use a type that already implements it. As such, the complexity is hidden away from 99.9% of who I expect users to be. So, even if it is more complex, this does make things more efficient, and that makes it worth doing.