add feature to interop with `bytes` crate, add `encode_buf` and `decode_buf`?

shahn commented 2 months ago

I guess a lot of people are working with the bytes crate when using tokio and network protocols. It would be nice if rasn could work directly with the relevant types, with an API similar to https://docs.rs/tokio/latest/tokio/io/trait.AsyncReadExt.html#method.read_buf (so instead of encode you'd call encode_buf), to allow re-using buffers.

XAMPPRocky commented 2 months ago

Thank you for your issue! However we already do support Bytes, we just have it aliased as OctetString.

shahn commented 2 months ago

Hi XAMPPRocky, thanks for your quick reply! I am not sure if maybe there's a misunderstanding: It seems you're using the OctetString as an alias for bytes::Bytes, but not to decode from/encode into, but rather to support the ASN1 data type. As far as I see, right now one would decode from a &[u8] and encode into a Vec, rather than using the bytes crate for that.

Nicceboy commented 4 weeks ago

I have a working branch which introduces new lifetime type for Encoder and allows reusing the same buffer recursively quite much with the new AnyEncoder associated type, without Rc or RefCell. It works with plain &mut Vec<u8> type, for OER and maybe for PER too with .view_bits_mut::<Msb0>(); (haven't tested yet).

From performance wise, it would also make sense to introduce encode_buf method since it can more than half the required allocations if the same buffer can be reused. However, I am not sure if this type could be Bytes, at least internally, since it is much slower. Generic type internally would also make things complicated. Would there be some other approach?

shahn commented 4 weeks ago

Interesting, do you know why Bytes is so much slower? Perhaps that is something that also would be of interest to the tokio folks, since I assume they care about Bytes performance a lot

Nicceboy commented 4 weeks ago

This might a bit unfair comparision since Vec might be as fast as it gets, if you don't need some features of Bytes type. Bytes type has an advantage that you can split or slice the buffer without any new allocations (just by getting new "view" to the same buffer), so if you need to do that often, it might be faster in the end. It uses reference counters, and internal structure is more complicated, so just adding data to the buffer is slower in general, and the tradeoff comes here. It also allows shared ownership and zero-copy cloning.

There are actually few cases where I haven't tested yet where that splitting could be valuable and maybe the performance difference is not that clear. In general, there are many cases where you need to store the encoded values in separate buffer before you can calculate the length, and splitting can be useful if it saves some allocations or moving/cloning. But even if it would be possible to use Bytes in OER, for example , PER has a high dependency of BitVec so that change is unlikely and would require major rework and encode_buf should work somewhat similarly for all codecs.

Of course encode_buf could be also implemented "inefficiently" so that the caller provides the buffer but the codec does not use it internally and just moves the data in there in the end.

shahn commented 4 weeks ago

I mean, I think the last option is not really what I was after, the goal was to reduce allocations for sure :) I use Bytes a lot for receiving data from the network and then working with it, to avoid allocations (I had some trouble with memory fragmentation in a different project, so kind of defaulting to it). But even providing a Vec which can be re-used would be helpful for that.

Thanks for educating me on the intricacies of different encodings, I am pretty new to ASN.1.

Nicceboy commented 1 day ago

Just a note, OER has now encode_buf for &mut Vec<u8> type for starters. There is internally second buffer for length calculations but other than that, allocations are now almost at theoretical minimum. Traits should support adding the same for other codecs too.

librasn / rasn

add feature to interop with `bytes` crate, add `encode_buf` and `decode_buf`? #322