Open shahn opened 2 months ago
Thank you for your issue! However we already do support Bytes, we just have it aliased as OctetString
.
Hi XAMPPRocky, thanks for your quick reply! I am not sure if maybe there's a misunderstanding: It seems you're using the OctetString as an alias for bytes::Bytes
, but not to decode from/encode into, but rather to support the ASN1 data type. As far as I see, right now one would decode from a &[u8] and encode into a Vecbytes
crate for that.
I have a working branch which introduces new lifetime type for Encoder
and allows reusing the same buffer recursively quite much with the new AnyEncoder
associated type, without Rc
or RefCell
. It works with plain &mut Vec<u8>
type, for OER
and maybe for PER
too with .view_bits_mut::<Msb0>();
(haven't tested yet).
From performance wise, it would also make sense to introduce encode_buf
method since it can more than half the required allocations if the same buffer can be reused. However, I am not sure if this type could be Bytes
, at least internally, since it is much slower. Generic type internally would also make things complicated. Would there be some other approach?
Interesting, do you know why Bytes
is so much slower? Perhaps that is something that also would be of interest to the tokio folks, since I assume they care about Bytes
performance a lot
This might a bit unfair comparision since VecBytes
type. Bytes
type has an advantage that you can split or slice the buffer without any new allocations (just by getting new "view" to the same buffer), so if you need to do that often, it might be faster in the end. It uses reference counters, and internal structure is more complicated, so just adding data to the buffer is slower in general, and the tradeoff comes here. It also allows shared ownership and zero-copy cloning.
There are actually few cases where I haven't tested yet where that splitting could be valuable and maybe the performance difference is not that clear. In general, there are many cases where you need to store the encoded values in separate buffer before you can calculate the length, and splitting can be useful if it saves some allocations or moving/cloning. But even if it would be possible to use Bytes
in OER, for example , PER
has a high dependency of BitVec so that change is unlikely and would require major rework and encode_buf
should work somewhat similarly for all codecs.
Of course encode_buf
could be also implemented "inefficiently" so that the caller provides the buffer but the codec does not use it internally and just moves the data in there in the end.
I mean, I think the last option is not really what I was after, the goal was to reduce allocations for sure :) I use Bytes a lot for receiving data from the network and then working with it, to avoid allocations (I had some trouble with memory fragmentation in a different project, so kind of defaulting to it). But even providing a Vec which can be re-used would be helpful for that.
Thanks for educating me on the intricacies of different encodings, I am pretty new to ASN.1.
Just a note, OER has now encode_buf
for &mut Vec<u8>
type for starters. There is internally second buffer for length calculations but other than that, allocations are now almost at theoretical minimum. Traits should support adding the same for other codecs too.
I guess a lot of people are working with the bytes crate when using tokio and network protocols. It would be nice if rasn could work directly with the relevant types, with an API similar to https://docs.rs/tokio/latest/tokio/io/trait.AsyncReadExt.html#method.read_buf (so instead of
encode
you'd callencode_buf
), to allow re-using buffers.