Right now `retina::codec::VideoFrame` implements `Buf` and supports only H.264. It always has two chunks: an AVC length prefix and the header+body of the (single) picture NAL (note: I'll need to extend it for multiple NALs). The idea was to support zero-copy, but I think right now it's kind of the worst of all worlds:

*   if the NAL is fragmented (most of the bytes, and maybe even most of the NALs, on a typical stream), it copies it all into a `BytesMut` with a guesstimated size. It might copy again to grow partway through, and then probably (I haven't looked at profiles) copies it again on `BytesMut::freeze` due to tokio-rs/bytes#401.
*   if the caller wants a single contiguous buffer, they'll end up copying it themselves (maybe with `bytes::Buf::copy_to_bytes`, maybe not).
*   you can only iterate through it once, which might be a problem for some usages.
`AudioFrame` and `MessageFrame` provide a `Bytes` directly; there's no good reason the three frame types shouldn't all present their data in the same way.
I'd prefer to follow one of two paths; I haven't made up my mind which:

1.  Truly support zero-copy via `data(&self) -> impl Buf + '_`. `VideoFrame` needs a `Vec` of chunks. If you want to iterate multiple times, you can just call `data` as many times as you want. You can use `Buf::chunks_vectored` + `std::io::Write::write_vectored` / `tokio::io::AsyncWriteExt::write_vectored`.
2.  Put everything into a single buffer. Accumulate `Bytes` in a reused `Vec` in `Depacketizer::push`, then concatenate them during `Depacketizer::pull` when the total size is known, avoiding the extra copies mentioned above.
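A minimal sketch of path 2, with `Vec<u8>` standing in for `Bytes` so the example is dependency-free (the real code would push `Bytes` and `freeze()` a `BytesMut`), and with a stripped-down `Depacketizer` that is only a stand-in for retina's actual type:

```rust
// Sketch of path 2: accumulate packet payloads during `push`, then copy
// once in `pull` when the total length is known. `Vec<u8>` stands in for
// `bytes::Bytes` to keep this stdlib-only.
struct Depacketizer {
    pieces: Vec<Vec<u8>>, // reused across frames; draining keeps capacity
}

impl Depacketizer {
    fn push(&mut self, pkt: Vec<u8>) {
        self.pieces.push(pkt);
    }

    fn pull(&mut self) -> Vec<u8> {
        // The total size is known only now, so a single allocation and a
        // single copy suffice -- no guesstimated capacity, no regrowth.
        let len = self.pieces.iter().map(Vec::len).sum();
        let mut out = Vec::with_capacity(len);
        for p in self.pieces.drain(..) {
            out.extend_from_slice(&p);
        }
        out
    }
}
```

The point of the reused `Vec` is that the per-frame allocation churn is limited to the one output buffer; the list of pieces keeps its capacity between frames.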
Arguments in favor of zero-copy (a custom `Buf` implementation with multiple chunks):

*   People like zero-copy; it's generally assumed to be more efficient (but see below).
*   One neat API trick this would allow (for H.264) is selecting the Annex B encoding (`00 00 00 01` between NALs) or the AVC encoding (length prefix before each NAL) via something like `h264(&self) -> Option<&H264Frame>` then `data_as_annexb(&self) -> impl Buf` or `data_as_avc(&self, len_prefix_size: usize) -> impl Buf`. With the single-buffer approach I'd probably make folks choose when setting up the depacketizer instead. (I guess it'd also be fairly efficient to have a `&mut` accessor on the `VideoFrame` which does a one-way conversion from 4-byte AVC to Annex B, but that's a weird API.) I can imagine someone doing some fan-out thing where they actually want both encodings.
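For what that one-way `&mut` conversion might look like: a hypothetical in-place rewrite of 4-byte AVC length prefixes into Annex B start codes (the function name and shape are illustrative, not retina's API). It works in place precisely because a 4-byte length prefix and the 4-byte start code are the same size:

```rust
// Hypothetical helper: rewrite each 4-byte big-endian AVC length prefix
// into the Annex B start code 00 00 00 01, in place. Only valid when the
// buffer was written with 4-byte length prefixes.
fn avc_to_annex_b(data: &mut [u8]) {
    let mut i = 0;
    while i + 4 <= data.len() {
        let nal_len =
            u32::from_be_bytes([data[i], data[i + 1], data[i + 2], data[i + 3]]) as usize;
        data[i..i + 4].copy_from_slice(&[0, 0, 0, 1]);
        i += 4 + nal_len; // skip the NAL body to reach the next prefix
    }
}
```

Note this only goes one direction: start codes carry no length, so converting back would require re-parsing NAL boundaries.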
Arguments in favor of a single `Bytes` or `Vec<u8>`:

*   More convenient / simpler to get right, IMHO. The zero-copy APIs seem half-baked/finicky; e.g., tokio's `tokio::io::AsyncWriteExt::write_buf` writes only the first chunk, and there's no stable `write_all_vectored` in either `std::io::Write` or `tokio::io::AsyncWriteExt`.
*   It might actually be more efficient. The answer isn't obvious to me. With my IP cameras, the IDR frames can be half a megabyte, fragmented across hundreds of RTP packets of 1400ish bytes. Just the `&[Bytes]` is then tens of kilobytes (four pointers per element): far too big for the no-alloc path of a `SmallVec<[Bytes; N]>`. And if someone's doing a `writev` call later, they have to set up/iterate through hundreds of `IoSlice`s to write the whole thing at once. (And you can't even reuse a `Vec<IoSlice>` between iterations without trickery, because it expects to operate on a mutable slice of initialized `IoSlice` objects.) It wouldn't surprise me if zero-copy is actually slower.
*   I'm not sure, but I think `Bytes` is mostly just a tokio thing; async-std folks might use plain `Vec<u8>` or something instead. (Although I'll likely keep using `Bytes` internally anyway, to keep the individual packets around between `push` and `pull`.)
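To illustrate the finicky part: a sketch of the write-everything-with-writev loop a caller would need. The `IoSlice` list has to be rebuilt after every partial write because `write_vectored` gives no way to resume mid-list (the function here is mine, not a real API):

```rust
use std::io::{self, IoSlice, Write};

// What callers must do themselves absent a stable write_all_vectored:
// loop on write_vectored, advancing past whatever was written and
// rebuilding the IoSlice list each iteration.
fn write_all_chunks<W: Write>(w: &mut W, chunks: &[&[u8]]) -> io::Result<()> {
    let mut chunks: Vec<&[u8]> =
        chunks.iter().copied().filter(|c| !c.is_empty()).collect();
    while !chunks.is_empty() {
        // IoSlice borrows the current chunk slices, which change after a
        // partial write, so this Vec can't simply be reused.
        let slices: Vec<IoSlice<'_>> = chunks.iter().map(|c| IoSlice::new(c)).collect();
        let mut n = w.write_vectored(&slices)?;
        if n == 0 {
            return Err(io::ErrorKind::WriteZero.into());
        }
        // Drop fully written chunks; trim the partially written one.
        while n > 0 {
            if n >= chunks[0].len() {
                n -= chunks[0].len();
                chunks.remove(0);
            } else {
                chunks[0] = &chunks[0][n..];
                n = 0;
            }
        }
    }
    Ok(())
}
```

With hundreds of chunks per IDR frame, that's hundreds of `IoSlice`s constructed per write attempt, which is the overhead being weighed against the single copy of the one-buffer approach.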
Right now I'm leaning toward a single `Bytes`. I might try benchmarking both, but if the performance is close, I think simplicity should win.