Open Wumpf opened 4 weeks ago
Does arrow bottom out in https://crates.io/crates/bytes ? That would be useful for storing RC-counted references across crates.
Otherwise we have to consider a few different alternatives:
re_mp4
uses bytes: Range<usize>
instead of Vec<u8>
re_mp4
has some sort of trait ByteSource
with a Box<dyn ByteSource>
that we then implement for Blob
I think the best path forward is to change re_mp4
to store byte ranges (Range<usize>
) rather than the raw bytes, and then have re_video
(or re_renderer
) have some sort of trait ByteSource { fn read(&self, bytes: Range<usize>) -> Option<Vec<u8>> }
that we then can wrap around a Blob
.
It should be less than a days worth of work.
The vast majority of the time used by Mp4::read
is allocating memory for each sample (see https://github.com/rerun-io/re_mp4/pull/11). If we instead store byte ranges, all that goes away. This mean this issue is also about vastly optimizing video load times (which can easily be hundreds of milliseconds, atm).
Even after https://github.com/rerun-io/rerun/pull/7860 we still copy video data for each sample. But Blob
is a Arc<Vec<u8>>
internally, so we should be able to avoid that as well. The decoder could receive a sub-slice of the Blob instead of copying the data to its own Vec
. For web, it's possible to give the JS video APIs a direct reference to Wasm's linear memory, using the unsafe Uint8Array::view
, which means constructing the EncodedVideoChunk
will not require a copy either.
Right now we effectively hold an mp4 blob at least twice: Raw in the (arrow) store and then again upon parsing in our mp4 object. All larger pieces should just be pointers back to the raw blob in order to conserve memory.
Also general cleanup of mp4 crate is in order, this is a good opportunity.
This won't fly for streaming mp4 easily, but for full mp4 it should be doable to do indices into the blob.