Optimize mp4 parse times, and avoid duplicating their RAM use

rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

https://rerun.io/

Apache License 2.0

6.43k stars 312 forks source link

Optimize mp4 parse times, and avoid duplicating their RAM use #7481

Open Wumpf opened 4 weeks ago

Wumpf commented 4 weeks ago

Right now we effectively hold an mp4 blob at least twice: Raw in the (arrow) store and then again upon parsing in our mp4 object. All larger pieces should just be pointers back to the raw blob in order to conserve memory.

Also general cleanup of mp4 crate is in order, this is a good opportunity.

This won't fly for streaming mp4 easily, but for full mp4 it should be doable to do indices into the blob.

emilk commented 4 weeks ago

Does arrow bottom out in https://crates.io/crates/bytes ? That would be useful for storing RC-counted references across crates.

Otherwise we have to consider a few different alternatives:

re_mp4 uses bytes: Range<usize> instead of Vec<u8>
re_mp4 has some sort of trait ByteSource with a Box<dyn ByteSource> that we then implement for Blob
???

emilk commented 2 weeks ago

I think the best path forward is to change re_mp4 to store byte ranges (Range<usize>) rather than the raw bytes, and then have re_video (or re_renderer) have some sort of trait ByteSource { fn read(&self, bytes: Range<usize>) -> Option<Vec<u8>> } that we then can wrap around a Blob.

It should be less than a days worth of work.

emilk commented 1 day ago

The vast majority of the time used by Mp4::read is allocating memory for each sample (see https://github.com/rerun-io/re_mp4/pull/11). If we instead store byte ranges, all that goes away. This mean this issue is also about vastly optimizing video load times (which can easily be hundreds of milliseconds, atm).

Related: https://github.com/rerun-io/rerun/pull/7850

jprochazk commented 5 hours ago

Even after https://github.com/rerun-io/rerun/pull/7860 we still copy video data for each sample. But Blob is a Arc<Vec<u8>> internally, so we should be able to avoid that as well. The decoder could receive a sub-slice of the Blob instead of copying the data to its own Vec. For web, it's possible to give the JS video APIs a direct reference to Wasm's linear memory, using the unsafe Uint8Array::view, which means constructing the EncodedVideoChunk will not require a copy either.