signalapp / mp4san

A Rust MP4 format sanitizer
MIT License
10 stars 4 forks source link

Add new callback-less low-level API #3

Closed jessa0 closed 1 year ago

jessa0 commented 1 year ago

This PR adds an API to the sanitizer which has no callbacks or blocking, to accommodate libsignal's FFI bridge.

Libsignal's FFI bridge uses a system of conventions and Rust macros to expose objects and methods on those objects to Swift, Java, and Node. There are are small number of calls back into Swift, Java, and Node from Rust, which are bridged by hand using function pointers, which seems like we want to avoid doing for mp4san if possible as it seems error-prone. The current mp4san API on master contains callbacks in the form of the std::io::Read and std::io::Seek traits, so this PR aims to eliminate those by instead 1) allowing the caller to feed data chunks in directly, and 2) requesting the caller to perform seek operations in its return value, SanitizerState. This is only meant to serve as a low-level API for the FFI bridge. The highel-level API exposed from the libsignal Java lib, for example, would look more like:

public class Sanitizer {
    public Sanitizer(InputStream input) { ... }
    public SanitizedMetadata sanitize() { ... }
}

However, most the complexity anyway in this PR lies in tracking "skipped" input (i.e. input which we want to allow the caller to skip or seek over since we don't need it), which we would have to implement in any case in order to facilitate sanitizing a streaming input, which happens for example in some methods of sending media through Signal on Android. This is encapsulated in the buffer module. Currently only mdat and free boxes are signaled to the caller to be skipped over.

A notable alternative to the SanitizerState API and having to track skipped bytes would be to make mp4san::sanitize() async, but that would require us to heavily modify or re-implement the mp4 crate, as it is currently only synchronous.

jessa0 commented 1 year ago

PTAL if ya wanna, @michaelkirk

jessa0 commented 1 year ago

To help me understand better, can you clarify what you expect the entry point into this module to be for integrators? Or is there an example integration somewhere already?

Ah, I meant to add an example to the documentation, but forgot. libsignal is going to use this library via the Sanitizer methods, and will look something like how the new mp4san::sanitize() function is implemented, but in Java/Swift. Users who can just pass an io::Read + io::Seek can just use mp4san::sanitize() directly.

michaelkirk commented 1 year ago

can you clarify what you expect the entry point into this module to be for integrators?

[ ...]

libsignal is going to use this library via the Sanitizer methods, and will look something like how the new mp4san::sanitize() function is implemented, but in Java/Swift.

Ok yes, that was my confusion - so mp4san::sanitize() is sort of vestigial at this point, in that no client will use it directly, but probably it's helpful to keep it as an example for integrators and for testing.

jessa0 commented 1 year ago

A notable alternative to the SanitizerState API and having to track skipped bytes would be to make mp4san::sanitize() async, but that would require us to heavily modify or re-implement the mp4 crate, as it is currently only synchronous.

This thought might be worth revisiting, in light of #4. Though going async might end up making #5 more complicated?

jessa0 commented 1 year ago

I'm gonna go ahead and drop this PR, as dropping the mp4 crate (see issues #4 / #5) means we don't have to implement the complicated buffer and "skipped input tracking" logic. I also believe, after going through the experiment of writing this PR, that we'd be better off implementing the callbacks in libsignal than implementing a callback-less API.