Should the spec require copying init.data when constructing EncodedAudioChunk and EncodedVideoChunk?

w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.

https://w3c.github.io/webcodecs/

Other

987 stars 136 forks source link

Should the spec require copying init.data when constructing EncodedAudioChunk and EncodedVideoChunk? #127

Closed OllieJones closed 3 years ago

OllieJones commented 3 years ago

Sections 9.1.1 and 9.2.1 of the draft WebCodecs spec say:

Assign a copy of init.data to chunk.data.

I suggest the words a copy of be removed from the spec. It seems like leaving those words in requires unnecessary copying of compressed bitstream data.

Leaving these words out would allow a client of these constructors to make the decision about copying that data. It's possible for a client to pass a .subarray() of some ArrayBuffer. If that ArrayBuffer is volatile for some reason, it should be up to the client to copy the data.

Of course, if a decoder implementation requires the compressed bitstream data to be copied, the implementation can do so. But promising to do so adds a required copying step.

How about this wording?

Assign init.data to chunk.data. The client must not allow init.data to change after using this ctor.

sandersdan commented 3 years ago

The goal here is to reduce the total number of copies. We can't just use an ArrayBuffer directly as it could be mutated, and if a decoder is currently reading from it that could result in TOCTTOU issues.

@chcunningham previously we discussed removing data and providing copyInto() instead, thus making encoded chunks immutable.

An alternative to immutable chunks is to have our decoders always copy the data before decoding it, but I don't really see an advantage in having mutable chunks.

chcunningham commented 3 years ago

We can't just use an ArrayBuffer directly as it could be mutated, and if a decoder is currently reading from it that could result in TOCTTOU issues.

One option under consideration is to transfer the provided ArrayBuffer when constructing the encoded chunk. This would mitigate TOCTTOU issues by detaching the users reference to the data.

@chcunningham previously we discussed removing data and providing copyInto() instead, thus making encoded chunks immutable.

I think this is probably the best we can do without read-only ArrayBuffers (https://github.com/WICG/reducing-memory-copies/issues/1). It does impose a new copy for users of the encoder interface (e.g. presently they could just pass our data directly to WebTransport).

An alternative to immutable chunks is to have our decoders always copy the data before decoding it, but I don't really see an advantage in having mutable chunks.

I lean toward immutable. The alternative doesn't resolve the issue #80.

chcunningham commented 3 years ago

I'm closing this as I think its work is tracked in duplicate issues. The plan is:

Make Encoded*Chunks immutable. Tracked by #80. This avoids the confusion described in that issue and allows us to eliminate one of the copies described above as we feed the chunk into the decoder.
Allow callers to optionally transfer ownership of their ArrayBuffer's when constructing an Encoded*Chunk. Tracked by #104. This obviously eliminates the copy (assuming the site didn't need to make its own copy to continue to have access to the data).