whatwg / compression

Compression Standard
https://compression.spec.whatwg.org/

Handling of trailing data after deflate-raw final block #47

Closed fhanau closed 1 year ago

fhanau commented 1 year ago

Following up on the discussion in https://github.com/WICG/compression/pull/43#discussion_r822297948: the current wording for deflate-raw does not state how to handle data after the end of a raw deflate stream (i.e. blocks or arbitrary data following a block with the BFINAL flag set). The requirement that non-conforming blocks must be treated as errors could be interpreted as disallowing any trailing data, which would match the requirements for gzip and non-raw deflate as well as what zlib does, but this is not obvious from the specification.
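To make the case concrete, here is a minimal sketch that hand-builds stored (uncompressed) DEFLATE blocks as laid out in RFC 1951 and appends one more block after the block marked BFINAL. It uses Python's `zlib` binding only as a convenient way to observe zlib's behaviour; the `stored_block` helper is hypothetical and not part of any API discussed here.

```python
import struct
import zlib


def stored_block(payload: bytes, final: bool) -> bytes:
    """Build one stored (BTYPE=00) DEFLATE block; hypothetical helper."""
    # Bit 0 of the header byte is BFINAL, bits 1-2 are BTYPE (00 = stored),
    # and the remaining bits are padding up to the next byte boundary.
    header = bytes([1 if final else 0])
    length = len(payload)  # must fit in 16 bits
    # LEN followed by NLEN (one's complement of LEN), both little-endian.
    return header + struct.pack("<HH", length, length ^ 0xFFFF) + payload


# A well-formed raw stream: one non-final block, then the final block.
stream = stored_block(b"hello ", final=False) + stored_block(b"world", final=True)

# The case in question: more bytes after the block that has BFINAL set.
trailing = stored_block(b"extra", final=True)
with_trailing = stream + trailing

d = zlib.decompressobj(wbits=-15)  # negative wbits selects raw deflate
out = d.decompress(with_trailing)

print(out)            # data up to and including the BFINAL block
print(d.eof)          # True once the final block has been decoded
print(d.unused_data)  # bytes the decoder did not consume
```

In this binding the decoder stops at the final block and hands the remaining bytes back to the caller, so whether they count as an error is a decision for the layer above.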

ricea commented 1 year ago

Can you think of a good way to phrase it? How does this sound?

  • CompressionStream must not emit any data after the BFINAL block. DecompressionStream must consider any data after the BFINAL block to be an error.

Ideally we would have an option to control this, as concatenating multiple streams seems to be quite common in practice.

fhanau commented 1 year ago

Thank you for the quick response! Based on how RFC 1951 defines BFINAL ("BFINAL is set if and only if this is the last block of the data set."), it feels clear to me that a compliant CompressionStream should never produce data after a BFINAL block. I would align the language with the other formats, perhaps: "It is an error for DecompressionStream if there is additional input data after the final block as indicated by the BFINAL flag."
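The proposed rule can be sketched as a strict wrapper that rejects any input remaining after the final block. This is only an illustration in Python's `zlib` binding, not the DecompressionStream implementation; `inflate_raw_strict` and `TrailingDataError` are hypothetical names.

```python
import zlib


class TrailingDataError(ValueError):
    """Raised when input continues past the block marked BFINAL (hypothetical)."""


def inflate_raw_strict(data: bytes) -> bytes:
    """Decode a raw deflate stream, treating trailing input as an error."""
    d = zlib.decompressobj(wbits=-15)  # negative wbits selects raw deflate
    out = d.decompress(data)
    if not d.eof:
        # The input ended before a block with BFINAL set was fully decoded.
        raise ValueError("truncated deflate-raw stream")
    if d.unused_data:
        raise TrailingDataError(
            f"{len(d.unused_data)} byte(s) of input after the final block"
        )
    return out


# Produce a raw deflate stream to test with.
c = zlib.compressobj(wbits=-15)
raw = c.compress(b"payload") + c.flush()

assert inflate_raw_strict(raw) == b"payload"

try:
    inflate_raw_strict(raw + b"\x00")
except TrailingDataError as exc:
    print("rejected:", exc)
```

Applying the same check to every format keeps the behaviour uniform, which is the alignment suggested above.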

Maybe I'm forgetting something, but I haven't encountered concatenated deflate streams much. One use case of not producing an error if there is trailing data would be to decompress a deflate stream within a file and then continue to process the following data, but that would require the position in the input data after the stream to be available, which is not part of the current API. For now, I think it's best if the same behavior (i.e. not allowing trailing data) is applied to all formats.
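For comparison, the "decompress, then keep parsing" use case depends on knowing how many input bytes the stream occupied. Python's `zlib` binding happens to expose enough to compute that offset, as in the sketch below; the Compression Streams API has no equivalent, which is the limitation described above. The `inflate_raw_prefix` helper and the `FOOTER` bytes are made up for illustration.

```python
import zlib
from typing import Tuple


def inflate_raw_prefix(buf: bytes) -> Tuple[bytes, int]:
    """Decode the raw deflate stream at the start of buf (hypothetical helper).

    Returns the decoded bytes and the number of input bytes consumed.
    """
    d = zlib.decompressobj(wbits=-15)  # negative wbits selects raw deflate
    out = d.decompress(buf)
    if not d.eof:
        raise ValueError("input ended before the final block")
    # Everything the decoder did not consume is reported in unused_data.
    consumed = len(buf) - len(d.unused_data)
    return out, consumed


# A made-up container: a raw deflate stream followed by unrelated bytes.
c = zlib.compressobj(wbits=-15)
embedded = c.compress(b"compressed section") + c.flush()
container = embedded + b"FOOTER"

payload, offset = inflate_raw_prefix(container)
rest = container[offset:]  # the caller resumes parsing here
print(payload, offset, rest)
```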

ricea commented 1 year ago

Thanks. I will make a PR unless you get there first.

If I understand correctly, it was popular to send multiple concatenated streams in HTTP responses when people didn't have a proper streaming compressor available. I don't know how much that is a problem nowadays. They wouldn't have been raw streams anyway.

fhanau commented 1 year ago

Thanks for explaining! I have created a PR. On a related note, it looks like the explainer doesn't mention deflate-raw yet, so that should probably be changed too.