mozilla / mp4parse-rust

Parser for ISO Base Media Format aka video/mp4 written in Rust.
Mozilla Public License 2.0
416 stars 62 forks source link

Require `utf8string` types to be null-terminated #308

Open baumanj opened 3 years ago

baumanj commented 3 years ago

Per ISOBMFF (ISO/IEC 14496-12:2020) § 4.2.1, fields of type utf8string are

UTF-8 string as defined in IETF RFC 3629, null-terminated.

Currently this requirement is not enforced in the parsing of:

and possibly others

baumanj commented 3 years ago

See https://github.com/mozilla/mp4parse-rust/pull/309#discussion_r651454804 for a link to code we've previously used to parse these utf8string types.

Given the invalid files in the wild we don't want to break, the approach here will likely be to be stricter* in the context of AVIF, since it's a newer format, but more permissive in other contexts.

Per the comment in old, deleted code:

    // XXX(kinetik): definition of "null-terminated" string is fuzzy, we have:
    // - zero or more byte strings, with a single null terminating the string.
    // - zero byte strings with no null terminator (i.e. zero space in the box for the string)
    // - length-prefixed strings with no null terminator (e.g. bear_rotate_0.mp4)
    // - multiple byte strings where more than one byte is a null.

Based on this text from ISOBMFF (ISO/IEC 14496-12:2020) § 4.2.1:

In these definitions, null-terminated means that the last character of a string is Unicode NUL, and hence an empty string is represented by a single Unicode NUL.

I think zero byte strings with no null terminator are invalid (but we will continue to accept what we've accepted before). It's less clear what to do with multiple byte strings where more than one byte is a null since the spec only requires the last character to be NUL, not that the first NUL is necessarily the last character.

I've filed https://github.com/MPEGGroup/isobmff/issues/27 to request clarification in the spec.

* That is, implement the standard