Open baumanj opened 3 years ago
See https://github.com/mozilla/mp4parse-rust/pull/309#discussion_r651454804 for a link to code we've previously used to parse these utf8string
types.
Given the invalid files in the wild we don't want to break, the approach here will likely be to be stricter* in the context of AVIF, since it's a newer format, but more permissive in other contexts.
Per the comment in old, deleted code:
// XXX(kinetik): definition of "null-terminated" string is fuzzy, we have:
// - zero or more byte strings, with a single null terminating the string.
// - zero byte strings with no null terminator (i.e. zero space in the box for the string)
// - length-prefixed strings with no null terminator (e.g. bear_rotate_0.mp4)
// - multiple byte strings where more than one byte is a null.
Based on this text from ISOBMFF (ISO/IEC 14496-12:2020) § 4.2.1:
In these definitions, null-terminated means that the last character of a string is Unicode NUL, and hence an empty string is represented by a single Unicode NUL.
I think zero byte strings with no null terminator
are invalid (but we will continue to accept what we've accepted before). It's less clear what to do with multiple byte strings where more than one byte is a null
since the spec only requires the last character to be NUL, not that the first NUL is necessarily the last character.
I've filed https://github.com/MPEGGroup/isobmff/issues/27 to request clarification in the spec.
* That is, implement the standard
Per ISOBMFF (ISO/IEC 14496-12:2020) § 4.2.1, fields of type
utf8string
areCurrently this requirement is not enforced in the parsing of:
HandlerBox
(hdlr
)ItemInfoEntry
(infe
)SchemeTypeBox
(schm
)and possibly others