gpac / ComplianceWarden

A pluggable compliance checker (ISOBMFF, HEIF/MIAF/AVIF, AV1 HDR10+)
https://gpac.github.io/ComplianceWarden-wasm/
Other
16 stars 7 forks source link

False negative: `HandlerBox` (`hdlr`) `name` field shall not contain NUL bytes prior to terminator #33

Open baumanj opened 2 years ago

baumanj commented 2 years ago

Now that https://github.com/MPEGGroup/FileFormat/issues/35 has clarified the specification, there should be an error for inputs like this one (from https://github.com/AOMediaCodec/av1-avif/blob/master/testFiles/Apple/multilayer_examples/animals_00_multilayer_lsel.avif):

00 00 00 22 // Box size (34 bytes)
68 64 6c 72 // boxtype ("hdlr")
00 00 00 00 // version (1 byte), flags (3 bytes)
00 00 00 00 // pre_defined = 0
70 69 63 74 // handler_type ("pict")
00 00 00 00 // reserved[0] = 0
00 00 00 00 // reserved[1] = 0
00 00 00 00 // reserved[2] = 0
00 00       // name ("\0\0")

Here is a minimal example: invalid-avif-hdlr-name-multiple-nul.avif.zip

See also https://github.com/gpac/ComplianceWarden/issues/28

leo-barnes commented 2 years ago

@baumanj Good catch on the files, I'll need to look into why they ended up like this.

I'm not sure I agree that this is non-spec compliant though (although it might be good to output a warning). Since name is a NULL terminated string, it will end at the first \0. I don't think there is anything in the spec disallowing extra data to be in a box. If there is, then I agree that this is non-compliant. But if there isn't it's just extra unused cruft in the box. (Still not ideal of course.)

If name was something like lib\0avif\0, I would expect the avif\0 to simply be ignored.

baumanj commented 2 years ago

I don't think there is anything in the spec disallowing extra data to be in a box

That's a good question. If you agree with https://github.com/MPEGGroup/FileFormat/issues/35 that the utf8string ends at the first NUL, I'd tend to think that there would need to be explicit allowance in the spec for extra data in the box. Especially considering hldr is a FullBox, I can't think of a valid purpose for extra unspecified data at the end. The definition of BoxHeader in ISO/IEC 14496-12:2020 § 4.2.1 "Object syntax conventions" reads:

size is an integer that specifies the number of bytes in this box, including all its fields and contained boxes

which doesn't seem to allow for extra bytes or else I imagine it would be mentioned there.

More generally, this concept seems to be addressed by the expandable keyword from ISO/IEC 14496-1:2010 § 8.3.3 "Expandable classes", and the expandable keyword is not present in the definition of HandlerBox, so I'd call this invalid.

@cconcolato: do you have any thoughts?

cconcolato commented 2 years ago

Interesting question. The ability to have 'garbage' at the end of a box (FullBox or not) is actually a feature to enable backwards compatible extensions, especially for Boxes but also for FullBoxes when you don't want to introduce a new version. I think it has been used in the past (can't recall for the moment). But it could be clarified in the ISOBMFF spec that readers are expected to consume and ignore those bytes.

@dwsinger opinion?

dwsinger commented 2 years ago

yes, perhaps we should have advice that readers should process the fields that they recognize, and if there is extra data in a box after that, ignore it (presuming that they are processing in accordance with a brand). However, it raises an interesting question for file processors: should they copy that data over, or delete it?

cconcolato commented 2 years ago

I raised an issue on the MPEG repo.

Now regarding ComplianceWarden, I think it could emit a warning if a box contains data past the last known field.

rbouqueau commented 1 year ago

Still waiting for clarification in https://github.com/MPEGGroup/FileFormat/issues/46.

dwsinger commented 1 year ago

OK, two things confused here. (a) should a reader parse what it recognizes and ignore the rest? (b) are you allowed to put garbage after the last parsed field when creating a file?

IMHO, the answers are interlinked; to (a) yes, as that enables backward-compatible extensions to the spec, adding fields, without breaking old readers. (b) no, because those extra bytes would be interpreted by those hypothetical future spec-compatible readers as belonging to the newly defined next field.

So, in my retired-hence-unprofessional opinion, these files are invalid. They contain data after the last byte that would clash with any extension.

jeanlf commented 1 year ago

@dwsinger , I'll be a bit more carefull when saying

no, because those extra bytes would be interpreted by those hypothetical future spec-compatible readers as belonging to the newly defined next field.

I agree that if extensions are defined without versioning, this would break everything. However, if full box versioning is properly used, any "garbage bytes" at the end of a v0 only would not be problematic to those hypothetical future spec-compatible readers since they should ignore v1+.

So theoretically speaking, I don't think these files are invalid (but there is a lot of should and hypothetical involved I agree...)

dwsinger commented 1 year ago

I was going to say that's fair, if fields are added then the version should be bumped, and no-one should be trying to parse them under v0.

But we don't bump the version if the change is a pure backwards-compatible addition; readers (I think) are told not to try parsing boxes when they don't understand the version number, as the change of version may indicate a non-backwards-compatible change. So making a box v1 would break v0 compatibility.

So we are back to the general question: do extra bytes aka garbage at the end of the box make a file formally non-compliant? I think so.

rbouqueau commented 1 year ago

Any news here?