mozilla / mp4parse-rust

Parser for ISO Base Media Format aka video/mp4 written in Rust.
Mozilla Public License 2.0

Quicktime manufacturer field #335

Closed: kornelski closed this issue 3 years ago

kornelski commented 3 years ago

mp4parse expects the 4 bytes after the fourcc in hdlr to be 0.

The reference to the standard version you've used is paywalled, but a copy I've found says those bytes can be either a string (the classic Mac OS QuickTime "Manufacturer" field) or 0, and I've been putting a string in there.

baumanj commented 3 years ago

On the merits of the specific issue, I agree a change to allow this is in everyone's best interests. I think it is highly unlikely that either the reserved or pre_defined fields will ever be used in the future, especially without bumping the version field of the hdlr box. It is frustrating that ISO doesn't provide clearer language about the intended interpretation of foo = 0 in the SDL. I plan to file an issue about that shortly.
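To make the disputed bytes concrete, here is a minimal Rust sketch of reading a version-0 hdlr payload (the bytes after the box header), assuming the ISOBMFF layout: version/flags, pre_defined, handler_type, three reserved words, then a null-terminated name. In classic QuickTime the first reserved word is the component manufacturer field discussed above. The names Hdlr, parse_hdlr, its strict flag and the "appl" manufacturer value are hypothetical illustrations, not mp4parse's API: with strict set, non-zero pre_defined/reserved words are rejected; otherwise they are kept and the box is accepted.

```rust
/// Fields of a version-0 `hdlr` (HandlerBox) payload, i.e. the bytes that
/// follow the box header. Hypothetical type, not mp4parse's API.
#[derive(Debug, PartialEq)]
pub struct Hdlr {
    pub pre_defined: u32,      // QuickTime: component type
    pub handler_type: [u8; 4], // fourcc, e.g. b"pict" for HEIF/AVIF images
    pub reserved: [u32; 3],    // QuickTime: manufacturer, flags, flags mask
    pub name: String,
}

/// Reads a big-endian u32 at `*pos` and advances the cursor.
fn read_u32(buf: &[u8], pos: &mut usize) -> Result<u32, String> {
    let b = buf.get(*pos..*pos + 4).ok_or("truncated hdlr payload")?;
    *pos += 4;
    Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the payload. With `strict == false`, non-zero `pre_defined` and
/// `reserved` words (e.g. a QuickTime manufacturer string) are kept, not rejected.
pub fn parse_hdlr(payload: &[u8], strict: bool) -> Result<Hdlr, String> {
    let mut pos = 0;
    let version = read_u32(payload, &mut pos)? >> 24; // top byte; flags ignored
    if version != 0 {
        return Err(format!("unsupported hdlr version {}", version));
    }
    let pre_defined = read_u32(payload, &mut pos)?;
    let handler_type = read_u32(payload, &mut pos)?.to_be_bytes();
    let mut reserved = [0u32; 3];
    for word in reserved.iter_mut() {
        *word = read_u32(payload, &mut pos)?;
    }
    if strict && (pre_defined != 0 || reserved != [0; 3]) {
        return Err("non-zero pre_defined or reserved field in hdlr".to_string());
    }
    // The name is a null-terminated string; tolerate a missing terminator.
    let rest = &payload[pos..];
    let end = rest.iter().position(|&b| b == 0).unwrap_or(rest.len());
    let name = String::from_utf8_lossy(&rest[..end]).into_owned();
    Ok(Hdlr { pre_defined, handler_type, reserved, name })
}

/// Builds an example payload with a hypothetical "appl" manufacturer value.
fn quicktime_style_payload() -> Vec<u8> {
    let mut p = vec![0u8; 8];       // version/flags + pre_defined
    p.extend_from_slice(b"pict");   // handler_type
    p.extend_from_slice(b"appl");   // reserved[0] / QuickTime manufacturer
    p.extend_from_slice(&[0u8; 8]); // reserved[1..3]
    p.extend_from_slice(b"\0");     // empty name
    p
}

fn main() {
    let payload = quicktime_style_payload();
    let hdlr = parse_hdlr(&payload, false).expect("lenient parse should succeed");
    assert_eq!(&hdlr.handler_type, b"pict");
    assert_eq!(hdlr.reserved[0].to_be_bytes(), *b"appl");
    assert!(parse_hdlr(&payload, true).is_err());
    println!("{:?}", hdlr);
}
```

Keeping the reserved words in the struct, rather than discarding them, lets a caller log or inspect a non-zero manufacturer value without failing the whole file.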

While I 100% agree with you that it would be far preferable if all the standards that AVIF is based upon were available free of charge, I think refusing to purchase the up-to-date versions to inform your development will cause more harm to the ecosystem in the long run, without helping our mutual cause of changing the availability of the standards. Not only have there been meaningful changes in the most recent published versions of the standards, but (quite frustratingly) there have been significant normative changes that are only available in draft amendments. See the imir change and the CICP Unspecified interpretation.

Please feel free to contact me directly if acquiring the specs is a problem for you, but my sense based on your comments has been that you are refusing to use non-free resources as a matter of principle. Is that correct?

In any case, I'm hoping we can collaborate more closely in the future to help discover any remaining or future compliance-related issues sooner.

kornelski commented 3 years ago

@baumanj I have two issues with AVIF conformance:

It's not a free format?

I'm having trouble calling AVIF a free and open format if I have to purchase multiple commercial non-public spec documents in order to implement it. In my initial assessment it was all open except MIAF, and I hoped MIAF wouldn't be important. It's still not clear to me how much of the AVIF spec is freely available, because the ISO website shows paywall-looking pages even for specs that are (or were?) freely available.

It would have been easier to just buy the specs, but after realizing how similar ISO is to the Elsevier model, I would rather not give them a penny on principle. At this point I'm deliberately avoiding buying the specs to see if it's possible to implement AVIF using only public information. I did not expect it to be so troublesome, and it soured my opinion of AVIF and ISO.

Conformance for sake of conformance

Most of these specs look very old, and were written for many use cases other than just an image format. I question their relevance and usefulness. AVIF doesn't need to care what RAM-constrained 1990s video encoders needed. I don't think there are any valuable use cases for processing AVIF files with tools that predate the existence of AV1.

I question conformance to these specs for sake of conformance, rather than interoperability. Firefox/Chrome/libavif/cavif-rs have achieved interoperability in their earlier versions, and from then on all the conformance changes have not improved interoperability, but only harmed it. Therefore to me these conformance improvements are negative in value. AVIF is a new image format, and all its implementations are new. Further conformance strictness is not solving anyone's problems. At this point it's only an exercise in reading a spec and implementing things nobody uses, and only adds bloat to files, and bloat to encoders.

As far as I can tell, the backwards-compatibility with HEIF turned out to have zero practical usefulness. The largest deployment of HEIF in iOS and macOS doesn't even recognize that AVIF is an image file.

I'm also worried about the broad feature set of HEIF (or MIAF, etc.). If browsers start implementing more of the weird features, this will require everyone else to implement them too to display web files properly. But HEIF includes things like lossless cropping, which are actively harmful for web images (people will end up accidentally doxxing or sharing nudes of themselves if it's possible to uncrop images).

baumanj commented 3 years ago

Hey Kornel, thanks for this thoughtful message. Sorry it's taken so long to reply, but as I'm sure you can appreciate, I've been pretty busy with AVIF about to ship (🤞 in Fx93).

I think we actually agree on a great deal of this stuff. As someone without any experience of ISOBMFF, HEIF or MIAF prior to working on AVIF, it has often been pretty bewildering to even find where something is specified. And I agree that while AVIF is free for use, it's really not possible to create an interoperable implementation of a reader or writer without access to the specifications. In fact, it has occasionally required access to unpublished revisions (see imir, colr). I think that's really not acceptable and it's something we're working to address for the future.

Personally, I would much prefer it if all the specs needed for AVIF implementation were free, but I also recognize that it's possible to work towards that goal, while being pragmatic today about the current reality. All the specifications generated by AOM (AVIF, AV1 bitstream, AV1 ISOBMFF binding) are freely available as is the ISO HEIF specification (though it could certainly be easier to find). That leaves only MIAF and BMFF which must be purchased for CHF 138 and 198 respectively (together about $350 USD, £250, €300 currently). Relative to the value of the labor required to produce just an AVIF writer or reader, I hope you'd agree that's not a prohibitive amount. If there are folks out there who would like to be producing AVIF implementations, but cannot afford these specifications, I would like to know.

> At this point I'm deliberately avoiding buying the specs to see if it's possible to implement AVIF using only public information

Would you agree that your experiment has proven that it isn't possible? I agree it's an unfortunate outcome, but at this point, who is benefiting from you continuing to eschew the specifications? It leads to more work for both of us, as well as the other AVIF implementors. It leads to incompatibility, which negatively affects consumers of AVIF. I don't think it's likely to change ISO's position with regards to charging for these documents, and if your goal includes sending a message to AOM members that basing new technologies on non-free standards results in interoperability heartaches, I'd say that message has been well and clearly received.

> Most of these specs look very old, and were written for many use cases other than just an image format. I question their relevance and usefulness. AVIF doesn't need to care what RAM-constrained 1990s video encoders needed. I don't think there are any valuable use cases for processing AVIF files with tools that predate the existence of AV1.

I agree that the original BMFF is quite old in tech terms, but it's not like it hasn't been actively revised; there was a significant update in 2020, after all. And I agree that it's far more general than an image format, though I believe that is intentional. One can question the decision to base AVIF on BMFF at all, but for the sake of delivering useful code, that ship has sailed, so we all have to live with it if we want to use AVIF.

HEIF and MIAF on the other hand are fairly recent, and pretty image-focused, though I am certainly sympathetic to the argument that they provide too many features, making implementation and interoperability challenging. Again, these requirements are long since decided, so not really germane to the question of how we make the best AVIF implementations we can.

> I don't think there are any valuable use cases for processing AVIF files with tools that predate the existence of AV1.

I know that cavif-rs is pure Rust (which I love) and doesn't depend on any BMFF or HEIF libraries, but I think for other implementations (including mp4parse), being able to reuse BMFF and HEIF code is quite valuable. Additionally, tools like MP4Box.js, gpac and ComplianceWarden wouldn't exist, or at least not in such robust, useful forms, if AVIF had been based on a new container format. I've personally found those valuable in my development, but maybe your experience has been different. Again, it may be that a different container, or a brand new container, would've been better, but that's just not where we are.

> I question conformance to these specs for sake of conformance, rather than interoperability. Firefox/Chrome/libavif/cavif-rs have achieved interoperability in their earlier versions, and from then on all the conformance changes have not improved interoperability, but only harmed it.

I have to disagree that Firefox/cavif-rs interoperability was better before. There may have been fewer errors on cavif-rs generated input prior to mp4parse and libavif trying to become consistent with regards to compliance, but that was largely because mp4parse's AVIF implementation wasn't very complete yet.

Also, the point of compliance is not to make sure these four pieces of software interoperate; it's to make sure any software has an equal opportunity to interoperate. Certainly the lack of access to up-to-date specifications is an obstacle there, but I don't think our implementations were better for the sake of a broader, interoperable AVIF ecosystem before we focused more on compliance. I think they were much less consistent, meaning that if new writers wanted to work well, they'd have to test across more readers. And for readers, there may be no way to know what software to test against. I have no idea whether there are more writers being developed that I'm unaware of, but the likelihood that more will be developed is high. The only way to do right by them is to implement the specification as faithfully as I can. The more consistently we implement the spec, the more likely it is that testing against an independent validator like Compliance Warden will catch the lion's share of issues, making life much easier for writer developers.

> Therefore to me these conformance improvements are negative in value. AVIF is a new image format, and all its implementations are new. Further conformance strictness is not solving anyone's problems. At this point it's only an exercise in reading a spec and implementing things nobody uses, and only adds bloat to files, and bloat to encoders.

I think there are two separate issues here: strictness which has a negative impact on the file size or complexity, and strictness which merely narrows the scope of legal files.

I certainly understand your concern about the first. After all, why add bytes when much of the value of replacing older formats with AVIF is saving them? In those cases, I think there is an understandable tension between what the writer intends the files to be used for, and what readers may find value in. Personally, I'm in favor of making things of marginal utility like mif1 branding or explicit pixi boxes optional. You've made a number of good arguments about changes that can be made, and those changes are happening in many places (though probably not as fast as either of us would like). Those are valuable improvements for AVIF generally, and for implementations that come later, but I don't think they would've occurred if not for our work at adhering to the standard and independent implementations interoperating with each other.

In the second case (stuff like the very issue we're commenting on), I think the fact that AVIF is a new format makes it exactly the time to be strict. In fact, it's the only time we can be strict, because as we've identified, once there are millions of auto-generated AVIF files in CDN caches, it's too expensive to regenerate them, and we have to live with the results. Fortunately, in the cases we've seen so far (missing pixi[1][2], missing ispe, this issue) we've managed to come to good resolutions, generally improving the specification, but that's not guaranteed if we don't look for these things. The lack of consistency in many older formats (HTML, JPEG) makes handling them and writing new code far more complicated than it would be with a clear, consistently implemented spec. How well we do in hewing to the standards will determine whether writing new interoperable AVIF code in the future is a joy or a headache. Is it a bummer that we instead have to experience some of those headaches now for the future benefit of the format? Sure, but I'm hoping it's worth it.
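One way a reader can be strict while the format is young, yet still offer a lenient mode, is to attach a severity to each check and let the caller pick a strictness level. The Rust sketch below illustrates that pattern using the cases named above (missing ispe, missing pixi, a missing mif1 brand). Everything here is hypothetical: the Strictness and Severity enums, AvifFacts, check, and especially the severity assigned to each rule are illustrative assumptions, not taken from mp4parse or quoted from the HEIF/MIAF/AVIF specifications.

```rust
/// Hypothetical reader strictness levels (illustrative names only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strictness {
    Permissive, // report every violation as a warning
    Normal,     // reject only "shall"-level violations
    Strict,     // reject any violation
}

/// Hypothetical severity of a rule.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Severity {
    Shall,
    Should,
}

/// A few facts a reader might extract from a file (simplified, hypothetical).
struct AvifFacts {
    compatible_brands: Vec<[u8; 4]>,
    has_pixi: bool,
    has_ispe: bool,
}

/// Lists rule violations in a fixed order. The severities are illustrative
/// assumptions, not quoted from the HEIF/MIAF/AVIF specs.
fn violations(f: &AvifFacts) -> Vec<(&'static str, Severity)> {
    let mut out = Vec::new();
    if !f.has_ispe {
        out.push(("missing ispe", Severity::Shall));
    }
    if !f.has_pixi {
        out.push(("missing pixi", Severity::Should));
    }
    if !f.compatible_brands.contains(b"mif1") {
        out.push(("missing mif1 brand", Severity::Should));
    }
    out
}

/// Accepts the file (returning warnings) or rejects it on the first fatal violation.
fn check(f: &AvifFacts, level: Strictness) -> Result<Vec<&'static str>, String> {
    let mut warnings = Vec::new();
    for (rule, severity) in violations(f) {
        let fatal = match level {
            Strictness::Permissive => false,
            Strictness::Normal => severity == Severity::Shall,
            Strictness::Strict => true,
        };
        if fatal {
            return Err(rule.to_string());
        }
        warnings.push(rule);
    }
    Ok(warnings)
}

fn main() {
    // Hypothetical file: has ispe, but no pixi and no mif1 brand.
    let file = AvifFacts { compatible_brands: vec![*b"avif"], has_pixi: false, has_ispe: true };
    for level in [Strictness::Permissive, Strictness::Normal, Strictness::Strict] {
        println!("{:?}: {:?}", level, check(&file, level));
    }
}
```

With these illustrative severities, the example file is accepted with two warnings under Permissive and Normal, and rejected with "missing pixi" under Strict. Surfacing warnings even in the lenient modes is what lets writer developers find such problems early, before the files are widely cached.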

> As far as I can tell, the backwards-compatibility with HEIF turned out to have zero practical usefulness. The largest deployment of HEIF in iOS and macOS doesn't even recognize that AVIF is an image file.

The decision was made before my involvement, but I don't think leveraging HEIF was intended for instant compatibility so much as for reducing the surface area of the AVIF spec and allowing reuse of existing code. I wouldn't expect iOS and macOS to recognize AVIFs as image files until they ship an implementation. Apple has yet to ship an AV1 decoder, so it's not like AV1 videos can be played natively anyway. The only reason they're identified as video files, as far as I understand, is the .mp4 extension. As mentioned previously, I think the tools that work with HEIF also being applicable, or more easily modified, to work with AVIF is a significant benefit. ImageMagick supports AVIF because of libheif. Would you agree that's useful?

As for the spec, AVIF is only ~15 pages to HEIF's 80+ (and MIAF's ~40 and BMFF's 250+, though much of it irrelevant to images). While not a perfect ancestor—we both agree HEIF has features AVIF may be better without—I'd much prefer that to AVIF needing to reinvent 85% of itself.

> I'm also worried about the broad feature set of HEIF (or MIAF, etc.). If browsers start implementing more of the weird features, this will require everyone else to implement them too to display web files properly.

As one of the people responsible for implementing them, I am likewise concerned. Are there specific ones you'd like to discuss? I know you're not a fan of image sequences and lossless cropping. What else?

Also, if you want to have more influence into these kinds of discussions, you could always encourage Cloudflare to join AOM. It would definitely be valuable to have some CDNs represented. I think it's a unique and valuable perspective.

> But HEIF includes things like lossless cropping, which are actively harmful for web images (people will end up accidentally doxxing or sharing nudes of themselves if it's possible to uncrop images).

We're 100% in agreement there, which is why I've been working to add mitigations against such harmful features. That's why it's not currently shipping in Chrome or Firefox, and why the AVIF spec now warns that its use may be restricted in the future. Furthermore, it prompted a more general discussion of privacy considerations in MPEG specs, so while I think it's not a perfect system, I do think there's value in it.

This certainly turned into a wall of text. Hopefully there's something useful though. I think we're really pretty close in most of our opinions, so I hope we can be more cooperative going forward. If you want to continue the discussion here or in private communication, I look forward to hearing from you.