gpac / ComplianceWarden

A pluggable compliance checker (ISOBMFF, HEIF/MIAF/AVIF, AV1 HDR10+)
https://gpac.github.io/ComplianceWarden-wasm/

Add j2k1 to list of visualSampleEntryFourccs #75

Closed dukesook closed 7 months ago

dukesook commented 8 months ago

Add support for JPEG 2000 in HEIF.

Sample image: here

rbouqueau commented 8 months ago

LGTM. What should we do with the sample? Do you want me to strip it and add it to the regression testsuite?

NB: the license for the content is here

dukesook commented 8 months ago

Yes, adding a J2K in HEIF sample image to the test suite would be great. I'm guessing there won't be any licensing issues with the above sample image, but I'll go ahead and create a similar one just to be on the safe side.

dukesook commented 7 months ago

Another sample J2K in HEIF is here.

I created the image myself so there are no licensing restrictions of any kind tied to it.

rbouqueau commented 7 months ago

Please find attached the equivalent file in nasm syntax. It is too big as-is to be included in the test suite.

valid-J2K.tar.bz2.zip

TODO:

HTH, ping me if needed.

dukesook commented 7 months ago

I was able to extract and convert between heif and assembly with nasm. It looks like the original heif file is much smaller than the .asm file.

Why are these assembly files used instead of the originals? Editing assembly seems very tedious and difficult so I hope there's a strong reason for storing test files in this fashion.

rbouqueau commented 7 months ago

I was able to extract and convert between heif and assembly with nasm.

FYI I had done it for you: valid-J2K.tar.bz2.zip

It looks like the original heif file is much smaller than the .asm file.

Yes, textual representations of binary files are ~20x bigger. However, text also compresses quite well, which reduces the delta (in git).
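To make the size trade-off concrete, here is a minimal Python sketch. It assumes a hypothetical one-`db`-per-byte layout with a trailing comment (the real disassembler output differs) and uses random bytes as a stand-in for a binary file. It measures how much the text grows and how much zlib, the compressor git uses for its objects, takes back.

```python
import random
import zlib


def to_asm_text(data: bytes) -> str:
    """Render each byte as one commented `db` line (hypothetical layout)."""
    return "".join(f"    db 0x{b:02X} ; byte {i}\n" for i, b in enumerate(data))


# Stand-in for a binary file: 4 KiB of reproducible pseudo-random bytes.
rng = random.Random(0)
binary = bytes(rng.randrange(256) for _ in range(4096))

text = to_asm_text(binary).encode("ascii")
compressed = zlib.compress(text, 9)

expansion = len(text) / len(binary)          # how much bigger the text is
after_zlib = len(compressed) / len(binary)   # size once compressed, relative to binary

print(f"text is {expansion:.1f}x the binary size")
print(f"compressed text is {after_zlib:.1f}x the binary size")
```

Random bytes are a pessimistic input; real boxes with repeated fields and zero padding should compress better than this.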

We strip any non-essential feature. We've also drastically reduced the mdat box size. The content itself is usually not useful for ComplianceWarden.

The key point is to understand that these test vectors are not intended to be valid samples. We may want to add valid samples to the tests though (e.g. retrieving files and testing them). Ideas welcome.

Other tools (e.g. MP4Box -diso) already provide a deep view of what's in the file.

Why are these assembly files used instead of the originals?

I don't think I know of any generic editable (and commentable) format for binary files. So going to text was the natural way. But we also needed to get back to binary quickly, so we chose assembly.

At the time of creating ComplianceWarden, Kaitai was too incomplete to parse MPEG-TS or CENC (which were potentially in the scope). We discarded it.

Editing assembly seems very tedious

mdat_start:
    dd BE(mdat_end - mdat_start)
    dd "mdat"
    ; obu(0)
    db 0x0A ; forbidden(1) obu_type(4) obu_extension_flag(1) obu_has_size_field(1) obu_reserved_1bit(1) 
    db 0x0F ; leb128_byte(8) 
mdat_end:

When the assembly is annotated (with symbols' names, num_bits, and values) I find it quite easy to edit. Maybe that's because I manipulate bits, hex, and bitstreams more than most people. Let me know.
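For readers less used to bit fields, the listing above can be cross-checked with a short Python sketch. It assumes that `BE()` emits the 32-bit box size in big-endian order, as ISOBMFF box headers require. The helper names `build_mdat` and `parse_obu_header` are illustrative and not part of ComplianceWarden.

```python
import struct


def build_mdat(payload: bytes) -> bytes:
    """Build an mdat box: 32-bit big-endian size, fourcc, then payload."""
    size = 8 + len(payload)  # the 8-byte header counts toward the box size
    return struct.pack(">I", size) + b"mdat" + payload


def parse_obu_header(byte: int) -> dict:
    """Split the first OBU header byte using the widths from the comment."""
    return {
        "forbidden": (byte >> 7) & 0x1,           # 1 bit
        "obu_type": (byte >> 3) & 0xF,            # 4 bits
        "obu_extension_flag": (byte >> 2) & 0x1,  # 1 bit
        "obu_has_size_field": (byte >> 1) & 0x1,  # 1 bit
        "obu_reserved_1bit": byte & 0x1,          # 1 bit
    }


box = build_mdat(bytes([0x0A, 0x0F]))  # the two `db` bytes from the listing
header = parse_obu_header(box[8])      # first payload byte follows the header

print(box.hex())
print(header)
```

Here `0x0A` decodes to `obu_type` 1 with `obu_has_size_field` set, and the box size field holds 10 (8 header bytes plus 2 payload bytes).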

storing test files in this fashion.

We also have faulty files in the unit tests. Dealing with faulty files in binary (non-editable and non-commentable) would be an issue IMHO.

dukesook commented 7 months ago

Great, thank you for such a detailed answer. It might be worth documenting some of the information above for future users of the Compliance Warden.

Removing the mdat shrinks valid-J2K.asm to ~8 KB. I'll look into getting it to be as small as possible.

Did you use NASM to convert the sample HEIF to the valid-J2K.asm file?

rbouqueau commented 7 months ago

Did you use NASM to convert the sample HEIF to the valid-J2K.asm file?

Nope. I used a custom mp4 disassembler I had previously made to build custom files.

I will add some info to the README.

dukesook commented 7 months ago

I used nasm to convert the valid heif .asm files back into their .heic format. The MP4Box.js tool was only able to parse some of them correctly.

Is this expected behavior? Or does it suggest a bug in the .asm files? Even though the valid files aren't displayable, I expected them to be parsable.

rbouqueau commented 7 months ago

If the original parses correctly, then the stripped version should also parse correctly. Usually it is an issue with offsets that point to removed data. You can send the files to me via email if you want, I'll have a look.
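The offset problem can be checked mechanically. The sketch below assumes the item extents (for example from the `iloc` box) have already been extracted as `(offset, length)` pairs of absolute file positions; how they are extracted is out of scope. The function name `dangling_extents` is hypothetical.

```python
def dangling_extents(extents, file_size):
    """Return the (offset, length) pairs that reach outside the file."""
    return [
        (offset, length)
        for offset, length in extents
        if offset < 0 or length < 0 or offset + length > file_size
    ]


# Illustrative values: a 100-byte stripped file whose second extent still
# points at payload that was removed with the mdat contents.
extents = [(40, 20), (90, 50)]
bad = dangling_extents(extents, file_size=100)
print(bad)
```

An empty result does not prove the file is parsable, but a non-empty one points directly at the offsets to fix after stripping.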

dukesook commented 7 months ago

Okay, I added a minimal valid-J2K.asm to this PR.

ISO/IEC 15444-16 requires:

rbouqueau commented 7 months ago

You also need to commit the valid-J2K.ref for comparison. If you make a clean clone of the repo, you'll see that it is missing.

dukesook commented 7 months ago

Okay, I added the valid-J2K.ref.

rbouqueau commented 7 months ago

Merging. Thanks!