saucecontrol / Compact-ICC-Profiles

Minimal ICC Profiles intended for embedding in image files
Creative Commons Zero v1.0 Universal
195 stars 7 forks source link

copyright 'mluc' in v4 profiles has a padding byte that violates the spec #16

Closed nico closed 1 year ago

nico commented 1 year ago

This is more a comment than really a bug. It just confused me for a bit. If this is all intentional, feel free to just close it.

% xxd ~/src/Compact-ICC-Profiles/profiles/sRGB-v4.icc | rg -A2 120:
00000120: 6d6c 7563 0000 0000 0000 0001 0000 000c  mluc............
00000130: 656e 5553 0000 0006 0000 001c 0043 0043  enUS.........C.C
00000140: 0030 0021 5859 5a20 0000 0000 0000 f6d6  .0.!XYZ ........

The first row is 'mluc', the reserved 0 u32, the number of records u32 (1), and the record size u32 (12), so far so good.

The second row is 2 + 2 bytes for enUS, then an u32 "string size in bytes" (6), then an u32 string data offset, then 4 utf-16 characters 'CC0!'. It's 4 character since tags must be 4-byte aligned, so that's the smallest-possible text.

However, the "string size in bytes" is 6, not 8, so tools think the text is "CC0". That might be intentional to match the v2 profiles, where cprt is stored in a 'text', which must be nul-terminated, so there's only room for "CC0".

And indeed, further up there's

00000090: 6370 7274 0000 0120 0000 0022 7774 7074  cprt... ..."wtpt

which says that the cprt data is only 0x22 bytes long – it ends after the '0030'. So the "0021" is padding – but if it's padding, why isn't it "0000" (or "2121")?

saucecontrol commented 1 year ago

I kind of love that you're looking at the profile byte by byte. Short answer is the 0021 is intended to be padding, and I chose to pad with the UTF-16BE value of ! simply because bangs are more interesting than nulls.

I honestly had forgotten I did that when assembling the profile, which I did manually in a hex editor. I copied that tag into all my v4 profiles, so they've all got it.

According to the ICC v4 spec, section 7.1.2 d) all pad bytes shall be NULL (as defined in ISO/IEC 646, character 0/0), so this was strictly a violation of the spec. The reason for the restriction is given in a note that follows:

the likelihood of two profiles which contain the same tag data, yet have different checksum values, is reduced

Certainly allowing pad bytes to contain random/uninitialized data would be against the spirit of the rule, but I figured a single bang wouldn't do any harm 😇

nico commented 1 year ago

I discovered it because I wrote some code that reads and re-serializes an ICC profile. My code uses null bytes for padding, so my profile was different to yours in that one byte, and due to that, in the profile ID. I used your profile as test input and then was confused for a bit that the output didn't match the input.

Thanks for pointing out that the spec requires null padding bytes, I hadn't seen that yet. Maybe it'd be nice to update your profiles to be spec-compliant then :)

saucecontrol commented 1 year ago

Yeah, honestly I must have missed that note in the spec my first time through it. There was no such requirement in the v2 spec. Certainly if I had known these profiles would get as much use as they have, I would have gone with plain null padding regardless.

It seems reasonable to update the profiles given the spec is clear about the padding rules.

saucecontrol commented 1 year ago

All v4 profiles have been updated with null padding bytes in https://github.com/saucecontrol/Compact-ICC-Profiles/commit/ca0a34b4047e3ebdefc0196759a4d88b85b104b4