googlefonts / colr-gradients-spec

63 stars 8 forks source link

Distinguish ItemVariationStore formats (add format 1) due to change to wordDeltaCount field #339

Closed PeterConstable closed 3 years ago

PeterConstable commented 3 years ago

@anthrotype suggested that the change in ItemVariationData of treating the wordDeltaCount field as a packed field with a new meaning assigned to the high-order bit should result in a format bump to ItemVariationStore.

It does seem that should be done:

Currently, the high-order bit of wordDeltaCount should never be set since no existing font would define that many regions (which are roughly proportional to design masters), and in OT 1.8.4 the VariationRegionList.regionCount field was restricted to <32K. That entails that the number of logical entries in each delta set must be <32K. But it probably isn't safe to assume that implementations would validate the ItemVariationData.wordDeltaCount field is, accordingly, <32K.

If existing implementations read a format 0 ItemVariationStore in which the high-order bit of wordDeltaCount is set, they may assume there are >32K in16 entries in each delta set. That would likely result in wrong interpretation of the data, but could also result in read overrun past the end of the ItemVariationStore (and potential security vulnerability).

Separate formats 0 and 1 can be defined: they would have the same fields, but in format 0 the LONG_WORDS flag is not used.

anthrotype commented 3 years ago

the current format is 1 so the proposed new format would be format 2

https://docs.microsoft.com/en-us/typography/opentype/spec/otvarcommonformats#item-variation-store-header-and-item-variation-data-subtables

justvanrossum commented 3 years ago

Separate formats 0 and 1 can be defined: they would have the same fields, but in format 0 the LONG_WORDS flag is not used.

This feels a bit redundant: couldn't the new format be for "long words" only, and the old format for "short words" only?

anthrotype commented 3 years ago

the wordDeltaCount field (whose high bit becomes the LONG_WORDS flag) is defined on the ItemVariationData subtable, not on the parent ItemVariationStore table, where the Format field is specified. A Format 2 VarStore can contain a mix of VarData subtables, some of which use shorts, other use longs, depending on the LONG_WORDS flag. A compiler would use format 2 whenever there's any VarData subtable that contains long deltas, or the current format 1 when none do.

justvanrossum commented 3 years ago

Ah, somehow I missed that completely. Sorry for the noise.

anthrotype commented 3 years ago

a Format 2 VarStore containing no "long" VarData is arguably useless but still valid

behdad commented 3 years ago

No.

PeterConstable commented 3 years ago

No.

@behdad Please clarify: What are you responding to?

rsheeter commented 3 years ago

Adding formats comes at implementation cost so it's best to default to not doing so where we don't have to. Here it seems perhaps we don't have to.

This change was, sinces it's conception, argued to be safe, no need to change format. Per f2f w/@behdad I believe the argument is that word count has to be less than region count and a legacy parser will see a bad relationship between those fields and stop.

On that basis I'm inclined to suggest we close w/o making changes.

PeterConstable commented 3 years ago

There's an assumption that compilers will never generate an IVS for an HVAR, VAR, MVAR, BASE or GDEF table in which the high-order bit of the wordDeltaCount field is set.

Since it can be assumed that region count will always be <32K, the only reason for setting that bit would be to allow for 32-bit deltas, and such are never required for HVAR... GDEF tables.

But...

It would be possible for a compiler to set the wordDeltaCount to 0x8000, meaning that the delta set has zero long-format deltas, but all the short-format deltas are 16-bit. That might be plausible for some HVAR etc. table, and certainly workable in any except that an existing HVAR etc. parser might interpret that as >32K columns, contrary to the region count.

So, there's also an assumption that every existing HVAR, etc. parser will not attempt to parse any ItemVariationData subtable without validating wordDeltaCount against regionIndexCount.

The latter may well be a safe assumption. We should be fairly confident about that, though, since otherwise we may be setting up a security vulnerability.

Even if we don't make an IVS format distinction, it feels like something additional should be said, e.g., that compilers should only ever set the LONG_WORDS flag in tables that actually have 32-bit values that can be variable (currently only COLR).

PeterConstable commented 3 years ago

We already had wording, the number of deltas that use the long ("word") representation... must be less than or equal to regionIndexCount. So that partially mitigates the issue. In #342 I've proposed an additional recommendation relevant for compilers as a further mitigation.

PeterConstable commented 3 years ago

Fixed by #342

behdad commented 3 years ago

We already had wording, the number of deltas that use the long ("word") representation... must be less than or equal to regionIndexCount.

That was all considered when this was designed:

https://github.com/googlefonts/colr-gradients-spec/issues/29#issuecomment-679415436

You were part of that discussion. You just didn't remember, so bunch of our times had to be wasted to stop you from jumping to pen...