googlefonts / sfntly

A Library for Using, Editing, and Creating SFNT-based Fonts
451 stars 162 forks source link

Wrong encoding in EOT's 255SHORT for large negative numbers #101

Open behdad opened 6 years ago

behdad commented 6 years ago

Code here: https://github.com/googlei18n/sfntly/blob/master/java/src/com/google/typography/font/tools/conversion/eot/GlyfEncoder.java#L185

The "spec" is here: https://www.w3.org/Submission/MTX/#id_255SHORT

Note how in the spec, a wordCode-encoded short does NOT get a flipSignCode before it. The short is encoded as is in two's-complement.

Eg. to encode the number -984, currently we encode: FA FD 03 D8

FA – flipsignCode (250 decimal) FD – WordCode (253 decimal) 03 D8 – Value (984 decimal)

Correct encoding is FD FC 28 FD – WordCode (253 decimal) FC 28 – Value (-984 decimal)

(Reported to me by Microsoft.)

behdad commented 6 years ago

The correct code should do something like this:


const short lowestCode = 253;

short value;

if(value > (3 * lowestCode) || value <= -lowestCode)

{

      os.write(253);

      os.write((byte)(absValue >> 8));

      os.write((byte)(absValue & 0xff));

}

else

{

  if (value < 0) {

      os.write(250); /* flip the sign */

  value = (short)-value;

  }

  else if (value >= lowestCode) {

    value = (short)(value - lowestCode);

    if (value >= lowestCode) {

      value = (short)(value - lowestCode);

      os.write(254);

    }

    else {

      os.write(255);

    }

  }

  os.write((byte)value);

}
nyshadhr9 commented 6 years ago

There are some values which can be encoded in multiple ways. For Ex: 500 can be encoded as both: FE (oneMoreByteCode2) 00 (decimal 0) or FF(oneMoreByteCode1) FA (decimal 250)

Is there a preference for one or the other? Or are we just concerned about minimizing the number of bytes taken to encode, so either one works.

behdad commented 6 years ago

Yeah I see the overlaps. No idea. @taylorb-monotype can you advise, which way does Monotype encode / recommend encoding these? Thanks.

taylorb-monotype commented 6 years ago

Our code to write 255Shorts is the same as what Behdad has indicated above. With this code I don't think there is any ambiguity. The choice is made by the code above. For example, you have the following encodings:

value encoding 0 00 1 01 ... 249 f9 250 ff 00 251 ff 01 ... 499 ff f9 500 fe 00 501 fe 01 ... 749 fe f9 750 fd 02 ee 751 fd 02 ef ... 767 fd 02 ff 768 fd 03 00 769 fd 03 01 ... etc.

behdad commented 6 years ago

Our code to write 255Shorts is the same as what Behdad has indicated above.

Thanks for confirming.

With this code I don't think there is any ambiguity.

There is ambiguity in the spec, but yeah sticking to the above scheme works.

The choice is made by the code above. For example, you have the following encodings:

value encoding 0 00 1 01 ... 249 f9 250 ff 00 251 ff 01 ... 499 ff f9 500 fe 00 501 fe 01 ... 749 fe f9 750 fd 02 ee 751 fd 02 ef ... 767 fd 02 ff 768 fd 03 00 769 fd 03 01 ... etc.

nyshadhr9 commented 6 years ago

Thank you for clarifying :)

My concern with the above approach was that, 750-755 take up 3 bytes whereas they could be encoded in 2 bytes as FE FA - FE FF. But I guess that can be sacrificed for a simpler and consistent solution.