google / iconvg

IconVG is a compact, binary format for simple vector graphics: icons, logos, glyphs and emoji.
Apache License 2.0
676 stars 11 forks source link

Design: Coordinates Encoding #34

Open lifthrasiir opened 3 years ago

lifthrasiir commented 3 years ago

As per #29, it seems that we limit ourselves to lines, quadratic Bezier and cubic Bezier curves. This would change the coordinates distribution and thus gives an opportunity to simplify and optimize the coordinates encoding.

Concrete Analysis

To get a gist of the typical usage, I've analyzed the Material Design Icons as of Templarian/MaterialDesign-SVG@63c5cb306073a9ecdfd3579f0f696746ab6305f6 (which also includes the Google's icon set). A quick and dirty Python script used is available here.

               0       1       2       3       4       5       6
        --------------------------------------------------------
    M |    22073       2       0       0       0       0       0 |    22075
    L |    16774    3547   10394    3325    1668     154      18 |    35880
    C |    17939    4987    5525     873     221      46       1 |    29592
    Q |        9      66      13      28      19       0       0 |      135
    Z |     5614       0       0       0       0       0       0 |     5614
        ---------------------------------------------------------+---------
 line |    18816    3221    8243    1572     701      55       9 |    32617
        ---------------------------------------------------------+---------
           62409    8602   15932    4226    1908     200      19 |    93296

This table shows the number of operations and (run length - 1) in bits. So 0 corresponds to a run of 1, 1 to a run of 2, 2 to runs of 3--4, 3 to runs of 5--8 and so on. All shapes have been normalized to one of M, L, C, Q and Z; arcs in particular have been converted to cubic Bezier curves with the same algorithm as the current C version of IconVG.

It seems that there are significant gaps between 2--3 bits and 4--5 bits. This is of course a characteristic of this particular icon set, but it seems that we can safely limit the max run length to 16 with a minimal size increase (<0.3% in this case, theoretically 6.25% max).

The "line" row is a subset of L where only perpendicular lines are allowed. This was to see if having separate V or H opcodes with a run is desirable; it seems not, given that a vast majority is a run of just one. It however shows that most lines are perpendicular, suggesting a per-coordinate flag.

              rel=0   rel=2   rel=3   rel=4   rel=5   rel=6  no rel
           --------------------------------------------------------
   abs=0 |      477       2      51      32      14      10     167 |      753
   abs=1 |     2068     164     283     174      90      68     737 |     3584
   abs=2 |    17969     516    3415    1192    1035     933    6386 |    31446
   abs=3 |    26993    1290    6440    5666    4693    1025   13364 |    59471
   abs=4 |    53465    3088   15459   12100    7933       4   29965 |   122014
   abs=5 |    55161    2272   10986    5645    7161    4601   23959 |   109785
   abs=6 |        0       0       0       0       0       1       0 |        1
  no abs |    91903    2418    2467    1134     740      39  174819 |   273520
           ---------------------------------------------------------+---------
             248036    9750   39101   25943   21666    6681  249397 |   600574

This table shows the frequencies of coordinates that can be safely represented in one-byte absolute or relative encoding. I've made tons of assumptions:

There are two kinds of encodings imaginable with this table.

  1. Use a separate flag, allocate A bits to absolute and B bits to relative. This would use 1+max(A,B) bits; the asymmetric A != B cases can be used to stuff other encodings.
  2. Allocate A bits to absolute, but switches to relative if the encoded number is less than 2^B. This would use A bits; both the absolute-only encoding (B = 0) and the relative-only encoding (B = A) are included.

In addition to this, it is possible that coordinates are multiplied by some scaling factor, possibly differently for absolute and relative encodings. The shared scaling factor models the optimization performed by authoring tools, so it doesn't need to be a part of the encoding itself. The ratio between absolute and relative encodings would have to be fixed or somehow encoded however.

I've simulated both encodings with all possible parameters and collected the best and runner-ups within 10% of the best (plus a selected few indicated with *). The first number is the number of required multi-byte encodings, so lower is better.

4 BITS:
236518   4 bits 1x absolute
246139   4 bits 1x absolute, 2 bits reinterpreted as 1x relative
253685   4 bits 1x absolute, 3 bits reinterpreted as 1x relative
268101*  1 bit flag, 3 bits 1x absolute or 3 bits 1x relative

5 BITS:
181894   5 bits 1x absolute
193787   5 bits 1x absolute, 2 bits reinterpreted as 1x relative
211320*  1 bit flag, 4 bits 1x absolute or 4 bits 1x relative

6 BITS:
161368   6 bits 2x absolute, 2 bits reinterpreted as 1x relative
161855   6 bits 2x absolute
169863   6 bits 2x absolute, 3 bits reinterpreted as 1x relative
174859   1 bit flag, 5 bits 1x absolute or 5 bits 1x relative
175599   1 bit flag, 5 bits 1x absolute or 4 bits 1x relative
176733   1 bit flag, 5 bits 1x absolute or 3 bits 1x relative
181893*  6 bits 1x absolute

7 BITS:
156161   7 bits 4x absolute, 3 bits reinterpreted as 1x relative
156467   7 bits 4x absolute, 2 bits reinterpreted as 1x relative
157162   1 bit flag, 6 bits 2x absolute or 6 bits 1x relative
157174   1 bit flag, 6 bits 2x absolute or 5 bits 1x relative
157506   1 bit flag, 6 bits 2x absolute or 4 bits 1x relative
158129   1 bit flag, 6 bits 2x absolute or 3 bits 1x relative
158312   7 bits 4x absolute
159624   1 bit flag, 6 bits 2x absolute or 2 bits 1x relative
161367   7 bits 2x absolute, 2 bits reinterpreted as 1x relative
161579   1 bit flag, 6 bits 2x absolute or 0 bits 1x relative
161854   7 bits 2x absolute
164619   7 bits 4x absolute, 4 bits reinterpreted as 1x relative
169862   7 bits 2x absolute, 3 bits reinterpreted as 1x relative
174819*  1 bit flag, 5 bits 1x absolute or 6 bits 1x relative
181893*  7 bits 1x absolute

8 BITS:
129256   8 bits 10x absolute, 3 bits reinterpreted as 1x relative
129903   8 bits 10x absolute, 4 bits reinterpreted as 1x relative
130264   8 bits 10x absolute, 2 bits reinterpreted as 1x relative
131482   8 bits 10x absolute
138738   8 bits 10x absolute, 5 bits reinterpreted as 1x relative
152645*  1 bit flag, 7 bits 4x absolute or 7 bits 100x relative
174819*  1 bit flag, 5 bits 1x absolute or 7 bits 1x relative
181893*  8 bits 1x absolute

At least for this particular icon set, both encodings seem to perform relatively well, especially with scaling. The first kind of encodings does leave additional opcode space though so they are preferred. It also seems that 2x and 10x scaling factors are common, where the latter is perhaps an artifact from the decimal encoding of SVG.

It should be also noted that there are disproportionally many coordinates that are exactly equal to the previous (i.e. rel=0). This is, of course, the original rationale of H/V opcodes. Any efficient encoding should take care of that.

Concrete Proposal

There would be five more sets of opcodes in the combined styling-drawing mode:

Alternatively, if the opcode space is sufficient the following opcodes might be possible as well:

In any case coordinates always come in pairs, in the order of x then y. Each coordinate is encoded as follows.

<- LSB     MSB ->
+-+-+-----------+
|1|Z|   number  | one-byte absolute
+-+-+-----------+

+-+-+-+---------+
|0|1|Z|  delta  | one-byte relative
+-+-+-+---------+

+-+-+-+-+-------+---------------+
|0|0|1|Z|         number        | two-byte absolute
+-+-+-+-+-------+---------------+

+-+-+-+---------+---------------+-------------+-+-------------+-+
|0|0|0|                mantissa               |    exponent   |S| four-byte absolute
+-+-+-+---------+---------------+-------------+-+-------------+-+

One-byte absolute encodes an absolute integer from -32 to 31. The actual number encoded is (integer + 32).

One-byte relative encodes an integral difference from -16 to 15. The actual number encoded is (difference + 16). The reference coordinate for x is always the previous x decoded, even in the previous operations or operations with multiple coordinates pairs (unlike SVG path); same for y. The reference coordinates for the first coordinates are (0, 0).

Note: This particular encoding corresponds to "1 bit flag, 6 bits 2x absolute or 4 bits 1x relative" above. (Due to the scaling difference, the specified encoding of "1 bit flag, 6 bits 2x absolute or 5 bits 2x relative" is no worse than it.)

Note: I've also experimented with an opportunistic absolute encoding, where if the absolute range and relative range completely overlaps the relative range is reinterpreted as a distinct absolute range. This sadly had no effect for this particular icon set. Moreover reference coordinates have to be an exact integer for this to work, and it is very hard to ensure that this is repeatable. It is possible, but the resulting complexity didn't justify itself.

Two-byte absolute encodes an absolute number from -64 to (64 - 1/64) in little endian. The actual number encoded is (number + 64) * 64.

Four-byte absolute encodes an IEEE 754 binary32 number in little endian. The encoded number has only 20 bits of coded mantissa (compared to 23 in the original); the lowest 3 bits are fixed to zero, allowing a direct conversion from the wire format to the number (#33). I personally don't want this at all, but the format's flexible view box demands this unfortunate addition.

A flag bit Z is in the lowest position of the decoded integer and, if set, indicates that next coordinates use a compressed relative zero and only one of two coordinates are encoded. Z is assumed to be unset in the four-byte absolute encoding. (As mentioned above, I expect four-byte encoding to be very uncommon anyway.)

In the initial state, Z in the x coordinate (hereafter ZX) and Z in the y coordinate (hereafter ZY) are interpreted as follows:

In the state where the only single coordinate is encoded, Z is interpreted as follows:

It is invalid that the very last coordinates in the run have any Z bit set. (This can be changed to affect the next run for optimization, but I'm not sure it's worth.)

It is equally valid to make this in the other direction, Z=1 indicating previous coordinates had a zero. This however was more difficult to specify and performed slightly worse than the current proposal (70.5% vs. 70.0% of the original).

nigeltao commented 3 years ago

Thanks for the very detailed analysis.

I do have a work-in-progress shuffling of opcodes and coordinate encodings that is essentially 16/16/16 absolute L/Q/C. That seems to pack well enough (total size of the Material Design icon suite is roughly the same before/after), especially after introducing Parallelogram/Rectangle and Ellipse/Circle operations (alluded to in a #4 comment). These P/E opcodes aren't in the SVG path model, but the first one captures a lot of the HVhv axis-aligned rectangles and the second one captures a lot of arc-ops that would otherwise be 4 C instructions (12 pairs of coordinates = 24 numbers).

If that's probably good enough, I'm still tempted to go with absolute coordinates only. I'm not sure if the Z bit you're proposing, or relative ops in general, are worth it anymore. Again, compactness is a goal but not compactness at any cost.

lifthrasiir commented 3 years ago

That makes sense. As noted above, runs of 3 or 4 consecutive H/V commands are also very common, only seconded by runs of a single H/V command. Many of them will directly correspond to P/R operations (whatever it is). I think the relative encoding is still a big win for a small complexity increase, but if P/E operations prove to be efficient enough it can go away.