Open lifthrasiir opened 3 years ago
Thanks for the very detailed analysis.
I do have a work-in-progress shuffling of opcodes and coordinate encodings that is essentially 16/16/16 absolute L/Q/C
. That seems to pack well enough (total size of the Material Design icon suite is roughly the same before/after), especially after introducing Parallelogram/Rectangle
and Ellipse/Circle
operations (alluded to in a #4 comment). These P/E
opcodes aren't in the SVG path model, but the first one captures a lot of the HVhv
axis-aligned rectangles and the second one captures a lot of arc-ops that would otherwise be 4 C
instructions (12 pairs of coordinates = 24 numbers).
If that's probably good enough, I'm still tempted to go with absolute coordinates only. I'm not sure if the Z
bit you're proposing, or relative ops in general, are worth it anymore. Again, compactness is a goal but not compactness at any cost.
That makes sense. As noted above, runs of 3 or 4 consecutive H/V commands are also very common, only seconded by runs of a single H/V command. Many of them will directly correspond to P/R operations (whatever it is). I think the relative encoding is still a big win for a small complexity increase, but if P/E operations prove to be efficient enough it can go away.
As per #29, it seems that we limit ourselves to lines, quadratic Bezier and cubic Bezier curves. This would change the coordinates distribution and thus gives an opportunity to simplify and optimize the coordinates encoding.
Concrete Analysis
To get a gist of the typical usage, I've analyzed the Material Design Icons as of Templarian/MaterialDesign-SVG@63c5cb306073a9ecdfd3579f0f696746ab6305f6 (which also includes the Google's icon set). A quick and dirty Python script used is available here.
This table shows the number of operations and (run length - 1) in bits. So 0 corresponds to a run of 1, 1 to a run of 2, 2 to runs of 3--4, 3 to runs of 5--8 and so on. All shapes have been normalized to one of M, L, C, Q and Z; arcs in particular have been converted to cubic Bezier curves with the same algorithm as the current C version of IconVG.
It seems that there are significant gaps between 2--3 bits and 4--5 bits. This is of course a characteristic of this particular icon set, but it seems that we can safely limit the max run length to 16 with a minimal size increase (<0.3% in this case, theoretically 6.25% max).
The "line" row is a subset of L where only perpendicular lines are allowed. This was to see if having separate
V
orH
opcodes with a run is desirable; it seems not, given that a vast majority is a run of just one. It however shows that most lines are perpendicular, suggesting a per-coordinate flag.This table shows the frequencies of coordinates that can be safely represented in one-byte absolute or relative encoding. I've made tons of assumptions:
No further quantization has been performed. If the number is not close enough (±0.01) to the nearest integer it was assumed to not fit in one byte at all.
The icon set has a fixed view box of (0,0)--(24,24). This means that absolute integral coordinates almost always fit in 5 bits. To be fair I think this small view box does model the post-quantization distribution. I've also tested with every coordinate multiplied by 2--10, 16 and 100; the entire table is too large to include here though.
There is no relative encoding with a single bit, since it will entirely consist of a sign bit. For more bits 2's complement representation was assumed.
The relative encoding always refers to the last coordinates. This means that, say, if there are 2 coordinates per one operation then the second coordinates are encoded relative to the first coordinates. This is different from SVG but seems reasonable especially given that we no longer has arcs (which center points significantly deviate from the path). Just to be sure I've also tested with SVG-like reference coordinates and found them to be much worse.
The initial coordinates used for the relative encoding are (0,0). There might be a better default or even a general way to maximally use available one-byte space, but such improvements cannot harm this baseline encoding.
We don't care about the exact encoding (would it be 2 bytes quantized or 4 bytes IEEE 754?) if the number doesn't fit to one byte in any encoding. I do think it should be quantized to 2 bytes in that case.
There are two kinds of encodings imaginable with this table.
In addition to this, it is possible that coordinates are multiplied by some scaling factor, possibly differently for absolute and relative encodings. The shared scaling factor models the optimization performed by authoring tools, so it doesn't need to be a part of the encoding itself. The ratio between absolute and relative encodings would have to be fixed or somehow encoded however.
I've simulated both encodings with all possible parameters and collected the best and runner-ups within 10% of the best (plus a selected few indicated with
*
). The first number is the number of required multi-byte encodings, so lower is better.At least for this particular icon set, both encodings seem to perform relatively well, especially with scaling. The first kind of encodings does leave additional opcode space though so they are preferred. It also seems that 2x and 10x scaling factors are common, where the latter is perhaps an artifact from the decimal encoding of SVG.
It should be also noted that there are disproportionally many coordinates that are exactly equal to the previous (i.e.
rel=0
). This is, of course, the original rationale of H/V opcodes. Any efficient encoding should take care of that.Concrete Proposal
There would be five more sets of opcodes in the combined styling-drawing mode:
L
opcodes, corresponding to 1--16 line segments.C
opcodes, corresponding to 1--16 cubic Bezier segments (thus 3x coordinates).Q
opcodes, corresponding to 1--16 quadratic Bezier segments (thus 2x coordinates).Z
opcode, closing the current path. If there is no current path this is a nop.ZM
opcode, closing the current path and move to the following coordinates.Alternatively, if the opcode space is sufficient the following opcodes might be possible as well:
L
,C
andQ
opcodes as above.ML
,MC
andMQ
opcodes. They have an additional pair of coordinates to move before other coordinates.Z
opcode.In any case coordinates always come in pairs, in the order of x then y. Each coordinate is encoded as follows.
One-byte absolute encodes an absolute integer from -32 to 31. The actual number encoded is (integer + 32).
One-byte relative encodes an integral difference from -16 to 15. The actual number encoded is (difference + 16). The reference coordinate for x is always the previous x decoded, even in the previous operations or operations with multiple coordinates pairs (unlike SVG path); same for y. The reference coordinates for the first coordinates are (0, 0).
Two-byte absolute encodes an absolute number from -64 to (64 - 1/64) in little endian. The actual number encoded is (number + 64) * 64.
Four-byte absolute encodes an IEEE 754 binary32 number in little endian. The encoded number has only 20 bits of coded mantissa (compared to 23 in the original); the lowest 3 bits are fixed to zero, allowing a direct conversion from the wire format to the number (#33). I personally don't want this at all, but the format's flexible view box demands this unfortunate addition.
A flag bit Z is in the lowest position of the decoded integer and, if set, indicates that next coordinates use a compressed relative zero and only one of two coordinates are encoded. Z is assumed to be unset in the four-byte absolute encoding. (As mentioned above, I expect four-byte encoding to be very uncommon anyway.)
In the initial state, Z in the x coordinate (hereafter ZX) and Z in the y coordinate (hereafter ZY) are interpreted as follows:
In the state where the only single coordinate is encoded, Z is interpreted as follows:
It is invalid that the very last coordinates in the run have any Z bit set. (This can be changed to affect the next run for optimization, but I'm not sure it's worth.)
It is equally valid to make this in the other direction, Z=1 indicating previous coordinates had a zero. This however was more difficult to specify and performed slightly worse than the current proposal (70.5% vs. 70.0% of the original).