google / iconvg

IconVG is a compact, binary format for simple vector graphics: icons, logos, glyphs and emoji.
Apache License 2.0

Design: Do we need color registers at all? #31

Open lifthrasiir opened 3 years ago

lifthrasiir commented 3 years ago

So, I've tried to implement IconVG as an experiment and noticed a lot of things that haven't yet been mentioned elsewhere. I found the discussion in #4 helpful (and I agree with @Hixie that this can and should be made much simpler with an explicit set of goals in mind) but too broad, in my humble opinion. So I will try to give a series of bite-sized pieces of feedback (others to come) that should be actionable at your discretion.

It is understandable that IconVG is fundamentally a series of commands given its goals, but the use of registers is unusual. It has a large number (5+) of different encodings for colors, which include a pseudo-operation (blend) and a clever partial encoding of gradients that refer to other registers. I presume this design is a result of these considerations:

Still, the resulting design feels simultaneously too complex and yet unsatisfactory to me.

If the compactness is a goal, redundant encodings should be avoided. If the simplicity is a goal, the whole gradient and blend business is absurd. The current design is a hodge-podge of two somewhat conflicting goals.

Concrete (Overlong) Shower Thought

It occurred to me that:

In my current proposal, styling opcodes are repurposed as follows:

0x00 .. 0x3f    c = CREG[opcode]; CREG[CSEL++] = c
0x40            CREG[CSEL-1] = blend CREG[CSEL-2] and CREG[CSEL-1] with subsequent byte
0x41            LOD0 = NREG[NSEL-2]; LOD1 = NREG[NSEL-1]; NSEL -= 2
0x42            CREG[CSEL++] = make gradient reference out of next three bytes:
                (NSTOPS - 2) + gradient shape * 128, CBASE + spread * 64, NBASE
0x43            start drawing with CREG[--CSEL]

0x80 .. 0x8f    CREG[CSEL++] = 3-byte color (32⁴ quantization)
0x90 .. 0xf3    CREG[CSEL++] = 4-byte color, 4x2 most significant bits encoded in opcode
0xf4            CREG[subsequent byte (should be < 64)] = CREG[--CSEL]

0xfb            NREG[subsequent byte] = NREG[--NSEL]
0xfc            NREG[NSEL++] = 1-byte natural number - 256
0xfd            NREG[NSEL++] = 1-byte natural number
0xfe            NREG[NSEL++] = 2-byte natural number / 128 - 256
0xff            NREG[NSEL++] = 4-byte IEEE 754 binary32

There are still 64 color "registers" and 256 number "registers" (up from 64), but they are mostly used as stack elements. Since the stacks are defined in terms of registers, pushing more than 64/256 elements would overwrite the bottom of the stacks; this is intentional.
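The stack-over-registers behaviour can be sketched as below. This is only an illustration under stated assumptions: the class `ColorStack` and its method names are hypothetical, and the modulo-64 wraparound of CSEL is one reading of "overwrite the bottom of stacks", not something the proposal spells out.

```python
class ColorStack:
    """Hypothetical model of CREG/CSEL: 64 registers used as a stack."""

    SIZE = 64

    def __init__(self):
        self.creg = [0] * self.SIZE  # register file; 0 is a placeholder default
        self.csel = 0                # next free slot, counted from the bottom

    def push(self, color):
        # CREG[CSEL++] = color. The index wraps (assumed), so the 65th push
        # overwrites the bottom of the stack.
        self.creg[self.csel % self.SIZE] = color
        self.csel += 1

    def pop(self):
        # CREG[--CSEL]
        self.csel -= 1
        return self.creg[self.csel % self.SIZE]

    def push_register(self, index):
        # Opcodes 0x00 .. 0x3f: c = CREG[opcode]; CREG[CSEL++] = c
        self.push(self.creg[index])

    def store(self, index):
        # Opcode 0xf4: CREG[index] = CREG[--CSEL]
        self.creg[index] = self.pop()


stack = ColorStack()
for i in range(65):
    stack.push(1000 + i)  # the last push lands on CREG[0] again
```

After the loop, `stack.creg[0]` holds the 65th value while `stack.creg[1]` still holds the second one, which is the overwrite behaviour described above.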

Stack references are absolute, counting from the bottom. This means that referencing the same color over and over always uses the same index. This does assume that each command statically "knows" the current selectors; the addition of functions will require some thought (e.g. selectors can "rotate" on the function call).

Since it is possible to refer to elements beyond CREG[CSEL] and NREG[NSEL], they can be filled with good defaults. The suggested palette of size N can go to CREG[64-N] .. CREG[63] for example. Palette opcodes are removed for this reason.

The encoding of a 4-byte color plus a 1-byte opcode amazingly fits in 4 bytes. This is possible because we use premultiplied colors: no color channel can exceed alpha, so there are only 1³ + 2³ + 3³ + 4³ = 100 possible combinations of the 4 sets of most significant two bits. So we can pack the remaining 4x6 bits into 3 bytes and then make use of the remaining 156 opcode values for other things. There is also a simpler alternative encoding relying on the MSB of alpha: 32 bits long if the MSB is set (store alpha as the opcode), 28 bits long if the MSB is not set (assign 16 opcodes and pack into opcode + 3 bytes).
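A minimal sketch of that packing follows, to make the count of 100 concrete. The enumeration order of the combinations, the bit layout of the three payload bytes, and the names `COMBOS`, `encode` and `decode` are all assumptions made for illustration; the proposal only fixes the opcode range 0x90 .. 0xf3.

```python
# Enumerate the valid (a, r, g, b) top-two-bit tuples for premultiplied
# colors: the top bits of r, g and b can never exceed those of a.
COMBOS = [(a, r, g, b)
          for a in range(4)
          for r in range(a + 1)
          for g in range(a + 1)
          for b in range(a + 1)]
INDEX = {combo: i for i, combo in enumerate(COMBOS)}
BASE_OPCODE = 0x90  # first opcode of the 0x90 .. 0xf3 range


def encode(r, g, b, a):
    """Pack a premultiplied RGBA color into opcode + 3 bytes (assumed layout)."""
    if max(r, g, b) > a:
        raise ValueError("not a premultiplied color")
    opcode = BASE_OPCODE + INDEX[(a >> 6, r >> 6, g >> 6, b >> 6)]
    # The remaining 4 x 6 low bits fill exactly 24 bits.
    low = ((r & 63) << 18) | ((g & 63) << 12) | ((b & 63) << 6) | (a & 63)
    return bytes([opcode]) + low.to_bytes(3, "big")


def decode(data):
    """Inverse of encode: recover (r, g, b, a) from opcode + 3 bytes."""
    a_hi, r_hi, g_hi, b_hi = COMBOS[data[0] - BASE_OPCODE]
    low = int.from_bytes(data[1:4], "big")
    r = (r_hi << 6) | ((low >> 18) & 63)
    g = (g_hi << 6) | ((low >> 12) & 63)
    b = (b_hi << 6) | ((low >> 6) & 63)
    a = (a_hi << 6) | (low & 63)
    return r, g, b, a
```

With this ordering, fully transparent black maps to opcode 0x90 and opaque white to 0xf3, so the 100 combinations exactly fill the proposed range.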

I decided to remove the one-byte (5³) quantization and expand the two-byte (16⁴) quantization to make it 20 bits long (32⁴). It is still possible to add more quantizations, but the bulk of IconVG data consists of coordinates and not colors, so I don't think it's worth it. Since the shortest color opcode was already 2 bytes long, this doesn't make much difference anyway.

Blend is now a separate opcode, consuming one of two operands (c0 c1 -- c0 blend(c0,c1)). I think it is likely that one operand is fixed and another is changing, so making the operation asymmetric makes sense.
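The stack effect can be illustrated with a short sketch. The per-channel formula below, where t = 0 yields c0 and t = 255 yields c1, is an assumption about how the blend byte is interpreted, and `blend` and `op_blend` are hypothetical names; a plain Python list stands in for the color stack.

```python
def blend(c0, c1, t):
    """Blend two RGBA tuples per channel (assumed formula and direction)."""
    return tuple(((255 - t) * x + t * y + 128) // 255 for x, y in zip(c0, c1))


def op_blend(stack, t):
    # Stack effect: (c0 c1 -- c0 blend(c0, c1)).
    # Only the topmost entry is replaced; c0 stays available for reuse.
    stack[-1] = blend(stack[-2], stack[-1], t)


bright_pink = (255, 0, 255, 255)
black = (0, 0, 0, 255)
stack = [bright_pink, black]
op_blend(stack, 128)
print(stack)  # [(255, 0, 255, 255), (127, 0, 127, 255)]
```

Because c0 is left in place, a theme color can be pushed once and blended repeatedly against different second operands.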

The switch to the drawing mode pops the topmost color. Since we can draw multiple paths in one sitting, the color is likely not reused. In the rare case that the color has to be reused (e.g. LOD changes), it takes just one more byte.

I kept the gradient reference to allow some stops of the gradient to be changed without making a new reference. (I once considered making three different opcodes, but that would make functions less useful.) The actual encoding (like, alpha=0 and blue>=128), however, is opaque to the content author. The opcode argument mostly follows the original 3-byte "invalid" color, but the gradient shape has to be moved because NBASE is now 8 bits long.

I frankly feel additional number opcodes hardly matter since we are not using numbers that much. For now I've specifically tuned them for matrix usage (0xfc .. 0xfe for c and f; 0xff for others). Maybe, though, we should remove all number operands except for binary32 and add them back according to the observed operand distribution.
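For reference, the four number encodings in the opcode table could be decoded roughly as follows. The little-endian byte order for the 2-byte and 4-byte forms is an assumption (the table does not state it), and `decode_number` is a hypothetical helper name.

```python
import struct


def decode_number(opcode, payload):
    """Decode the operand of a number-push opcode (0xfc .. 0xff)."""
    if opcode == 0xfc:   # 1-byte natural number - 256: covers -256 .. -1
        return payload[0] - 256
    if opcode == 0xfd:   # 1-byte natural number: covers 0 .. 255
        return payload[0]
    if opcode == 0xfe:   # 2-byte natural number / 128 - 256: steps of 1/128
        return int.from_bytes(payload[:2], "little") / 128 - 256
    if opcode == 0xff:   # 4-byte IEEE 754 binary32
        return struct.unpack("<f", payload[:4])[0]
    raise ValueError(f"not a number opcode: {opcode:#x}")
```

Under these assumptions the 2-byte form covers -256 up to just under 256, which suits translation terms such as c and f, while everything else falls back to binary32.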

That's all. Any thoughts would be appreciated.

Hixie commented 3 years ago

I suspect custom palettes are the main use cases for a lot of the features you question (e.g. blends).

nigeltao commented 3 years ago

Gradients are practically limited to 58 stops,

Indeed, but is that really a problem? Are there real world examples of icons with 60-stop gradients?

If the compactness is a goal, redundant encodings should be avoided. If the simplicity is a goal, the whole gradient and blend business is absurd. The current design is a hodge-podge of two somewhat conflicting goals.

Compactness is a goal, but not the only goal. The aim isn't compactness at any cost. Ditto for simplicity. Design in the presence of conflicting goals involves trade-offs, which I guess some might see as a hodge-podge.

But, yes, as @Hixie said, blends are needed for themable icons. The one IconVG file can produce multiple renderings based on user-configurable theme colors. favicon.png shows the default blue rendering of favicon.ivg. favicon.pink.png shows a different rendering of the same favicon.ivg. It uses a themable color (bright pink) but it also uses a blend of that color: dark pink = blend(bright pink, black). This particular .ivg file uses one theme-color but there are up to 64 theme-color slots. You might want to vary sports icons with two or three of a team's colors: go blue/yellow team!

So we need 3 byte (indirect) colors, or some equivalent, to express blends of the custom palette (the theme). And we need 4 byte colors because RGBA is 4 bytes. Sure, you could pack the 1,082,146,816 premultiplied RGBA colors a little tighter, but it's probably not worth it. Compactness isn't the only goal.
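The count of premultiplied RGBA colors quoted above can be checked directly: for each alpha value a, each of R, G and B can take a + 1 values. A small sketch:

```python
import math

# For each alpha a in 0..255, each of r, g, b ranges over 0..a.
total = sum((a + 1) ** 3 for a in range(256))
print(total)             # 1082146816
print(math.log2(total))  # just over 30 bits, versus 32 for plain 4-byte RGBA
```

So a perfectly tight packing would save a little under 2 bits per color.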

We could probably drop the 2 byte and 3 byte (direct) colors. They were really only a space optimization when opcode space is plentiful. But as you said, "the bulk of IconVG data consists of coordinates and not colors", so the optimization doesn't really do much. 1 byte colors are also possibly redundant if they're a 0%/100% blend of two other colors. But they're kind of specified anyway, if you have 3 byte (indirect) colors, so they're possibly worth keeping.

The 5³ quantization within a 1 byte color, that was mostly an opportunistic repurpose. After 64 creg colors and 64 palette/theme colors, we have 128 values that would otherwise be unused, and the 5³ quantization seems as good as any other filler values.

The other thing with #29 is that, with far fewer drawing 'verbs' (LineTo, QuadTo, CubeTo), we might not need the complexity of two modes (styling and drawing). Collapsing to a single mode, though, means that opcode space is no longer as plentiful, and I'm less inclined to be clever about color encoding by pushing bits from "extra argument bytes" to "the first (opcode) byte", like your "0x90 .. 0xf3 CREG[CSEL++] = 4-byte color, 4x2 most significant bits encoded in opcode" suggestion.

Interesting ideas, though. Thanks for sharing.

The switch to the drawing [mode] pops the topmost color

That's also an interesting idea. Let me think about it.

lifthrasiir commented 3 years ago

Indeed, but is that really a problem? Are there real world examples of icons with 60-stop gradients?

I agree this is an edge case, but people do try to abuse gradients.

Example: In a certain infamous Korean wiki you can put linear gradients on the table cell background; I've downloaded the database dump and the largest number of stops was 187. Sure, it is repeating, so it can technically be reduced to 46. But it does show that 64 stops are hardly unimaginable.

You can always subdivide the shape, so for linear gradients it is more of an authoring problem (radial gradients are a different story), but having an unusual failure case is not very desirable either. I would simply limit NSTOPS to [2, 58].

Compactness is a goal, but not the only goal. The aim isn't compactness at any cost. Ditto for simplicity. Design in the presence of conflicting goals involves trade-offs, which I guess some might see as a hodge-podge.

I'm glad to hear this, but trade-offs still need some measures of preference (priorities, relative weights and so on). The very reason I've tried the compact encoding is that I didn't know how much trade-off you are willing to allow, so I made most opcodes no longer than before to be safe. I guess it was not necessary.

But, yes, as Hixie said, blends are needed for themable icons.

Ah, I do agree that blends are needed. I instead objected to the current way of treating blends as a kind of operand instead of a generic operation. The same goes for the gradient encoding.

Collapsing to a single mode, though, means that opcode space is no longer as plentiful, [...]

Yeah, I tried to leave a huge gap in the proposed opcode map for that reason, but ~0x40 opcodes might still be insufficient. I have another idea with this in mind; will post when it's ready.