Open nigeltao opened 3 years ago
I would discourage the use of version numbers. They prevent a format from being forward-compatible. Better to define error handling behaviour for all possible error conditions (unknown op codes, etc) and then add features in a backwards-compatible manner, IMHO.
(Breaking with FFV0 is fine, I'm just saying to avoid FFV2 being incompatible with FFV1. Consider for example how animated GIFs fall back to non-animated GIFs in legacy software, or how APNG is just PNG with extra data, so it similarly falls back to a non-animated version in legacy software, etc. Most successful formats follow this pattern.)
The intention is for FFV 2 to be a superset of FFV 1. It will just define new opcodes and new metadata chunks. I think it's perfectly feasible for FFV 1 decoders to simply ignore opcodes and metadata that they do not recognize.
I agree that the "APNG falls back to PNG" model is worth mimicking. I still think that it's potentially useful to be able to distinguish FFVs 1 and 2. For example, the memory allocation requirements for an animated WxH graphic will be higher than for a static WxH graphic. Some decoders might wish to do all their allocation up front (after decoding Width and Height, available early in the decode process). They might like to know (potential) animated-ness just from parsing a few opening bytes rather than having to go arbitrarily deep into the file.
Metadata chunks already explicitly list the chunk length in bytes. We'd have to ensure that any new (FFV 2+) opcodes also do so (so that you can skip over them). Thanks for the feedback.
FFVs 1 and 2 can equivalently be considered Small and Large profiles of the overall IconVG file format.
I forgot to mention... another distinction will be that FFV 1 only requires sequential access (not random access) to the IconVG file. Again, that might be useful to know up front rather than having to go arbitrarily deep into the file.
Based on my recent experience implementing the spec in Dart, I have the following opinions:
Remove the A and a arc-related drawing opcodes.
No objection.
Change the first byte of the magic header from 0x89 to 0x8A, so that we can distinguish IconVG from PNG (from JPEG from WebP etc) just from the first byte of the file. https://en.wikipedia.org/wiki/List_of_file_signatures doesn't show any previous claims on 0x8A.
No objection, but anyone who is only looking at the first byte is doing themselves and their users a disservice. See also: https://mimesniff.spec.whatwg.org/
Add an explicit FFV number in the wire format. Specifically, change the fourth byte of the magic header from 0x47 (ASCII 'G') to 0x31 (ASCII '1') for FFV 1, 0x32 (ASCII '2') for FFV 2, etc.
I would recommend against this, as discussed above.
MID numbers must use the shortest possible encoding.
I would recommend against this, as it makes implementations more complicated and does not seem to solve any immediate issues. If the concern is being able to read the metadata section without a full decoder, parsing the metadata section is already pretty trivial. I don't think it's worth making that use case simpler at the cost of making a full decoder more complicated (since now it would need yet another way to decode numbers, this one just for metadata blocks).
Re-number the ViewBox and Suggested Palette MIDs (Metadata IDs) from 0 and 1 to 8 and 16 (which are represented on the wire as 0x10 and 0x20). Since metadata is presented in increasing MID order, the gaps allow future extensions to insert (optional) metadata chunks before these existing ones.
No objection. I'm not really sure why the order is required here though. The only benefit I see is that it makes catching duplicates easier, but in practice I found it useful to have out-of-band flags for both of the existing metadata blocks anyway (viewBox because in languages with write-once-only fields you only want to set the viewBox fields once, so the default is set after reading metadata, not before; palette because I wanted to avoid copying into CREG if I didn't see a custom palette).
Prohibit encoded real numbers being NaNs. In the end state, the spec should no longer mention "undefined behavior".
No objection. Would you also prohibit +/- infinity?
Tighten restrictions on gradients: there must be at least two stops and the offsets must span from 0 to 1 inclusive.
No objection.
If I could be allowed to make some suggestions of my own:
I'm not really sure why the order is required here though.
Having metadata chunks appear in strictly-increasing MID order means that I can guarantee that e.g. the ViewBox (MID=8) chunk is in the first N bytes for some value of N, if it's present at all. That's assuming that every earlier chunk (lower MID) has an upper bound on how long it can be.
It's not a must-have feature, but I think it's not onerous and it might be nice to be able to say "if you can give me the first 128 bytes of the IconVG file then I can definitely tell you its (explicit or implicit) viewbox".
Would you also prohibit +/- infinity?
LOD1 can meaningfully be set to +infinity, although I suppose 1e9 would be equivalent in practice.
Write the spec using RFC2119 language... I'd be happy to lend an editor's hand here if you would like.
I'd be happy to have your editor's hand... but I think that'd go best if you went to work after the spec gets 'upgraded' to at least FFV 1.
Remove the A and a arc-related drawing opcodes.
For the record, the https://github.com/google/iconvg/issues/18 thread also discusses dropping the smooth ops S/s/T/t and/or the relative ops l/t/q/s/c/a/m/h/v.
That's assuming that every earlier chunk (lower MID) has an upper bound on how long it can be.
That's only true currently because it's MID 0, right? I don't see anything in the format that would prevent unknown metadata blocks from being arbitrarily large.
Preliminary thoughts on FFV 2. Very preliminary.
Let a single .ivg file contain multiple graphics. Users can open individual ones by name, e.g. "device/battery50" from material_icons.ivg.
MID 0 (in FFV 1 MID numbering) holds a map (wire format TBD) from string to FileSegment. FileSegment is a uint64le, packing a 40-bit file offset and a 24-bit segment length. In this context that FileSegment holds a (non-Collection) 'headless' IconVG graphic. Headless means that it skips the 4-byte magic header.
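As a rough illustration of the FileSegment idea above, here is a minimal Python sketch that packs a 40-bit offset and a 24-bit length into a uint64le. The comment does not say which field occupies which bits, so placing the offset in the low 40 bits is purely an assumption, and the function names are hypothetical.

```python
import struct
from typing import Tuple

OFFSET_BITS = 40
LENGTH_BITS = 24


def pack_file_segment(offset: int, length: int) -> bytes:
    """Pack (offset, length) into 8 little-endian bytes.

    Assumption (not stated in the thread): the offset sits in the low
    40 bits and the length in the high 24 bits.
    """
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 40 bits")
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError("length does not fit in 24 bits")
    return struct.pack("<Q", (length << OFFSET_BITS) | offset)


def unpack_file_segment(data: bytes) -> Tuple[int, int]:
    """Inverse of pack_file_segment: returns (offset, length)."""
    (value,) = struct.unpack("<Q", data)
    return value & ((1 << OFFSET_BITS) - 1), value >> OFFSET_BITS
```

A 40-bit offset can address up to 1 TiB and a 24-bit length caps a segment at 16 MiB minus one byte, which follows directly from the field widths.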
A new (optional) metadata chunk that gives names to CREG indices. For example, 0:"skin", 1:"hair", etc. Might include the reverse map too: {"hair": 1, "skin": 0}.
Also have another metadata chunk (call it "Parameter Names") that does this for NREG instead of CREG. For example, NREG[32] could conventionally be called t, an animation time parameter.
Also allow Suggested and Custom Parameters, which do to NREG what the Suggested and Custom Palette do to CREG.
The (human-readable) names are for use 'externally', by implementations or libraries that consume IconVG. 'Within' the IconVG itself, things are identified by an integer ID or by a FileSegment.
Answers "what part of the graphic did I just click on"?
Add a new 6-bit HITTEST register and a new styling opcode to copy NSEL to HITTEST. Current value of HITTEST is passed to callbacks when exiting drawing mode (i.e. filling a path), augmenting the paint attributes (RGBA flat color, gradient, etc) that's already passed at the same time.
Let multiple graphics (within a single file) share common elements. Allow a collection to hold "foo" and "foo-with-bar-badge" graphics. Allow a collection to hold "qux_en", "qux_de", "qux_zh_Hant_HK" graphics that re-use a base "qux" graphic. Note that text in general is out of scope due to its enormous complexity. Authors/tools are expected to 'flatten' the "en", "de" etc glyphs as simple paths.
New opcode (or opcodes?) in styling mode to do a 'function call': play another headless IconVG graphic, again identified by a u40;u24 FileSegment. The 'function call' also specifies a scalar GA (global alpha) and a 6-elem GTM (global transformation matrix) to apply to the callee, taken from NREG[NSEL-7 .. NSEL-0]. The [i..j] syntax here means an inclusive-low exclusive-high range. These 7*float32 globals are 'popped' when the 'call' returns, like a SkCanvas save/restore pair. TBD: something something clip too?
The callee has their own register state (CREG, NREG, etc) which is copied from the caller (possibly 'rotated' by the caller's CSEL/NSEL so that the caller's NREG[NSEL+i] becomes the callee's NREG[i]) on 'function call entry' but not copied back on 'function call exit'. The max recursion depth is TBD, but finite, explicit, and probably small (around 2-4, maybe even 1 and authoring tools are expected to inline deeper calls??).
Styling opcodes to manipulate NREG[NSEL-6 .. NSEL-0] as an affine transformation matrix (called 'self'):
NREG[CSEL-6 .. CSEL-0] (note: CSEL, not NSEL). In C++, I'd write "post-multiply 'self'" as *this = *this * arg for some matrix-typed *this.
NREG[NSEL]
NREG[NSEL++] = NREG[CSEL++].
These need to encode "skip the next N bytes if the (older) implementation doesn't support this opcode" somehow. This might be on a per-opcode basis, or perhaps an overall "SkipLT(N, V)" opcode to skip the next N bytes if the library doesn't support File Format Version V.
For drawing opcodes, we might also need to say whether unsupported opcodes should be replaced by line_to(x, y) or move_to(x, y) or a no-op. Or maybe a "SkipGE(N, V)" opcode, like "SkipLT(N, V)" but >= V instead of < V.
Or maybe a single "IfElseV(M, N, V)" opcode that:
>= V,
< V.
Add "JumpXX(N)" opcodes to skip the next N bytes if NREG[NSEL] XX NREG[CSEL], where XX are comparison operators: equal, not-equal, less-than, less-equal, etc.
Add an explicit "Return" opcode?? EOF (End-of-File) or End-of-FileSegment is still end of graphic. Might not be necessary if equivalent to an (unconditional?) jump to the end.
NREG[NSEL] += NREG[CSEL]
NREG[NSEL] *= NREG[CSEL]
NREG[NSEL] = 1.0 / NREG[CSEL]
Crazy (??) idea: just embed an eBPF interpreter (constrained similar to what the Linux kernel does, e.g. runtime verification of no backwards branches) and let authors/tools write their own ease-in ease-out curves or generally go wild. One complication is that IconVG speaks float32 and eBPF speaks uint64. Perhaps have support (built-in 'syscalls') to convert between float32 and 48.16 fixed point??
Like the 'function call' opcode, but with twice the number of args (TBD: is "twice" necessary if matrix lerping can be done by Transformation Matrix Support and Arithmetic Opcodes??). The two separate graphics are tweened according to a zero-to-one blend argument (in NREG[NSEL-15]??):
Animation comes from combining almost all of the above. User program passes t and other parameters (e.g. if various UI buttons are clicked), various FileSegment sub-graphics are programmatically transformed, composed, tweened or skipped.
We might also need a new metadata chunk for animation length and loopiness.
The following is hand-wavy, but the intention is for 'leaf nodes' (which don't make 'function calls', they're just a filled path) to be 'compilable' / uploadable to GPU-friendly formats and uniquely identified by their uint64 FileSegment. That compilation happens once, not once per animation frame. Rendering the scene at time t involves re-computing the alpha and transform for each leaf. This happens on the CPU, especially if eBPF is involved, but e.g. the pre-transformed geometry that was previously uploaded to the GPU stays unchanged.
Consider restricting nodes to hold either 'function call' ops or 'drawing mode' ops but not both: nodes are either a (pure) branch or a leaf.
That's only true currently because it's MID 0, right? I don't see anything in the format that would prevent unknown metadata blocks from being arbitrarily large.
If it's MID 8, we could constrain every earlier MID to be e.g. at most 16 bytes long, which should be enough for a redirect-pointer if necessary.
Preliminary thoughts on FFV 2. Very preliminary.
It's hard for me to provide feedback on these because I don't know what the problem domain is. I'm guessing from the list of features that it's substantially different from FFV0's problem domain, which seemed to be "format to allow the material icons to be rendered faithfully at any size from tiny files" (which explained the custom palette, the set of drawing features, gradients as a primitive, and the focus on small file sizes).
It's hard for me to provide feedback on these because I don't know what the problem domain is.
It's my attempt at solving https://github.com/flutter/flutter/issues/1831 and if I understand correctly, FFV 0 / FFV 1 isn't feature-rich enough (e.g. animation).
I should make my work-in-progress doc for that effort public, but I think you may have seen it. It lists some of the criteria for what such a format would need to address. One of the highlights which seems relevant here is that the top priority is render speed, with file size being somewhat low on the list; ideally one should be able to get relatively close to just copying significant chunks of the raw data into a shader to draw most of the image. I don't know if the opcode-based approach of IconVG can achieve that.
(By which I mean I literally don't know. There's an effort underway to provide arbitrary SPIR-V shader support for Flutter, and once that is landed I hope to experiment with it and see what kind of vector graphics renderer one can build directly into a shader.)
Add "JumpXX(N)" opcodes to skip the next N bytes if NREG[NSEL] XX NREG[CSEL], where XX are comparison operators: equal, not-equal, less-than, less-equal, etc.
Some more thinking out loud: if we had a "JumpLOD(H0, H1, N)" opcode, that skipped the next N bytes if the height-in-pixels H was outside the H0..H1 range, then we wouldn't need the LOD registers. Skipping N bytes in one motion would also be simpler and faster than decoding one opcode at a time until we're back in LOD range.
Or maybe we add the JumpXX opcodes and also another one to set NREG[NSEL] = height_in_pixels...
Typo spotted?
8 and 16 (which are represented on the wire as 0x10 and 0x20)
Should be 0x08 and 0x10?
Should be 0x08 and 0x10?
No, it's 0x10 and 0x20. MIDs are encoded as Natural Numbers and the IconVG spec says "For a 1 byte encoding, the remaining 7 bits form an integer value in the range [0, 1<<7). For example, 0x28 encodes the value 0x14 or, in decimal, 20".
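To make the arithmetic in that reply concrete, here is a small Python sketch of the 1-byte natural number encoding. It assumes, as the quoted example 0x28 -> 0x14 suggests, that the low bit is a tag bit (0 for the 1-byte form) and the value lives in the upper 7 bits; the function names are hypothetical and the 2- and 4-byte forms are not covered.

```python
def encode_natural_1byte(value: int) -> int:
    """Encode a natural number in [0, 1<<7) as a single byte.

    Assumption: the low bit is a tag (0 means "1-byte encoding") and the
    remaining 7 bits hold the value, so the byte is simply value << 1.
    """
    if not 0 <= value < (1 << 7):
        raise ValueError("value needs a longer encoding")
    return value << 1


def decode_natural_1byte(byte: int) -> int:
    """Decode a single-byte natural number; rejects other encodings."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    if byte & 1:
        raise ValueError("low bit set: not a 1-byte encoding")
    return byte >> 1
```

Under that assumption, MIDs 8 and 16 come out as 0x10 and 0x20 on the wire, matching the proposal, and 0x28 decodes to 20, matching the spec excerpt.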
Another update summarizing my current thinking, in case anyone's interested.
I still like the "mission statement" at the top of the main README file. "A compact, binary format for simple vector graphics: icons, logos, glyphs and emoji." Longer term, maybe animation or security would also gain an explicit mention.
Compactness is a goal, but it's not the only goal. The aim isn't compactness at any cost. "Just use gzipped SVG" might be competitive in terms of compactness, but a very different story from a security and implementation complexity perspective.
Simplicity is also a goal, but again, it's not the only goal. There's usually also a trade-off between simplicity and feature richness.
S/s/T/t and the relative ops l/t/q/s/c/a/m/h/v.
gain an opcode for "jump past the next N bytes".
Unless carefully specified, this would mean that jumping into the middle of another operation is possible. I don't think this is desirable, for a number of reasons including the security implications. I expect that the parsing cost is not very high, so I think this should be "decode but ignore next N instructions" instead.
"decode but ignore next N instructions"
Well, this requires deciding how long (in bytes) each reserved opcode will be. Specifying that today could be awkward if we want to eventually have some sort of scripting or general computing (to support animation), but we haven't concluded yet how that'll be implemented or represented on the wire.
Some more thinking out loud...
The way that gradients are encoded in the unused parts of alpha-premultiplied RGBA space is clever. But somebody (I forget who) once told me that a difference between programming and software engineering is whether "clever" is a compliment or a pejorative.
I can't find the link, but I do remember @Hixie saying at some point that this cleverness makes it hard, in the future, if we want to add different sorts of paints. For example, blend modes (color dodge), effects (blurs) or something something hit-testing.
@lifthrasiir also made the point in #31 that a lot of a gradient's description could be "opcode arguments" instead of being cleverly squeezed into the CREGs.
Perhaps we should split the paint ops (what's currently 0xE1 "exit drawing mode", but would probably be renamed as "fill" if we no longer have two modes) into two classes:
CREG[CSEL]
CREG[CSEL-NSTOPS .. CSEL] and NREG[NSEL-NSTOPS .. NSEL].
"Explicit opcode arguments" means that a "special paint" opcode is followed by a number of extra bytes, the way that an L op is followed by extra bytes for coordinate pairs.
Afterwards, CREG only holds flat colors or gradient stop colors. NREG only holds gradient stop offsets. We could make it invalid to set a CREG to something that's not valid alpha-premul.
If we also encourage a 'stack' model per #31, so that assigning CSEL and NSEL specific values become less important than incrementing / decrementing them, then the 128 1-byte "Set CSEL/NSEL" opcodes could collapse to 2 2-byte opcodes, opening up a lot more opcode space...
Overall, changing how gradients are represented would make it a little harder to upgrade FFV 0 to FFV 1 automatically, if the graphic uses gradients, but it's probably still doable.
Well, this requires deciding how long (in bytes) each reserved opcode will be.
That's true, but it is not much different from putting the length information for any subsequent opcode (unless multiple such opcodes in a run are frequent). I think the "special paint" opcode you've mentioned is a good candidate to include the explicit length for example.
As noted by Hixie in #11, we need to explicitly decide whether two unrelated pieces of data can overlap. I prefer overlap to be impossible, mainly because that makes the interpretation easier to control. If overlap is possible, we risk diverging interpretations. Consider the following:
  a view when X is unsupported         a view when X is supported
+--------------------------------+   +--------------------------------+
| jump to P if X is unsupported  |   | jump to P if X is unsupported  |
+--------------------------------+   +--------------------------------+
:          (not parsed)          :   | opcode X                       |
:                                :   +--------------------------------+
:                                :   | arguments to X                 |
:                                :   |                                |
+--------------------------------< P >                                |
| opcode Y                       |   |                                |
+--------------------------------+   |                                |
| arguments to Y                 |   +--------------------------------+
|                                |   | opcode Z                       |
|                                |   +--------------------------------+
The opcode Y overlaps with the arguments to X in this example, and this desynchronization can result in wildly different interpretations or (more usually) an invalid image only when X is supported. Ideally we want this situation to be impossible. One alternative is the following:
+--------------------------------+
| opcode X (FFV 2)               |
| +----------------------------+ |
| | length of arguments to X   |----+
| +----------------------------+ |  |
| | arguments to X             | |  | byte length
| |                            | |  |
| |                            |<---+
| |                            | |
| +----------------------------+ |
+--------------------------------+
| ignore K opcodes if X is       |--+
| unsupported                    |  |
+--------------------------------+\ | # opcodes
| opcode Y and arguments (FFV 1) | \|
+--------------------------------+  +
| opcode Z and arguments (FFV 1) | /
+--------------------------------+/
All implementations since FFV 1 can determine the entire structure, but only those supporting X can execute the opcode X. No byte can be interpreted in multiple ways. This is not the only way to do that, but it seems that encoding the length of arguments right into all future opcodes is necessary.
Another way to do this would be to split the opcode space by number of arguments. For example, opcodes 0x00 .. 0x1F have zero arguments, opcodes 0x20 .. 0x7F have 6 arguments, opcodes 0x80 .. 0xDF have 12 arguments, opcodes 0xE0 .. 0xFF have 16 arguments. Or whatever. Or equivalently, opcodes could be two bytes long, with one byte always coincidentally giving the length of arguments. The point is that you decouple the parsing from the interpreting, so that parsing is future-proof.
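A minimal Python sketch of that "decouple parsing from interpreting" idea, using the example ranges from the comment. It assumes each argument is exactly one byte (the comment does not say how big an argument is), and the function names are hypothetical.

```python
from typing import List, Tuple


def arg_count(opcode: int) -> int:
    """Number of arguments implied purely by the opcode's range.

    The ranges are the illustrative ones from the comment above, not a
    real IconVG opcode table.
    """
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a byte")
    if opcode <= 0x1F:
        return 0
    if opcode <= 0x7F:
        return 6
    if opcode <= 0xDF:
        return 12
    return 16


def split_instructions(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (opcode, argument bytes) pairs.

    Assumption: one byte per argument. No opcode semantics are needed,
    so unknown opcodes can still be skipped safely.
    """
    instructions = []
    i = 0
    while i < len(data):
        opcode = data[i]
        end = i + 1 + arg_count(opcode)
        if end > len(data):
            raise ValueError("truncated instruction")
        instructions.append((opcode, data[i + 1:end]))
        i = end
    return instructions
```

Because instruction boundaries depend only on the opcode byte, an older decoder can count or skip instructions it cannot execute without ever landing in the middle of another instruction's arguments.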
Some more thoughts. They're not final, I just want to write down some ideas-in-progress before I forget.
CREG and NREG concepts so that there's only one kind of register. Call it REG. Likewise, there's now only one selector, SEL.
REG registers and they're 64 bits (8 bytes) each, little-endian. The high 32 bits are ABGR (high 8 bits are Alpha), either a flat color or a gradient stop color. The low 32 bits are ignored (for flat colors) or a gradient stop offset (an unsigned 16.16 fixed point number). Future expansions might apply other semantics to the low 32 or whole 64 bits. Scripts (see below) will also be able to access these registers as u32le×2, u64le×1, f32le×2, etc.
REG access uses a ring-stack model. Ring means that all REG indexes are still modulo 64. Stack means that there's no SEL = N opcodes, only SEL += N. Any one byte color that previously involved CREG[I] now involves CREG[SEL+I], which is the high 32 bits of REG[SEL+I].
ADJ. The opcode writes to REG[SEL-ADJ]. If ADJ is 0 then the write is followed by SEL++ (the write is a stack push).
u32le), copied to the low 32 bits of REG[SEL-ADJ]. A 0-byte number is always zero.
The low 4 bits form a number RL0. If RL0 is zero then a natural number RL1 follows (in 1, 2 or 4 bytes) and the run length RL is set to (RL1 + 16). If RL0 is non-zero then RL is set to RL0. After the opcode (and after RL1 if present) are one or more absolute (not relative) coordinate pairs (a pair is (x, y)). Each coordinate is encoded in 1, 2 or 4 bytes the same as FFV 0.
0x80 ..= 0x8F encodes LineTo. There are (1 * RL) coordinate pairs.
0x90 ..= 0x9F encodes QuadTo. There are (2 * RL) coordinate pairs.
0xA0 ..= 0xAF encodes CubeTo. There are (3 * RL) coordinate pairs.
0xB0 ..= 0xB3 encode a quarter-, half-, three-quarter- or full-ellipse. There are 2 coordinate pairs.
0xB4 encodes a parallelogram. There are 2 coordinate pairs.
0xB5 encodes a MoveTo (closing any in-progress path first). There is 1 coordinate pair.
Processing the ellipse or parallelogram opcodes requires knowing the 'current point' to start from, also known as the 'pen location'. This is just the last coordinate pair of a LineTo, QuadTo, CubeTo or MoveTo op. For example, after 5 consecutive CubeTo operations, the current point is set to the last of the 15 coordinate pairs (5 * 3 = 15).
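The run-length rule above can be sketched in Python as follows. This is only an illustration of the proposal as written in this comment (opcodes 0x80 ..= 0xAF); the function name is hypothetical, and RL1 is taken as an already-decoded integer so the sketch does not depend on the natural number wire encoding.

```python
from typing import Optional

# Coordinate pairs consumed per path segment, keyed by the opcode's high nibble.
PAIRS_PER_SEGMENT = {0x8: 1, 0x9: 2, 0xA: 3}  # LineTo, QuadTo, CubeTo


def coordinate_pair_count(opcode: int, rl1: Optional[int] = None) -> int:
    """How many (x, y) pairs follow a LineTo/QuadTo/CubeTo opcode.

    rl1 is the already-decoded natural number RL1, which is only present
    on the wire when the opcode's low 4 bits (RL0) are zero.
    """
    pairs = PAIRS_PER_SEGMENT.get(opcode >> 4)
    if pairs is None:
        raise ValueError("not a LineTo/QuadTo/CubeTo opcode")
    rl0 = opcode & 0x0F
    if rl0 == 0:
        if rl1 is None or rl1 < 0:
            raise ValueError("RL0 is zero, so a natural number RL1 is required")
        run_length = rl1 + 16
    else:
        run_length = rl0
    return pairs * run_length
```

For example, opcode 0xA5 (CubeTo with RL0 = 5) is followed by 15 coordinate pairs, the same 5 * 3 = 15 as in the text, and the last of those pairs becomes the current point.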
0xB6 is followed by a 1-byte DELTA value and means SEL += DELTA.
0xB7 is a no-op.
0xB8 ..= 0xBF are reserved but the fallback effect is a single LineTo. The opcode byte is followed by a natural number N and then N 'opcode argument' bytes. Those bytes are then followed by a coordinate pair that is the LineTo argument (in the fallback case). If a future expansion employs these opcodes, the semantics are also expected to move the current point to this final coordinate pair.
Fills close any in-progress path.
ADJ. If ADJ is 0 then the opcode execution is preceded by --SEL (the read is a stack pop).
0xC0 ..= 0xCF fills with a basic paint: a flat color from REG[SEL-ADJ].
0xD0 ..= 0xDF fills with an advanced paint and is followed by a 4-byte u32le 'opcode argument', called flag.
NSTOPS. The fill involves REG[SEL-ADJ-NSTOPS .. SEL-ADJ].
TBD: complexity 0 might be repurposed for hit-testing: filling rough paths with multiple invisible-but-different colors.
The first three are followed by a natural number J and possibly further arguments.
0xE0 means to jump past the next J instructions, unconditionally. Instructions are atomic (you can't jump to the middle of a multi-byte instruction), which means that all opcodes need to know the byte-size of their 'opcode arguments' even if they are reserved opcodes.
0xE1 means to jump past the next J instructions if the decoder does not support all of a bitmask F of IconVG features, where F is a natural number 'opcode argument' encoded after J. There are no feature bits currently assigned but future expansions may use them.
0xE2 means to jump past the next J instructions if the decoder's height in pixels is outside of the half-open interval LOD0 .. LOD1, where these two natural numbers are 'opcode arguments' encoded after J.
0xE3 is a 'return'. If we're in a macro (see below), return to the caller. If we're not in a macro, this signifies the end of the graphic (even if we're not at the end of the file).
It's invalid to jump past the end of the file or macro segment.
They are typically followed by an 8 byte FileSegment (40 bit file offset, 24 bit file length) and possibly further arguments.
0xE4 means a macro expansion (like a C #include line, but given a number-pair FileSegment instead of a string filename).
0xE5 is like 0xE4 but the FileSegment is followed by 6 coordinates that form an affine transform matrix to apply to the geometry and transform matrices within that macro.
0xE6 is like 0xE4 but the FileSegment is interpreted as separate 'scripting bytecode' (more to say about that in a future post) instead of the IconVG bytecode detailed in this comment.
0xE7 has a natural number N opcode argument instead of an 8 byte FileSegment. The next N bytes are a script, like 0xE6 but inline.
The macro opcodes 0xE4 and 0xE5 are invalid when already in a macro expansion. No recursion allowed.
All four opcodes are a single instruction for "jump past the next J instructions" accounting.
0xE8 ..= 0xFF are followed by a natural number N and then N bytes of 'opcode arguments'.
More thoughts...
FileSegments are tweaked. There's a uint64le flavor (an "Absolute FileSegment"):
0x00 means that the segment holds IconVG bytecode. 0x01 means 'scripts' (e.g. animation) although the scripting language details are still TBD. Other values are reserved.
There's also a uint32le flavor (an "Inline FileSegment"), just the low 32 bits. There are no redirects and the SegmentOffset is implicit: it immediately follows the uint32le.
IconVG files can be larger than 2 GiB. The redirect bit being set on an Absolute FileSegment means that the 31+24=55 middle bits are a file offset for another 16 bytes: uint64le SegmentOffset and uint64le SegmentLength.
The low 4 bits form a number RL0. If RL0 is zero then a natural number RL1 follows (in 1, 2 or 4 bytes) and the run length RL is set to (RL1 + 16). If RL0 is non-zero then RL is set to RL0. After the opcode (and after RL1 if present) are one or more absolute (not relative) coordinate pairs (a pair is (x, y)). Each coordinate is encoded in 1, 2 or 4 bytes the same as FFV 0 (tweaked by #33).
0x00 ..= 0x0F encodes LineTo. There are (1 * RL) coordinate pairs.
0x10 ..= 0x1F encodes QuadTo. There are (2 * RL) coordinate pairs.
0x20 ..= 0x2F encodes CubeTo. There are (3 * RL) coordinate pairs.
0x30 ..= 0x33 encode a quarter-, half-, three-quarter- or full-ellipse. There are 2 coordinate pairs.
0x34 encodes a parallelogram. There are 2 coordinate pairs.
0x35 encodes a MoveTo (closing any in-progress path first). There is 1 coordinate pair.
Processing the ellipse or parallelogram opcodes requires knowing the 'current point' to start from, also known as the 'pen location'. See Three Points (Two Opposing) Define an Ellipse. For example, after 5 consecutive CubeTo operations, the current point is set to the last of the 15 coordinate pairs (5 * 3 = 15).
0x36 is followed by a 1-byte DELTA value and means SEL += DELTA, modulo 64.
0x37 is a no-op.
The first three are followed by a natural number J and possibly further arguments.
0x38 means to jump past the next J instructions, unconditionally. Instructions are atomic (you can't jump to the middle of a multi-byte instruction), which means that all opcodes need to know the byte-size of their 'opcode arguments' even if they are reserved opcodes.
0x39 means to jump past the next J instructions if the decoder does not support all of a bitmask F of IconVG features, where F is a natural number 'opcode argument' encoded after J. Scripts (SegmentType 0x01) require feature bit 0x0000_0001. Other feature bits are reserved for future expansions.
0x3A means to jump past the next J instructions if the decoder's height in pixels is outside of the half-open interval LOD0 .. LOD1, where these two natural numbers are 'opcode arguments' encoded after J.
0x3B is a 'return'. If we're in a sub-routine (see below), return to the caller. If we're not in a sub-routine, this signifies the end of the graphic (even if we're not at the end of the file or sub-routine FileSegment).
It's invalid to jump past the end of the file or sub-routine FileSegment.
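The two conditional jumps above boil down to simple predicates. Here is a hedged Python sketch; the function names are hypothetical, and treating the half-open LOD interval as inclusive-low, exclusive-high is an assumption (it matches the [i..j] convention used earlier in this thread, but the comment does not spell it out for LOD).

```python
SCRIPTS_FEATURE_BIT = 0x0000_0001  # from the text: scripts require this bit


def should_jump_on_features(supported: int, required: int) -> bool:
    """0x39-style test: jump unless the decoder supports *all* bits in F."""
    return (supported & required) != required


def should_jump_on_lod(height_in_pixels: int, lod0: int, lod1: int) -> bool:
    """0x3A-style test: jump if the height is outside LOD0 .. LOD1.

    Assumption: the interval is inclusive of LOD0 and exclusive of LOD1.
    """
    return not (lod0 <= height_in_pixels < lod1)
```

A decoder without script support (supported == 0) would jump past instructions guarded by SCRIPTS_FEATURE_BIT, while a guard with F == 0 never jumps.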
0x3C call sans-ATM Inline FileSegment
0x3D call with-ATM Inline FileSegment
0x3E call sans-ATM Absolute FileSegment
0x3F call with-ATM Absolute FileSegment
If the opcode 0x01 bit is set, this is followed by an ATM (alpha and transform matrix). An ATM is a 1-byte alpha value and then a 3x2 affine transform matrix (each number encoded as if it was a coordinate) to apply (multiply) to the paints, geometry and transform matrices within that sub-routine. 'No ATM' is equivalent to an 0xFF alpha and identity transform matrix.
The ATM (or lack of it) is followed by a 4 byte Inline FileSegment (e.g. 'switch to scripting mode') or 8 byte Absolute FileSegment (e.g. 're-use shared paths and fills', 're-use shared scripts'), depending on the opcode 0x02 bit being off or on. An Inline FileSegment is followed by SegmentLength bytes.
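As a quick check that the two flag bits line up with the four opcodes listed above, here is a tiny Python sketch. The function name and the returned tuple shape are hypothetical.

```python
from typing import Tuple


def decode_call_opcode(opcode: int) -> Tuple[bool, int]:
    """Return (has_atm, file_segment_size_in_bytes) for a call opcode.

    Per the text: the 0x01 bit says an ATM follows the opcode, and the
    0x02 bit selects an 8 byte Absolute FileSegment rather than a 4 byte
    Inline FileSegment.
    """
    if not 0x3C <= opcode <= 0x3F:
        raise ValueError("not a call opcode")
    has_atm = bool(opcode & 0x01)
    segment_size = 8 if (opcode & 0x02) else 4
    return has_atm, segment_size
```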
These four opcodes are only valid when executing 'at the top level'. They're invalid if encountered when already in a sub-routine call.
64 ring-stack registers REGS, 64 bits each, and one SEL selector register. It's like the earlier comment in this issue, except the stack now grows downwards. A stack push decrements (not increments) SEL. ADJ adjustments are added (not subtracted).
0x40 ..= 0x4F sets the low 32 bits of a single register (the high bits are zeroed).
0x50 ..= 0x5F sets the high 32 bits of a single register (the low bits are zeroed).
0x60 ..= 0x6F sets all 64 bits of a single register.
0x70 ..= 0x7F sets all 64 bits of multiple registers.
For the first 48 opcodes, the low 4 bits give an ADJ value. These opcodes write to REGS[(SEL+ADJ)&63]. It also post-decrements SEL when ADJ is zero.
For the last 16 opcodes, let LENGTH equal 2 plus the opcode's low 4 bits. They pre-decrement SEL by LENGTH and then consume LENGTH uint64le values, storing them in REGS[SEL+1], REGS[SEL+2], ..., REGS[SEL+LENGTH], in that order.
"Sets the low/high 32 bits" means that the opcode is followed by a uint32le number to put in the corresponding low/high half of the REGS element (the other half is zeroed). "Sets all bits" means that the opcode is followed by one (opcodes 0x60 ..= 0x6F) or more (opcodes 0x70 ..= 0x7F) uint64le numbers.
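The register-write rules above can be sketched as a small Python class. This is an illustration only: the class and method names are hypothetical, values are treated as already-decoded 64-bit integers, and the initial SEL of 56 is taken from the statement about SEL's initial value in this same comment.

```python
from typing import Sequence


class RingStack:
    """64 registers of 64 bits each, plus a selector, per the text above."""

    def __init__(self) -> None:
        self.regs = [0] * 64
        self.sel = 56  # initial value stated in this comment

    def write(self, adj: int, value: int) -> None:
        """Single-register write (opcodes 0x40 ..= 0x6F, after decoding)."""
        if not 0 <= adj <= 15:
            raise ValueError("ADJ is a 4-bit value")
        self.regs[(self.sel + adj) & 63] = value & 0xFFFFFFFFFFFFFFFF
        if adj == 0:
            # Post-decrement: a write at ADJ == 0 acts as a stack push.
            self.sel = (self.sel - 1) & 63

    def write_many(self, values: Sequence[int]) -> None:
        """Multi-register write (opcodes 0x70 ..= 0x7F, after decoding)."""
        length = len(values)
        if not 2 <= length <= 17:
            raise ValueError("LENGTH is 2 plus a 4-bit value")
        self.sel = (self.sel - length) & 63  # pre-decrement by LENGTH
        for i, value in enumerate(values, start=1):
            self.regs[(self.sel + i) & 63] = value & 0xFFFFFFFFFFFFFFFF
```

Masking every index with `& 63` is what makes the stack a ring: pushes that run past register 0 wrap around to register 63 rather than failing.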
Low 32 bits are interpreted as unsigned 16.16 fixed point when used as gradient stops (e.g. 0xC000 represents a gradient stop offset of 0.75). Future expansions may interpret the bits in other ways.
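The 16.16 conversion in that example is just a division by 1<<16. A minimal Python sketch, with hypothetical function names:

```python
def stop_offset_from_fixed(low32: int) -> float:
    """Interpret a register's low 32 bits as unsigned 16.16 fixed point."""
    if not 0 <= low32 <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned value")
    return low32 / 65536.0


def fixed_from_stop_offset(offset: float) -> int:
    """Convert a non-negative offset to unsigned 16.16 (round to nearest)."""
    fixed = round(offset * 65536)
    if not 0 <= fixed <= 0xFFFFFFFF:
        raise ValueError("offset out of range for unsigned 16.16")
    return fixed
```

With this, 0xC000 maps to 0.75 and 0x1_0000 maps to 1.0, the top of the nominal 0..1 stop range.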
High 32 bits are interpreted as alpha-premultiplied RGBA colors. Alpha less than any of Red, Green or Blue has special meaning, as they would otherwise be invalid alpha-premultiplied colors. That special meaning is either a blend (Alpha is zero) or a 'discriminated transparent black' (Alpha is non-zero).
A blend is what FFV0 calls a 3-byte indirect color. G and B give 1-byte colors SRC0 and SRC1 and R is the BLEND (0x00 means all-SRC0, 0xFF means all-SRC1):
RESULTANT.RED = (((255-BLEND) * SRC0.RED) + (BLEND * SRC1.RED) + 128) / 255
Ditto for GREEN, BLUE and ALPHA
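Here is the blend formula as a small Python sketch. It assumes the "/ 255" is a truncating integer division (the + 128 term suggests rounding to nearest) and represents colors as (R, G, B, A) tuples of 0..255 integers; the function names are hypothetical.

```python
from typing import Tuple

Color = Tuple[int, int, int, int]  # (red, green, blue, alpha), each 0..255


def blend_channel(blend: int, src0: int, src1: int) -> int:
    """One channel of the blend formula, using integer division."""
    return ((255 - blend) * src0 + blend * src1 + 128) // 255


def blend_color(blend: int, src0: Color, src1: Color) -> Color:
    """Apply the same formula to red, green, blue and alpha."""
    if not 0 <= blend <= 255:
        raise ValueError("BLEND is a single byte")
    r, g, b, a = (blend_channel(blend, c0, c1) for c0, c1 in zip(src0, src1))
    return (r, g, b, a)
```

A BLEND of 0x00 reproduces SRC0 exactly and 0xFF reproduces SRC1 exactly, consistent with the "all-SRC0" and "all-SRC1" description.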
1-byte colors are similar to but tweaked from FFV0. 0x00, 0x01 and 0x02 mean RGBA values 00:00:00:00, 80:80:80:80 and C0:C0:C0:C0. 0x03 ..= 0x7F mean base-5 opaque colors. 0x80 ..= 0xBF mean colors from the custom palette. 0xC0 ..= 0xFF (call the value c) take the color from REGS[(SEL+c)&63].
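A Python sketch of how a decoder might dispatch on a 1-byte color. It only classifies the byte. The exact base-5 mapping and the palette indexing are not spelled out here, so the sketch deliberately leaves them undecoded. The name `one_byte_color` and the returned tags are made up for illustration.

```python
def one_byte_color(c, sel):
    """Classify a 1-byte color c (0..255) given the current SEL.

    Returns a (kind, detail) pair. Hypothetical helper: the kinds are
    illustrative labels, not part of the format.
    """
    fixed = {
        0x00: (0x00, 0x00, 0x00, 0x00),
        0x01: (0x80, 0x80, 0x80, 0x80),
        0x02: (0xC0, 0xC0, 0xC0, 0xC0),
    }
    if c in fixed:
        return ("rgba", fixed[c])
    if c <= 0x7F:
        # Base-5 opaque color; the mapping itself is not decoded here.
        return ("base5", c)
    if c <= 0xBF:
        # Custom palette entry; the index derivation is not decoded here.
        return ("palette", c)
    # 0xC0 ..= 0xFF: read the color from a register relative to SEL.
    return ("register", (sel + c) & 63)
```

For example, with SEL = 56, the byte 0xC1 refers to REGS[57].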
A 'discriminated transparent black' means that the paint is a no-op, in terms of modifying pixel colors, but having multiple 'transparent black' values can be useful for hit-testing: this shape is 'transparent black number 1', this other shape is 'transparent black number 2', etc.
SEL is initially set to 56, allowing easy read access to registers 0..=7 (initialized from the custom palette if given) and easy read/write access to registers 57..=63 (typically 'scratch' space).
The opcode's low 4 bits give an ADJ value. These opcodes read from REGS[(SEL+ADJ)&63]. Gradients also read from later REGS, per the number of gradient stops. They also pre-increment SEL when ADJ is zero.
0x80 ..= 0x8F fills with a flat color.
0x90 ..= 0x9F fills with a linear gradient. It is followed by a one byte GRADIENT_ARGS and then a 3x1 matrix, per 'complexity' in the earlier comment.
0xA0 ..= 0xAF fills with a radial gradient. It is followed by a one byte GRADIENT_ARGS and then a 3x2 matrix, per 'complexity' in the earlier comment.
0xB0 ..= 0xBF is reserved, but the fallback is to fill with a flat color. It is followed by a natural number N and then N extra bytes.
For the GRADIENT_ARGS byte, the low 6 bits give the number of stops minus 2 (and 65 stops is invalid). The high 2 bits give the spread (how to extrapolate color stops outside the 0..1 stop offset nominal range).
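Unpacking the GRADIENT_ARGS byte could look like the Python sketch below. The name `parse_gradient_args` is hypothetical, and the spread is returned as a raw 0..3 integer because the meaning of each value isn't listed here.

```python
def parse_gradient_args(byte):
    """Split a GRADIENT_ARGS byte into (num_stops, spread).

    Raises ValueError for the invalid 65-stop encoding.
    """
    num_stops = (byte & 0x3F) + 2  # low 6 bits hold the stop count minus 2
    spread = byte >> 6             # high 2 bits hold the spread
    if num_stops == 65:
        raise ValueError("65 gradient stops is invalid")
    return num_stops, spread
```

The 65-stop case arises only when the low 6 bits are all ones, so valid stop counts run from 2 to 64.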
0xC0 ..= 0xDF are reserved but the fallback effect is a single LineTo. The opcode byte is followed by a natural number N and then N 'argument bytes'. Those bytes are then followed by a coordinate pair that is the LineTo argument (in the fallback case). If a future expansion employs these opcodes, the semantics are also expected to move the current point to this final coordinate pair.
0xE0 ..= 0xFF are followed by a natural number N and then N bytes of 'opcode arguments'. The fallback effect is a no-op.
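The fallback behaviour for these reserved opcodes can be sketched in Python as below. The natural-number and coordinate encodings aren't described in this comment, so the sketch takes them as caller-supplied functions. The `toy_natural` and `toy_coord_pair` readers are deliberately simplistic stand-ins (one byte each), not the real IconVG encodings, and all names here are hypothetical.

```python
def skip_reserved(opcode, data, pos, read_natural, read_coord_pair):
    """Apply the fallback for a reserved 0xC0..=0xFF opcode.

    read_natural and read_coord_pair each take (data, pos) and return
    (value, new_pos). Returns (effect, new_pos), where effect is
    ("line_to", (x, y)) for 0xC0..=0xDF and None (a no-op) for 0xE0..=0xFF.
    """
    if not 0xC0 <= opcode <= 0xFF:
        raise ValueError("not a reserved opcode")
    n, pos = read_natural(data, pos)
    if pos + n > len(data):
        raise ValueError("truncated opcode arguments")
    pos += n  # skip the N argument bytes without interpreting them
    if opcode <= 0xDF:
        xy, pos = read_coord_pair(data, pos)
        return ("line_to", xy), pos
    return None, pos

# Toy stand-ins for illustration only: NOT the real IconVG encodings.
def toy_natural(data, pos):
    return data[pos], pos + 1

def toy_coord_pair(data, pos):
    return (data[pos], data[pos + 1]), pos + 2
```

The point of the design is that an older decoder never needs to understand the argument bytes: the explicit length lets it step over them and still keep the current point in sync.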
Summary
I propose to:
Background
Since its inception in 2016, IconVG has always carried the caveat that "WARNING: THIS FORMAT IS EXPERIMENTAL AND SUBJECT TO INCOMPATIBLE CHANGES".
Issue #2 in this repository is about adding animation to IconVG graphics. Tweening would almost certainly involve transformations (in the "affine transformation" sense) and interpolation.
The original IconVG design took the entirety of the SVG path model, including elliptical arc segments. Unlike line_to, quad_to and cube_to, arc_to's parameterization is unique, not being a sequence of (x, y) coordinate pairs, and a boolean argument like large-arc-flag is impossible to interpolate smoothly.
Rasterization backends like Cairo and Skia also don't provide arc_to as a primitive, or if they do, not in the way that SVG parameterizes it. We usually approximate arcs as cubic splines.
Also recall that IconVG is a presentation format, not an authoring format, and it already isn't able to represent groups, strokes, text, etc 'natively'. Authoring tools like Illustrator or Inkscape, if they could export to IconVG, are expected to 'lower' e.g. stroked paths to more primitive operations (filled paths), the same way that they would 'flatten' layers if exporting to PNG. I'd expect such tools could also 'lower' arcs to cubic Béziers during export.
Thus, I'm considering removing arcs from the file format. This new version (File Format Version 1) would not be a superset of FFV 0 per se, but FFV 0 files could be converted in a straightforward way and the rasterizations would be equivalent. In essence, 'lowering' arcs becomes the responsibility of the authoring tools (which get more complicated) instead of the presentation tools (which get simpler).
Separately, the original Go implementation (the golang.org/x/exp/shiny/iconvg package in a separate repository) was released as an interim milestone of the unfinished 'Shiny' Go GUI project. IconVG hasn't had much adoption so far, as the only implementation was in Go and so not usable from e.g. C++, Dart or Python GUI programs. In recent weeks, this repository has gained a brand new C implementation, but we still don't yet have a vast back-catalogue of existing IconVG files to constrain us.
Bringing all of the above together, if I were ever to make an IconVG FFV 1, especially one that isn't a superset of FFV 0 (because arcs), then now is the time to do it.
This issue is a place to discuss that process and what other features to add or warts to remove as part of FFV 1.
File Format Changes
See the spec for context.
The major change is:
A and a arc-related drawing opcodes.
Minor clean-up changes are:
0x89 to 0x8A, so that we can distinguish IconVG from PNG (from JPEG from WebP etc) just from the first byte of the file. https://en.wikipedia.org/wiki/List_of_file_signatures doesn't show any previous claims on 0x8A.
0x47 (ASCII 'G') to 0x31 (ASCII '1') for FFV 1, 0x32 (ASCII '2') for FFV 2, etc.
0x10 and 0x20). Since metadata is presented in increasing MID order, the gaps allow future extensions to insert (optional) metadata chunks before these existing ones.
Implementations
golang.org/x/exp/shiny/iconvg, will speak FFVs 0 and 1+, delegating the latter to the 'new' Go library.
Notably, any existing Go code (using the 'old' Go library) displaying existing (FFV 0) files will continue to work.
Timeline
FFV 1 should be finalized 'soon'. FFV 2 is more open ended and will require extensive prototyping.