google / iconvg

IconVG is a compact, binary format for simple vector graphics: icons, logos, glyphs and emoji.

Apache License 2.0


Proposal: File Format Versions 1, 2 and Beyond #4

Open nigeltao opened 3 years ago

nigeltao commented 3 years ago

Summary

I propose to:

Retroactively name the current (as of May 2021) IconVG wire format as FFV 0 (File Format Version 0).
Consider FFV 0 a deprecated experiment and no longer supported.
Introduce FFV 1 as a subset (in terms of features) of FFV 0 but incompatible (in terms of wire format), in order to remove features that are hard to combine with animation, and to clean up some design warts.
FFV 2 (if it happens, in the future) will be a superset of FFV 1 in terms of both features and wire format (other than a one-byte-change version-number bump). The intended headline feature for FFV 2 is animation (issue #2). FFVs 1 and 2 can equivalently be considered Small and Large profiles of the overall IconVG file format. For example, a future font format could embed FFV 1 (static) graphics but not FFV 2 (static and animated).

Background

Since its inception in 2016, IconVG has always carried the caveat that "WARNING: THIS FORMAT IS EXPERIMENTAL AND SUBJECT TO INCOMPATIBLE CHANGES".

Issue #2 in this repository is about adding animation to IconVG graphics. Tweening would almost certainly involve transformations (in the "affine transformation" sense) and interpolation.

The original IconVG design took the entirety of the SVG path model, including elliptical arc segments. Unlike line_to, quad_to and cube_to, arc_to's parameterization is unique, not being a sequence of (x, y) coordinate pairs, and a boolean argument like large-arc-flag is impossible to interpolate smoothly.

Rasterization backends like Cairo and Skia also don't provide arc_to as a primitive, or if they do, not in the way that SVG parameterizes it. We usually approximate arcs as cubic splines.

Also recall that IconVG is a presentation format, not an authoring format, and it already isn't able to represent groups, strokes, text, etc 'natively'. Authoring tools like Illustrator or Inkscape, if they could export to IconVG, are expected to 'lower' e.g. stroked paths to more primitive operations (filled paths), the same way that they would 'flatten' layers if exporting to PNG. I'd expect such tools could also 'lower' arcs to cubic Béziers during export.

Thus, I'm considering removing arcs from the file format. This new version (File Format Version 1) would not be a superset of FFV 0 per se, but FFV 0 files could be converted in a straightforward way and the rasterizations would be equivalent. In essence, 'lowering' arcs becomes the responsibility of the authoring tools (which get more complicated) instead of the presentation tools (which get simpler).

Separately, the original Go implementation (the golang.org/x/exp/shiny/iconvg package in a separate repository) was released as an interim milestone of the unfinished 'Shiny' Go GUI project. IconVG hasn't had much adoption so far, as the only implementation was in Go and so not usable from e.g. C++, Dart or Python GUI programs. In recent weeks, this repository has gained a brand new C implementation, but we still don't yet have a vast back-catalogue of existing IconVG files to constrain us.

Bringing all of the above together, if I were ever to make an IconVG FFV 1, especially one that isn't a superset of FFV 0 (because arcs), then now is the time to do it.

This issue is a place to discuss that process and what other features to add or warts to remove as part of FFV 1.

File Format Changes

See the spec for context.

The major change is:

Remove the A and a arc-related drawing opcodes.

Minor clean-up changes are:

Change the first byte of the magic header from 0x89 to 0x8A, so that we can distinguish IconVG from PNG (from JPEG from WebP etc) just from the first byte of the file. https://en.wikipedia.org/wiki/List_of_file_signatures doesn't show any previous claims on 0x8A.
Add an explicit FFV number in the wire format. Specifically, change the fourth byte of the magic header from 0x47 (ASCII 'G') to 0x31 (ASCII '1') for FFV 1, 0x32 (ASCII '2') for FFV 2, etc.
MID numbers must use the shortest possible encoding.
Re-number the ViewBox and Suggested Palette MIDs (Metadata IDs) from 0 and 1 to 8 and 16 (which are represented on the wire as 0x10 and 0x20). Since metadata is presented in increasing MID order, the gaps allow future extensions to insert (optional) metadata chunks before these existing ones.
Prohibit encoded real numbers being NaNs. In the end state, the spec should no longer mention "undefined behavior".
Tighten restrictions on gradients: there must be at least two stops and the offsets must span from 0 to 1 inclusive.
Maybe some other small things I've forgotten.
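
As an illustration of the two magic-header changes proposed above, here is a minimal sketch of how a decoder might classify the first four bytes. It assumes the second and third bytes stay as ASCII 'I' and 'V' (the list above only discusses the first and fourth bytes), and the function name classify_magic is hypothetical, not part of any IconVG library.

```python
def classify_magic(data: bytes) -> str:
    """Classify the first four bytes under the header scheme proposed above."""
    if len(data) < 4:
        return "too short"
    first, middle, fourth = data[0], data[1:3], data[3]
    if middle != b"IV":  # assumption: bytes 2 and 3 are unchanged
        return "not IconVG"
    if first == 0x89 and fourth == 0x47:  # old header, ending in ASCII 'G'
        return "FFV 0"
    if first == 0x8A and 0x31 <= fourth <= 0x39:  # ASCII '1' through '9'
        return "FFV " + chr(fourth)
    return "not IconVG"
```

With this scheme the first byte alone separates IconVG (0x8A) from PNG (0x89), and the fourth byte carries the version.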

Implementations

This repository's C and Go libraries (the latter also to be called the 'new' Go library) will speak FFVs 1+ but not FFV 0.
The original written-in-2016 Go library (the 'old' Go library), at golang.org/x/exp/shiny/iconvg, will speak FFVs 0 and 1+, delegating the latter to the 'new' Go library.
The 'old' Go library will also gain tools to upgrade FFV 0 files to FFV 1.

Notably, any existing Go code (using the 'old' Go library) displaying existing (FFV 0) files will continue to work.

Timeline

FFV 1 should be finalized 'soon'. FFV 2 is more open ended and will require extensive prototyping.

Hixie commented 3 years ago

I would discourage the use of version numbers. They prevent a format from being forward-compatible. Better to define error handling behaviour for all possible error conditions (unknown op codes, etc) and then add features in a backwards-compatible manner, IMHO.

Hixie commented 3 years ago

(Breaking with FFV0 is fine, I'm just saying to avoid FFV2 being incompatible with FFV1. Consider for example how animated GIFs fall back to non-animated GIFs in legacy software, or how APNG is just PNG with extra data, so it similarly falls back to a non-animated version in legacy software, etc. Most successful formats follow this pattern.)

nigeltao commented 3 years ago

The intention is for FFV 2 to be a superset of FFV 1. It will just define new opcodes and new metadata chunks. I think it's perfectly feasible for FFV 1 decoders to simply ignore opcodes and metadata that they do not recognize.

I agree that the "APNG falls back to PNG" model is worth mimicking. I still think that it's potentially useful to be able to distinguish FFVs 1 and 2. For example, the memory allocation requirements for an animated WxH graphic will be higher than for a static WxH graphic. Some decoders might wish to do all their allocation up front (after decoding Width and Height, available early in the decode process). They might like to know (potential) animated-ness just from parsing a few opening bytes rather than having to go arbitrarily deep into the file.

Metadata chunks already explicitly list the chunk length in bytes. We'd have to ensure that any new (FFV 2+) opcodes also do so (so that you can skip over them). Thanks for the feedback.

nigeltao commented 3 years ago

FFVs 1 and 2 can equivalently be considered Small and Large profiles of the overall IconVG file format.

I forgot to mention... another distinction will be that FFV 1 only requires sequential access (not random access) to the IconVG file. Again, that might be useful to know up front rather than having to go arbitrarily deep into the file.

Hixie commented 3 years ago

Based on my recent experience implementing the spec in Dart, I have the following opinions:

Remove the A and a arc-related drawing opcodes.

No objection.

Change the first byte of the magic header from 0x89 to 0x8A, so that we can distinguish IconVG from PNG (from JPEG from WebP etc) just from the first byte of the file. https://en.wikipedia.org/wiki/List_of_file_signatures doesn't show any previous claims on 0x8A.

No objection, but anyone who is only looking at the first byte is doing themselves and their users a disservice. See also: https://mimesniff.spec.whatwg.org/

Add an explicit FFV number in the wire format. Specifically, change the fourth byte of the magic header from 0x47 (ASCII 'G') to 0x31 (ASCII '1') for FFV 1, 0x32 (ASCII '2') for FFV 2, etc.

I would recommend against this, as discussed above.

MID numbers must use the shortest possible encoding.

I would recommend against this, as it makes implementations more complicated and does not seem to solve any immediate issues. If the concern is being able to read the metadata section without a full decoder, parsing the metadata section is already pretty trivial. I don't think it's worth making that use case simpler at the cost of making a full decoder more complicated (since now it would need yet another way to decode numbers, this one just for metadata blocks).

Re-number the ViewBox and Suggested Palette MIDs (Metadata IDs) from 0 and 1 to 8 and 16 (which are represented on the wire as 0x10 and 0x20). Since metadata is presented in increasing MID order, the gaps allow future extensions to insert (optional) metadata chunks before these existing ones.

No objection. I'm not really sure why the order is required here though. The only benefit I see is that it makes catching duplicates easier, but in practice I found it useful to have out-of-band flags for both of the existing metadata blocks anyway (viewBox because in languages with write-once-only fields you only want to set the viewBox fields once, so the default is set after reading metadata, not before; palette because I wanted to avoid copying into CREG if I didn't see a custom palette).

Prohibit encoded real numbers being NaNs. In the end state, the spec should no longer mention "undefined behavior".

No objection. Would you also prohibit +/- infinity?

Tighten restrictions on gradients: there must be at least two stops and the offsets must span from 0 to 1 inclusive.

No objection.

If I could be allowed to make some suggestions of my own:

Write the spec using RFC2119 language (see also https://ln.hixie.ch/?start=1140242962&count=1 and https://ln.hixie.ch/?start=1170104775&count=1). I'd be happy to lend an editor's hand here if you would like.
Clearer indications of how to handle error conditions (for each one specifying whether it should cause the agent to fail to render anything, or to ignore some bogus data).
Forward-compatible structure. Metadata blocks are forward-compatible (modulo https://github.com/google/iconvg/issues/11), but reserved opcodes don't have a defined length so adding them with arguments would be backwards-breaking. This may be desired in some cases, but not in others. (It's fine if invalid opcodes are considered a fatal error, but I can imagine a world where we would want to add an opcode that was just silently ignored by older user agents.)
It would be good if the spec explicitly called out what it was optimized for (I'm guessing file size based on the wording of documents presenting the format).
Remove all dependency on SVG for specification text (make the specification self-contained), as discussed in https://github.com/google/iconvg/issues/18.
The various potentially semantically meaningful clarifications I filed issues for, especially https://github.com/google/iconvg/issues/11, https://github.com/google/iconvg/issues/15, https://github.com/google/iconvg/issues/20, https://github.com/google/iconvg/issues/21, https://github.com/google/iconvg/issues/22.
The various clarifications I filed issues for, especially https://github.com/google/iconvg/issues/9, https://github.com/google/iconvg/issues/10, https://github.com/google/iconvg/issues/13, https://github.com/google/iconvg/issues/14, https://github.com/google/iconvg/issues/16, https://github.com/google/iconvg/issues/17, https://github.com/google/iconvg/issues/19.

nigeltao commented 3 years ago

I'm not really sure why the order is required here though.

Having metadata chunks appear in strictly-increasing MID order means that I can guarantee that e.g. the ViewBox (MID=8) chunk is in the first N bytes for some value of N, if it's present at all. That's assuming that every earlier chunk (lower MID) has an upper bound on how long it can be.

It's not a must-have feature, but I think it's not onerous and it might be nice to be able to say "if you can give me the first 128 bytes of the IconVG file then I can definitely tell you its (explicit or implicit) viewbox".

Would you also prohibit +/- infinity?

LOD1 can meaningfully be set to +infinity, although I suppose 1e9 would be equivalent in practice.

Write the spec using RFC2119 language... I'd be happy to lend an editor's hand here if you would like.

I'd be happy to have your editor's hand... but I think that'd go best if you went to work after the spec gets 'upgraded' to at least FFV 1.

nigeltao commented 3 years ago

Remove the A and a arc-related drawing opcodes.

For the record, the https://github.com/google/iconvg/issues/18 thread also discusses dropping the smooth ops S/s/T/t and/or the relative ops l/t/q/s/c/a/m/h/v.

Hixie commented 3 years ago

That's assuming that every earlier chunk (lower MID) has an upper bound on how long it can be.

That's only true currently because it's MID 0, right? I don't see anything in the format that would prevent unknown metadata blocks from being arbitrarily large.

nigeltao commented 3 years ago

Preliminary thoughts on FFV 2. Very preliminary.

Collections

Let a single .ivg file contain multiple graphics. Users can open individual ones by name, e.g. "device/battery50" from material_icons.ivg.

MID 0 (in FFV 1 MID numbering) holds a map (wire format TBD) from string to FileSegment. FileSegment is a uint64le, packing a 40-bit file offset and a 24-bit segment length. In this context that FileSegment holds a (non-Collection) 'headless' IconVG graphic. Headless means that it skips the 4-byte magic header.
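
To make the FileSegment packing concrete, here is a rough sketch. The text above only says that a uint64le packs a 40-bit offset and a 24-bit length; putting the offset in the low 40 bits and the length in the high 24 bits is an assumption made for illustration, as are the function names.

```python
import struct
from typing import Tuple

OFFSET_BITS = 40  # file offsets below 2**40 bytes (1 TiB)
LENGTH_BITS = 24  # segment lengths below 2**24 bytes (16 MiB)


def pack_file_segment(offset: int, length: int) -> bytes:
    """Pack (offset, length) into 8 little-endian bytes."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 40 bits")
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError("length does not fit in 24 bits")
    # Assumed layout: offset in the low 40 bits, length in the high 24 bits.
    return struct.pack("<Q", (length << OFFSET_BITS) | offset)


def unpack_file_segment(raw: bytes) -> Tuple[int, int]:
    """Inverse of pack_file_segment: returns (offset, length)."""
    (value,) = struct.unpack("<Q", raw)
    return value & ((1 << OFFSET_BITS) - 1), value >> OFFSET_BITS
```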

Palette Names, Parameter Names

A new (optional) metadata chunk that gives names to CREG indices. For example, 0:"skin", 1:"hair", etc. Might include the reverse map too: {"hair": 1, "skin": 0}.

Also have another metadata chunk (call it "Parameter Names") that does this for NREG instead of CREG. For example, NREG[32] could conventionally be called t, an animation time parameter.

Also allow Suggested and Custom Parameters, which do to NREG what the Suggested and Custom Palette do to CREG.

The (human-readable) names are for use 'externally', by implementations or libraries that consume IconVG. 'Within' the IconVG itself, things are identified by an integer ID or by a FileSegment.

Hit Testing

Answers "what part of the graphic did I just click on"?

Add a new 6-bit HITTEST register and a new styling opcode to copy NSEL to HITTEST. The current value of HITTEST is passed to callbacks when exiting drawing mode (i.e. filling a path), augmenting the paint attributes (RGBA flat color, gradient, etc) that are already passed at the same time.

Compound Graphics

Let multiple graphics (within a single file) share common elements. Allow a collection to hold "foo" and "foo-with-bar-badge" graphics. Allow a collection to hold "qux_en", "qux_de", "qux_zh_Hant_HK" graphics that re-use a base "qux" graphic. Note that text in general is out of scope due to its enormous complexity. Authors/tools are expected to 'flatten' the "en", "de" etc glyphs as simple paths.

New opcode (or opcodes?) in styling mode to do a 'function call': play another headless IconVG graphic, again identified by a u40;u24 FileSegment. The 'function call' also specifies a scalar GA (global alpha) and a 6-elem GTM (global transformation matrix) to apply to the callee, taken from NREG[NSEL-7 .. NSEL-0]. The [i..j] syntax here means an inclusive-low exclusive-high range. These 7*float32 globals are 'popped' when the 'call' returns, like a SkCanvas save/restore pair. TBD: something something clip too?

The callee has their own register state (CREG, NREG, etc) which is copied from the caller (possibly 'rotated' by the caller's CSEL/NSEL so that the caller's NREG[NSEL+i] becomes the callee's NREG[i]) on 'function call entry' but not copied back on 'function call exit'. The max recursion depth is TBD, but finite, explicit, and probably small (around 2-4, maybe even 1 and authoring tools are expected to inline deeper calls??).

Transformation Matrix Support

Styling opcodes to manipulate NREG[NSEL-6 .. NSEL-0] as an affine transformation matrix (called 'self'):

set to identity
(post?) multiply 'self' by NREG[CSEL-6 .. CSEL-0] (note: CSEL, not NSEL). In C++, I'd write "post-multiply 'self'" as *this = *this * arg for some matrix-typed *this.
(post?) multiply 'self' by a rotation angle given in NREG[NSEL]
etc
probably also some TBD ops to bulk-copy from one part of NREG to another part of NREG, maybe just a single op that does NREG[NSEL++] = NREG[CSEL++].
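
For reference, "post-multiply 'self'" on a 6-element affine matrix could look like the sketch below. The element order (a, b, c, d, e, f), standing for the rows [a b c] and [d e f] with an implicit third row [0 0 1], is an assumption; the proposal above does not fix a layout.

```python
IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


def post_multiply(m, n):
    """Return m * n for two affine matrices in the assumed 6-element layout."""
    a, b, c, d, e, f = m
    n0, n1, n2, n3, n4, n5 = n
    return (
        a * n0 + b * n3, a * n1 + b * n4, a * n2 + b * n5 + c,
        d * n0 + e * n3, d * n1 + e * n4, d * n2 + e * n5 + f,
    )
```

Order matters: translate-then-scale and scale-then-translate give different matrices, which is why the "(post?)" question above needs a definite answer.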

Reserved Opcodes

These need to encode "skip the next N bytes if the (older) implementation doesn't support this opcode" somehow. This might be on a per-opcode basis, or perhaps an overall "SkipLT(N, V)" opcode to skip the next N bytes if the library doesn't support File Format Version V.

For drawing opcodes, we might also need to say whether unsupported opcodes should be replaced by line_to(x, y) or move_to(x, y) or a no-op. Or maybe a "SkipGE(N, V)" opcode, like "SkipLT(N, V)" but >= V instead of < V.

Or maybe a single "IfElseV(M, N, V)" opcode that:

skips the next M bytes if >= V,
plays the next M bytes then skips the N after that, if < V.
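
One reading of "IfElseV(M, N, V)" as code, offered only as a sketch: the M bytes act as the fallback for older decoders and the N bytes that follow are the newer content. The function name, the decoder_version parameter and the return shape are all hypothetical.

```python
def if_else_v(pos, m, n, v, decoder_version):
    """Return ((start, end) of the bytes to play, position after both blocks).

    pos is the offset of the first byte after the IfElseV opcode and its
    arguments; the M-byte block starts there and the N-byte block follows it.
    """
    if decoder_version >= v:
        played = (pos + m, pos + m + n)  # skip the M bytes, play the N bytes
    else:
        played = (pos, pos + m)          # play the M bytes, skip the N bytes
    return played, pos + m + n
```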

Control Flow Opcodes

Add "JumpXX(N)" opcodes to skip the next N bytes if NREG[NSEL] XX NREG[CSEL], where XX are comparison operators: equal, not-equal, less-than, less-equal, etc.

Add an explicit "Return" opcode?? EOF (End-of-File) or End-of-FileSegment is still end of graphic. Might not be necessary if equivalent to an (unconditional?) jump to the end.

Arithmetic Opcodes

NREG[NSEL] += NREG[CSEL]
NREG[NSEL] *= NREG[CSEL]
NREG[NSEL] = 1.0 / NREG[CSEL]
etc

Crazy (??) idea: just embed an eBPF interpreter (constrained similar to what the Linux kernel does, e.g. runtime verification of no backwards branches) and let authors/tools write their own ease-in ease-out curves or generally go wild. One complication is that IconVG speaks float32 and eBPF speaks uint64. Perhaps have support (built-in 'syscalls') to convert between float32 and 48.16 fixed point??

Tweening

Like the 'function call' opcode, but with twice the number of args (TBD: is "twice" necessary if matrix lerping can be done by Transformation Matrix Support and Arithmetic Opcodes??). The two separate graphics are tweened according to a zero-to-one blend argument (in NREG[NSEL-15]??):

each path node is lerped. It is an error if the two children have a different number of path nodes. TBD: complications if node count depends on Level-of-Detail?
the two paints are lerped (what that means exactly for gradients TBD, possibly an error)
HITTEST is bitwise or'ed (??)
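
The path-node part of the tweening rules above might be sketched as follows. Representing a path as a list of (x, y) tuples is an assumption made for illustration; paints and HITTEST are left out because their semantics are still TBD.

```python
def lerp(a, b, t):
    """Linear interpolation: a at t=0, b at t=1."""
    return a + (b - a) * t


def tween_paths(nodes_a, nodes_b, blend):
    """Lerp two paths node by node; blend runs from zero to one."""
    if len(nodes_a) != len(nodes_b):
        # Per the proposal, a node-count mismatch is an error.
        raise ValueError("children have a different number of path nodes")
    return [
        (lerp(ax, bx, blend), lerp(ay, by, blend))
        for (ax, ay), (bx, by) in zip(nodes_a, nodes_b)
    ]
```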

Animation

Animation comes from combining almost all of the above. User program passes t and other parameters (e.g. if various UI buttons are clicked), various FileSegment sub-graphics are programmatically transformed, composed, tweened or skipped.

We might also need a new metadata chunk for animation length and loopiness.

The following is hand-wavy, but the intention is for 'leaf nodes' (which don't make 'function calls', they're just a filled path) to be 'compilable' / uploadable to GPU-friendly formats and uniquely identified by their uint64 FileSegment. That compilation happens once, not once per animation frame. Rendering the scene at time t involves re-computing the alpha and transform for each leaf. This happens on the CPU, especially if eBPF is involved, but e.g. the pre-transformed geometry that was previously uploaded to the GPU stays unchanged.

Consider restricting nodes to hold either 'function call' ops or 'drawing mode' ops but not both: nodes are either a (pure) branch or a leaf.

TBD / Punted to FFV 3??

Blend modes (e.g. color dodge)
Clips / Masks (possibly punted to authoring tools, like strokes)
Effects / Filters (e.g. blurs, drop shadows, grain/noise); which of these are GPU friendly?

Still Out Of Scope

Raster Textures (e.g. JPEGs)
Fonts / Text / I18n
Strokes

nigeltao commented 3 years ago

That's only true currently because it's MID 0, right? I don't see anything in the format that would prevent unknown metadata blocks from being arbitrarily large.

If it's MID 8, we could constrain every earlier MID to be e.g. at most 16 bytes long, which should be enough for a redirect-pointer if necessary.

Hixie commented 3 years ago

Preliminary thoughts on FFV 2. Very preliminary.

It's hard for me to provide feedback on these because I don't know what the problem domain is. I'm guessing from the list of features that it's substantially different from FFV0's problem domain, which seemed to be "format to allow the material icons to be rendered faithfully at any size from tiny files" (which explained the custom palette, the set of drawing features, gradients as a primitive, and the focus on small file sizes).

nigeltao commented 3 years ago

It's hard for me to provide feedback on these because I don't know what the problem domain is.

It's my attempt at solving https://github.com/flutter/flutter/issues/1831 and if I understand correctly, FFV 0 / FFV 1 isn't feature-rich enough (e.g. animation).

Hixie commented 3 years ago

I should make my work-in-progress doc for that effort public, but I think you may have seen it. It lists some of the criteria for what such a format would need to address. One of the highlights which seems relevant here is that the top priority is render speed, with file size being somewhat low on the list; ideally one should be able to get relatively close to just copying significant chunks of the raw data into a shader to draw most of the image. I don't know if the opcode-based approach of IconVG can achieve that.

Hixie commented 3 years ago

(By which I mean I literally don't know. There's an effort underway to provide arbitrary SPIR-V shader support for Flutter, and once that is landed I hope to experiment with it and see what kind of vector graphics renderer one can build directly into a shader.)

nigeltao commented 3 years ago

Add "JumpXX(N)" opcodes to skip the next N bytes if NREG[NSEL] XX NREG[CSEL], where XX are comparison operators: equal, not-equal, less-than, less-equal, etc.

Some more thinking out loud: if we had a "JumpLOD(H0, H1, N)" opcode, that skipped the next N bytes if the height-in-pixels H was outside the H0..H1 range, then we wouldn't need the LOD registers. Skipping N bytes in one motion would also be simpler and faster than decoding one opcode at a time until we're back in LOD range.

Or maybe we add the JumpXX opcodes and also another one to set NREG[NSEL] = height_in_pixels...
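
A tiny sketch of the "JumpLOD(H0, H1, N)" idea. Treating the range as half-open (H0 <= H < H1) is an assumption, since the comment above only says "outside the H0..H1 range", and the function name is hypothetical.

```python
def jump_lod(pos, h0, h1, n, height_in_pixels):
    """Return the next decode position after a JumpLOD at position pos.

    pos is the offset of the first byte after the opcode and its arguments.
    """
    in_range = h0 <= height_in_pixels < h1  # assumed half-open range
    return pos if in_range else pos + n     # out of range: skip N bytes at once
```

The skip is a single addition, rather than decoding opcodes one at a time until the LOD range is satisfied again.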

BigBadaboom commented 3 years ago

Typo spotted?

8 and 16 (which are represented on the wire as 0x10 and 0x20)

Should be 0x08 and 0x10?

nigeltao commented 3 years ago

Should be 0x08 and 0x10?

No, it's 0x10 and 0x20. MIDs are encoded as Natural Numbers and the IconVG spec says "For a 1 byte encoding, the remaining 7 bits form an integer value in the range [0, 1<<7). For example, 0x28 encodes the value 0x14 or, in decimal, 20".
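
The arithmetic can be checked with a short sketch of the 1-byte natural-number encoding. It assumes that the bit left over after "the remaining 7 bits" is the low bit and that it is zero for a 1-byte encoding, which is consistent with 0x28 decoding to 20; the function names are hypothetical.

```python
def encode_natural_1byte(value: int) -> int:
    """Encode a value in [0, 1<<7) as a single byte."""
    if not 0 <= value < (1 << 7):
        raise ValueError("value needs a longer encoding")
    return value << 1  # low bit stays 0, value goes in the upper 7 bits


def decode_natural_1byte(byte: int) -> int:
    """Decode a single-byte natural number."""
    if byte & 1:
        raise ValueError("not a 1-byte encoding")
    return byte >> 1
```

So MIDs 8 and 16 appear on the wire as 0x10 and 0x20, and 0x28 decodes to 20.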

nigeltao commented 3 years ago

Another update summarizing my current thinking, in case anyone's interested.

Goals

I still like the "mission statement" at the top of the main README file. "A compact, binary format for simple vector graphics: icons, logos, glyphs and emoji." Longer term, maybe animation or security would also gain an explicit mention.

Compactness is a goal, but it's not the only goal. The aim isn't compactness at any cost. "Just use gzipped SVG" might be competitive in terms of compactness, but a very different story from a security and implementation complexity perspective.

Simplicity is also a goal, but again, it's not the only goal. There's usually also a trade-off between simplicity and feature richness.

Changes

Dropping features from FFV 0

rename the file extension (#30).
change the magic header's opening byte.
renumber Metadata Identifiers (MIDs).
drop drawing verbs (relative ArcTo, smooth CubeTo, etc) so that we only have absolute MoveTo, LineTo, QuadTo and CubeTo, (#29, and previously, #18). Equivalently, we drop the smooth ops S/s/T/t and the relative ops l/t/q/s/c/a/m/h/v.
doing so means there's not really a need for "real" and "zero-to-one" number formats. We go from four to two: just "natural" and "coordinate".
we could likewise also drop the 2 byte and 3 byte (direct) color formats, per #31.
culling many opcodes means that we no longer need two modes: styling and drawing. We can now fit all the opcodes into a single mode.

Future-proofing

gain an opcode for "stop (even though there's more bytecode)".
gain an opcode for "jump past the next N bytes".
gain an opcode for "jump past the next N bytes if feature F is disabled". F (an opcode argument) is a natural number, TBD whether it's a bitmask or 'just' a number. There will be a bunch of reserved opcodes that are currently invalid. Future versions might make them valid, but such files are expected to bracket their use with a jump-feature-disabled.
as @Hixie has argued for, we then don't need explicit version numbers (FFV 1, FFV 2, etc) in the wire format. It'll all just be forward-compatible IconVG. If an older renderer doesn't understand newer opcodes then the file can jump-feature-disabled to a fallback graphic.
LOD is no longer represented by registers. Instead, LOD0 and LOD1 are opcode arguments to "jump past the next N bytes if the height-in-pixels H is outside the range LOD0, LOD1".

New features

Add drawing ops for rectangles and circles (well, parallelograms and ellipses in general, but most of the use will probably be rectangles and circles). Given the current pen coordinates, a rectangle (four line segments) and a cubic-approximation-to-a-circle (four cubic Bézier curves) can both be specified by only four more coordinates: two vertex pairs. The space savings can be significant for a circle, which would otherwise require 4×6 = 24 explicit coordinates. For example, the Material Design blur icon has lots of circles.
I'm still pretty keen on Collections and Compound Graphics from the comment above.
I'd still like Animation (#2) in some form, which I still think comes naturally from adding Parameters and some sort of general compute to Compound Graphics. But the exact form is TBD. Things like Control Flow Opcodes and Arithmetic Opcodes were a means to the Animation end, but I'm open to alternatives.
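
To see where the 4×6 = 24 figure for circles comes from, here is a sketch that expands a circle into four cubic Béziers using the usual constant 4/3·(√2 − 1) ≈ 0.5523. The function name and the center-and-radius parameterization are illustrative only; the proposed opcode would take two vertex pairs instead.

```python
import math

KAPPA = 4.0 / 3.0 * (math.sqrt(2.0) - 1.0)  # control-point distance for r = 1


def circle_as_cubics(cx, cy, r):
    """Four cubic segments, each (control1, control2, end).

    The pen is assumed to start at (cx + r, cy).
    """
    k = KAPPA * r
    return [
        ((cx + r, cy + k), (cx + k, cy + r), (cx, cy + r)),
        ((cx - k, cy + r), (cx - r, cy + k), (cx - r, cy)),
        ((cx - r, cy - k), (cx - k, cy - r), (cx, cy - r)),
        ((cx + k, cy - r), (cx + r, cy - k), (cx + r, cy)),
    ]
```

Each segment carries three points, so four segments need 4 × 3 × 2 = 24 explicit coordinates, against the four extra coordinates of the proposed op.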

lifthrasiir commented 3 years ago

gain an opcode for "jump past the next N bytes".

Unless carefully specified, this would mean that jumping into the middle of another operation is possible. I don't think this is desirable for a number of reasons, including the security implications. I expect that the parsing cost is not very high, so I think this should be "decode but ignore next N instructions" instead.

nigeltao commented 3 years ago

"decode but ignore next N instructions"

Well, this requires deciding how long (in bytes) each reserved opcode will be. Specifying that today could be awkward if we want to eventually have some sort of scripting or general computing (to support animation), but we haven't concluded yet how that'll be implemented or represented on the wire.

nigeltao commented 3 years ago

Some more thinking out loud...

The way that gradients are encoded in the unused parts of alpha-premultiplied RGBA space is clever. But somebody (I forget who) once told me that a difference between programming and software engineering is whether "clever" is a compliment or a pejorative.

I can't find the link, but I do remember @Hixie saying at some point that this cleverness makes it hard, in the future, if we want to add different sorts of paints. For example, blend modes (color dodge), effects (blurs) or something something hit-testing.

@lifthrasiir also made the point in #31 that a lot of a gradient's description could be "opcode arguments" instead of being cleverly squeezed into the CREGs.

Perhaps we should split the paint ops (what's currently 0xE1 "exit drawing mode", but would probably be renamed as "fill" if we no longer have two modes) into two classes:

"basic paint" with a flat color: no explicit args, but use CREG[CSEL]
"special paint" (special = gradient for now, maybe others later): various explicit opcode args (number of stops, linear/radial, spread, some hand-wavy future-expansion capability) with stop color/offsets taken from CREG[CSEL-NSTOPS .. CSEL] and NREG[NSEL-NSTOPS .. NSEL].

"Explicit opcode arguments" means that a "special paint" opcode is followed by a number of extra bytes, the way that an L op is followed by extra bytes for coordinate pairs.

Afterwards, CREG only holds flat colors or gradient stop colors. NREG only holds gradient stop offsets. We could make it invalid to set a CREG to something that's not valid alpha-premul.

If we also encourage a 'stack' model per #31, so that assigning CSEL and NSEL specific values become less important than incrementing / decrementing them, then the 128 1-byte "Set CSEL/NSEL" opcodes could collapse to 2 2-byte opcodes, opening up a lot more opcode space...

Overall, changing how gradients are represented would make it a little harder to upgrade FFV 0 to FFV 1 automatically, if the graphic uses gradients, but it's probably still doable.

lifthrasiir commented 3 years ago

Well, this requires deciding how long (in bytes) each reserved opcode will be.

That's true, but it is not much different from putting the length information for any subsequent opcode (unless multiple such opcodes in a run are frequent). I think the "special paint" opcode you've mentioned is a good candidate to include the explicit length for example.

As noted by Hixie in #11, we need to explicitly decide whether two unrelated pieces of data can overlap. I prefer overlap to be impossible, mainly because it would be easier to control the interpretation than otherwise. If overlap is possible we risk diverging interpretations. Consider the following:

  a view when X is unsupported         a view when X is supported
+--------------------------------+   +--------------------------------+
| jump to P if X is unsupported  |   | jump to P if X is unsupported  |
+--------------------------------+   +--------------------------------+
: (not parsed)                   :   | opcode X                       |
:                                :   +--------------------------------+
:                                :   | arguments to X                 |
:                                :   |                                |
+--------------------------------< P >                                |
| opcode Y                       |   |                                |
+--------------------------------+   |                                |
| arguments to Y                 |   +--------------------------------+
|                                |   | opcode Z                       |
|                                |   +--------------------------------+

The opcode Y overlaps with the arguments to X in this example, and this desynchronization can result in wildly different interpretations or (more usually) an invalid image only when X is supported. Ideally we want this situation to be impossible altogether. One alternative is the following:

+--------------------------------+
| opcode X (FFV 2)               |
| +----------------------------+ |
| | length of arguments to X   |----+
| +----------------------------+ |  |
| | arguments to X             | |  | byte length
| |                            | |  |
| |                            |<---+
| |                            | |
| +----------------------------+ |
+--------------------------------+
| ignore K opcodes if X is       |--+
| unsupported                    |  |
+--------------------------------+\ | # opcodes
| opcode Y and arguments (FFV 1) | \|
+--------------------------------+  +
| opcode Z and arguments (FFV 1) | /
+--------------------------------+/

All implementations since FFV 1 can determine the entire structure, but only those supporting X can execute the opcode X. No byte can be interpreted in multiple ways. This is not the only way to do that, but it seems that encoding the length of arguments right into all future opcodes is necessary.
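
A minimal sketch of why a length prefix keeps parsing unambiguous. The byte layout here is hypothetical (a one-byte opcode, a one-byte argument length, then the arguments), and the opcode numbers and the names `KNOWN_OPCODES` and `walk` are invented for illustration; none of this is IconVG's actual encoding.

```python
# Hypothetical layout: [opcode: 1 byte][arg_len: 1 byte][args: arg_len bytes].
KNOWN_OPCODES = {0x01, 0x02}  # made-up opcodes that this decoder "supports"


def walk(data: bytes):
    """Return (offset, opcode, args, supported) for every instruction."""
    out = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated instruction header")
        opcode, arg_len = data[i], data[i + 1]
        end = i + 2 + arg_len
        if end > len(data):
            raise ValueError("arguments run past the end of the data")
        # The structure is recovered whether or not the opcode is understood.
        out.append((i, opcode, data[i + 2:end], opcode in KNOWN_OPCODES))
        i = end
    return out


# 0x7F is unknown to this decoder, yet the instruction after it is still found.
stream = bytes([0x7F, 3, 0xAA, 0xBB, 0xCC, 0x01, 1, 0x10])
for entry in walk(stream):
    print(entry)
```

Every byte belongs to exactly one instruction, so a decoder that cannot execute an opcode can still step over it.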

Hixie commented 3 years ago

Another way to do this would be to split the opcode space by number of arguments. For example, opcodes 0x00 .. 0x1F have zero arguments, opcodes 0x20 .. 0x7F have 6 arguments, opcodes 0x80 .. 0xDF have 12 arguments, opcodes 0xE0 .. 0xFF have 16 arguments. Or whatever. Or equivalently, opcodes could be two bytes long, with one byte always coincidentally giving the length of arguments. The point is that you decouple the parsing from the interpreting, so that parsing is future-proof.
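
As a rough illustration of that split, here is a lookup using exactly the example ranges above (which are themselves only a "for example"). The names `ARG_COUNT_RANGES` and `arg_count` are made up for this sketch.

```python
# (first opcode, last opcode, number of arguments), taken from the example.
ARG_COUNT_RANGES = [
    (0x00, 0x1F, 0),
    (0x20, 0x7F, 6),
    (0x80, 0xDF, 12),
    (0xE0, 0xFF, 16),
]


def arg_count(opcode: int) -> int:
    """Argument count implied purely by which range the opcode falls in."""
    for first, last, count in ARG_COUNT_RANGES:
        if first <= opcode <= last:
            return count
    raise ValueError(f"opcode out of range: {opcode!r}")
```

A parser needs only this table to find instruction boundaries; what each opcode means can be decided separately.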

nigeltao commented 3 years ago

Some more thoughts. They're not final, I just want to write down some ideas-in-progress before I forget.

Ring-Stack Registers

Merge the CREG and NREG concepts so that there's only one kind of register. Call it REG. Likewise, there's now only one selector, SEL.
There are 64 REG registers and they're 64 bits (8 bytes) each, little-endian. The high 32 bits are ABGR (high 8 bits are Alpha), either a flat color or a gradient stop color. The low 32 bits are ignored (for flat colors) or a gradient stop offset (an unsigned 16.16 fixed point number). Future expansions might apply other semantics to the low 32 or whole 64 bits. Scripts (see below) will also be able to access these registers as u32le×2, u64le×1, f32le×2, etc.
REG access uses a ring-stack model. Ring means that all REG indexes are still modulo 64. Stack means that there's no SEL = N opcodes, only SEL += N. Any one byte color that previously involved CREG[I] now involves CREG[SEL+I], which is the high 32 bits of REG[SEL+I].

The first 128 opcodes (4+2+1 bits) set REG values

The low 4 bits form an adjustment number ADJ. The opcode writes to REG[SEL-ADJ]. If ADJ is 0 then the write is followed by SEL++ (the write is a stack push).
The middle 2 bits mean that the opcode is followed by a 0-byte, 1-byte, 3-byte or 4-byte color. A 0-byte color is always transparent black. A 3-byte color is, in the old terminology, indirect (a blend).
The high 1 bit means that the opcode is then followed by a 0-byte or 4-byte number (a u32le), copied to the low 32 bits of REG[SEL-ADJ]. A 0-byte number is always zero.
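
A sketch of splitting one of these opcode bytes into its three fields. It assumes the 2-bit field indexes the color lengths in the order listed above (0, 1, 3, 4), which the text does not state outright; `decode_set_reg` and `COLOR_LENGTHS` are illustrative names.

```python
COLOR_LENGTHS = (0, 1, 3, 4)  # assumption: indexed in the order listed above


def decode_set_reg(opcode: int):
    """Return (adj, color_byte_length, number_byte_length, is_push)."""
    if not 0x00 <= opcode <= 0x7F:
        raise ValueError("not one of the first 128 opcodes")
    adj = opcode & 0x0F                            # low 4 bits
    color_len = COLOR_LENGTHS[(opcode >> 4) & 3]   # middle 2 bits
    number_len = 4 if (opcode >> 6) & 1 else 0     # high 1 bit
    return adj, color_len, number_len, adj == 0    # ADJ == 0 is a stack push
```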

The next 54 (48 + 6) opcodes specify path geometry

The low 4 bits form a number RL0. If RL0 is zero then a natural number RL1 follows (in 1, 2 or 4 bytes) and the run length RL is set to (RL1 + 16). If RL0 is non-zero then RL is set to RL0. After the opcode (and after RL1 if present) are one or more absolute (not relative) coordinate pairs (a pair is (x, y)). Each coordinate is encoded in 1, 2 or 4 bytes the same as FFV 0.

0x80 ..= 0x8F encodes LineTo. There are (1 * RL) coordinate pairs.
0x90 ..= 0x9F encodes QuadTo. There are (2 * RL) coordinate pairs.
0xA0 ..= 0xAF encodes CubeTo. There are (3 * RL) coordinate pairs.
0xB0 ..= 0xB3 encode a quarter-, half-, three-quarter- or full-ellipse. There are 2 coordinate pairs.
0xB4 encodes a parallelogram. There are 2 coordinate pairs.
0xB5 encodes a MoveTo (closing any in-progress path first). There is 1 coordinate pair.

Processing the ellipse or parallelogram opcodes requires knowing the 'current point' to start from, also known as the 'pen location'. This is just the last coordinate pair of a LineTo, QuadTo, CubeTo or MoveTo op. For example, after 5 consecutive CubeTo operations, the current point is set to the last of the 15 coordinate pairs (5 * 3 = 15).
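
To make the run-length arithmetic concrete, here is a sketch that computes how many coordinate pairs follow a path geometry opcode. It takes RL1 as an already-decoded natural number (the 1, 2 or 4 byte decoding is left out), and `coordinate_pair_count` is a made-up name.

```python
def coordinate_pair_count(opcode: int, rl1=None) -> int:
    """Number of (x, y) pairs that follow the opcode (and RL1, if present)."""
    if 0x80 <= opcode <= 0xAF:
        rl0 = opcode & 0x0F
        if rl0 == 0:
            if rl1 is None:
                raise ValueError("RL0 is zero, so RL1 must be supplied")
            rl = rl1 + 16
        else:
            rl = rl0
        # LineTo, QuadTo and CubeTo take 1, 2 and 3 pairs per segment.
        pairs_per_segment = ((opcode - 0x80) >> 4) + 1
        return pairs_per_segment * rl
    if 0xB0 <= opcode <= 0xB4:
        return 2  # ellipse variants and parallelogram
    if opcode == 0xB5:
        return 1  # MoveTo
    raise ValueError("not a path geometry opcode")
```

For instance, `coordinate_pair_count(0xA5)` is a CubeTo with RL = 5 and yields the 15 pairs from the example above.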

The next 10 opcodes are miscellaneous / reserved

0xB6 is followed by a 1-byte DELTA value and means SEL += DELTA.
0xB7 is a no-op.
0xB8 ..= 0xBF are reserved but the fallback effect is a single LineTo. The opcode byte is followed by a natural number N and then N 'opcode argument' bytes. Those bytes are then followed by a coordinate pair that is the LineTo argument (in the fallback case). If a future expansion employs these opcodes, the semantics are also expected to move the current point to this final coordinate pair.

The next 32 opcodes specify path fills

Fills close any in-progress path.

The low 4 bits form an adjustment number ADJ. If ADJ is 0 then the opcode execution is preceded by --SEL (the read is a stack pop).
0xC0 ..= 0xCF fills with a basic paint: a flat color from REG[SEL-ADJ].
0xD0 ..= 0xDF fills with an advanced paint and is followed by a 4-byte u32le 'opcode argument', called flag.
The low 2 flag bits being 0, 1, 2 or 3 (called the 'complexity' for now, the name could surely be improved) means that it's followed by 0, 3, 6 or 12 floating point numbers (encoded as if they were coordinates).
The next 6 flag bits give NSTOPS. The fill involves REG[SEL-ADJ-NSTOPS .. SEL-ADJ].
The next 1 flag bit being 0 means a gradient fill. Being 1 is reserved. Complexities 1 and 2 correspond to linear and radial gradients and the 3 or 6 coordinates are a transform matrix as per FFV 0. Complexities 0 and 3 are reserved.
For gradients, the next 2 flag bits (bits 9 ..= 10) give the gradient spread: none, pad, reflect, repeat.
The remaining 21 bits are reserved.
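
The flag layout can be sketched as plain bit shifts. This assumes the spread values are numbered in the order listed (none, pad, reflect, repeat); `unpack_fill_flag` and its dictionary keys are illustrative only.

```python
SPREADS = ("none", "pad", "reflect", "repeat")  # assumed order
FLOATS_BY_COMPLEXITY = (0, 3, 6, 12)


def unpack_fill_flag(flag: int) -> dict:
    """Split the 4-byte u32le flag of an advanced-paint fill into fields."""
    if not 0 <= flag <= 0xFFFFFFFF:
        raise ValueError("flag must fit in 32 bits")
    complexity = flag & 0x3                      # bits 0 ..= 1
    return {
        "complexity": complexity,
        "num_floats": FLOATS_BY_COMPLEXITY[complexity],
        "nstops": (flag >> 2) & 0x3F,            # bits 2 ..= 7
        "non_gradient": bool((flag >> 8) & 1),   # bit 8; 1 is reserved
        "spread": SPREADS[(flag >> 9) & 0x3],    # bits 9 ..= 10
        "reserved": flag >> 11,                  # bits 11 ..= 31
    }
```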

TBD: complexity 0 might be repurposed for hit-testing: filling rough paths with multiple invisible-but-different colors.

The next 4 opcodes specify control flow

The first three are followed by a natural number J and possibly further arguments.

0xE0 means to jump past the next J instructions, unconditionally. Instructions are atomic (you can't jump to the middle of a multi-byte instruction), which means that all opcodes need to know the byte-size of their 'opcode arguments' even if they are reserved opcodes.
0xE1 means to jump past the next J instructions if the decoder does not support all of a bitmask F of IconVG features, where F is a natural number 'opcode argument' encoded after J. There are no feature bits currently assigned but future expansions may use them.
0xE2 means to jump past the next J instructions if the decoder's height in pixels is outside of the half-open interval LOD0 .. LOD1, where these two natural numbers are 'opcode arguments' encoded after J.
0xE3 is a 'return'. If we're in a macro (see below), return to the caller. If we're not in a macro, this signifies the end of the graphic (even if we're not at the end of the file).

It's invalid to jump past the end of the file or macro segment.

The next 4 opcodes specify sub-routines

They are typically followed by an 8 byte FileSegment (40 bit file offset, 24 bit file length) and possibly further arguments.

0xE4 means a macro expansion (like a C #include line, but given a number-pair FileSegment instead of a string filename).
0xE5 is like 0xE4 but the FileSegment is followed by 6 coordinates that form an affine transform matrix to apply to the geometry and transform matrices within that macro.
0xE6 is like 0xE4 but the FileSegment is interpreted as separate 'scripting bytecode' (more to say about that in a future post) instead of the IconVG bytecode detailed in this comment.
0xE7 has a natural number N opcode argument instead of an 8 byte FileSegment. The next N bytes are a script, like 0xE6 but inline.

The macro opcodes 0xE4 and 0xE5 are invalid when already in a macro expansion. No recursion allowed.

All four opcodes are a single instruction for "jump past the next J instructions" accounting.

The last 24 opcodes are reserved

0xE8 ..= 0xFF are followed by a natural number N and then N bytes of 'opcode arguments'.

nigeltao commented 3 years ago

More thoughts...

FileSegments

FileSegments are tweaked. There's a uint64le flavor (an "Absolute FileSegment"):

High bit indicates a redirect (see below).
Middle-high 31 bits are a SegmentOffset (an absolute offset from the start of the file).
Middle-low 24 bits are SegmentLength.
Low 8 bits encode SegmentType. 0x00 means that the segment holds IconVG bytecode. 0x01 means 'scripts' (e.g. animation) although the scripting language details are still TBD. Other values are reserved.

There's also a uint32le flavor (an "Inline FileSegment"), just the low 32 bits. There are no redirects and the SegmentOffset is implicit: it immediately follows the uint32le.

IconVG files can be larger than 2 GiB. The redirect bit being set on an Absolute FileSegment means that the 31+24=55 middle bits are a file offset for another 16 bytes: uint64le SegmentOffset and uint64le SegmentLength.
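
A sketch of unpacking the uint64le Absolute FileSegment as described. It assumes the low 8 bits still carry the SegmentType when the redirect bit is set, which the text does not say; the function name and dictionary keys are invented.

```python
def decode_absolute_file_segment(raw: bytes) -> dict:
    """Unpack the 8-byte little-endian Absolute FileSegment."""
    if len(raw) != 8:
        raise ValueError("an Absolute FileSegment is exactly 8 bytes")
    v = int.from_bytes(raw, "little")
    segment_type = v & 0xFF                      # low 8 bits
    if v >> 63:                                  # high bit: redirect
        # The 55 middle bits locate 16 more bytes holding the real
        # uint64le SegmentOffset and uint64le SegmentLength.
        return {
            "redirect": True,
            "redirect_offset": (v >> 8) & ((1 << 55) - 1),
            "segment_type": segment_type,
        }
    return {
        "redirect": False,
        "segment_offset": (v >> 32) & 0x7FFFFFFF,   # middle-high 31 bits
        "segment_length": (v >> 8) & 0xFFFFFF,      # middle-low 24 bits
        "segment_type": segment_type,
    }
```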

Opcodes

54 Path Geometry Opcodes

0x00 ..= 0x0F encodes LineTo. There are (1 * RL) coordinate pairs.
0x10 ..= 0x1F encodes QuadTo. There are (2 * RL) coordinate pairs.
0x20 ..= 0x2F encodes CubeTo. There are (3 * RL) coordinate pairs.
0x30 ..= 0x33 encode a quarter-, half-, three-quarter- or full-ellipse. There are 2 coordinate pairs.
0x34 encodes a parallelogram. There are 2 coordinate pairs.
0x35 encodes a MoveTo (closing any in-progress path first). There is 1 coordinate pair.

Processing the ellipse or parallelogram opcodes requires knowing the 'current point' to start from, also known as the 'pen location'. See Three Points (Two Opposing) Define an Ellipse. For example, after 5 consecutive CubeTo operations, the current point is set to the last of the 15 coordinate pairs (5 * 3 = 15).

2 Miscellaneous Opcodes

0x36 is followed by a 1-byte DELTA value and means SEL += DELTA, modulo 64.
0x37 is a no-op.

4 Jump / Return Opcodes

The first three are followed by a natural number J and possibly further arguments.

0x38 means to jump past the next J instructions, unconditionally. Instructions are atomic (you can't jump to the middle of a multi-byte instruction), which means that all opcodes need to know the byte-size of their 'opcode arguments' even if they are reserved opcodes.
0x39 means to jump past the next J instructions if the decoder does not support all of a bitmask F of IconVG features, where F is a natural number 'opcode argument' encoded after J. Scripts (SegmentType 0x01) require feature bit 0x0000_0001. Other feature bits are reserved for future expansions.
0x3A means to jump past the next J instructions if the decoder's height in pixels is outside of the half-open interval LOD0 .. LOD1, where these two natural numbers are 'opcode arguments' encoded after J.
0x3B is a 'return'. If we're in a sub-routine (see below), return to the caller. If we're not in a sub-routine, this signifies the end of the graphic (even if we're not at the end of the file or sub-routine FileSegment).

It's invalid to jump past the end of the file or sub-routine FileSegment.

4 Call Sub-routine Opcodes

0x3C call sans-ATM Inline FileSegment
0x3D call with-ATM Inline FileSegment
0x3E call sans-ATM Absolute FileSegment
0x3F call with-ATM Absolute FileSegment

If the opcode 0x01 bit is set, this is followed by an ATM (alpha and transform matrix). An ATM is a 1-byte alpha value and then a 3x2 affine transform matrix (each number encoded as if it was a coordinate) to apply (multiply) to the paints, geometry and transform matrices within that sub-routine. 'No ATM' is equivalent to an 0xFF alpha and identity transform matrix.

The ATM (or lack of it) is followed by a 4 byte Inline FileSegment (e.g. 'switch to scripting mode') or 8 byte Absolute FileSegment (e.g. 're-use shared paths and fills', 're-use shared scripts'), depending on the opcode 0x02 bit being off or on. An Inline FileSegment is followed by SegmentLength bytes.

These four opcodes are only valid when executing 'at the top level'. They're invalid if encountered when already in a sub-routine call.
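
The two low opcode bits can be read off directly. A small sketch, where `decode_call_opcode` is a made-up name:

```python
def decode_call_opcode(opcode: int):
    """Return (has_atm, file_segment_byte_length) for opcodes 0x3C ..= 0x3F."""
    if not 0x3C <= opcode <= 0x3F:
        raise ValueError("not a call sub-routine opcode")
    has_atm = bool(opcode & 0x01)    # 0x01 bit: an ATM precedes the segment
    absolute = bool(opcode & 0x02)   # 0x02 bit: Absolute rather than Inline
    return has_atm, 8 if absolute else 4
```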

64 Set Register Opcodes

64 ring-stack registers REGS, 64 bits each, and one SEL selector register. It's like the earlier comment in this issue, except the stack now grows downwards. A stack push decrements (not increments) SEL. ADJ adjustments are added (not subtracted).

0x40 ..= 0x4F sets the low 32 bits of a single register (the high bits are zeroed).
0x50 ..= 0x5F sets the high 32 bits of a single register (the low bits are zeroed).
0x60 ..= 0x6F sets all 64 bits of a single register.
0x70 ..= 0x7F sets all 64 bits of multiple registers.

For the first 48 opcodes, the low 4 bits give an ADJ value. These opcodes write to REGS[(SEL+ADJ)&63]. They also post-decrement SEL when ADJ is zero.

For the last 16 opcodes, let LENGTH equal 2 plus the opcode's low 4 bits. They pre-decrement SEL by LENGTH and then consume LENGTH uint64le values, storing them in REGS[SEL+1], REGS[SEL+2], ..., REGS[SEL+LENGTH], in that order.
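
A sketch of the downward-growing ring-stack, covering both the single-register and multi-register writes. It assumes SEL itself is kept modulo 64 and that the multi-register indexes wrap the same way; the initial SEL of 56 is the value given later in this comment. `RingStackRegs` and its method names are invented.

```python
MASK64 = (1 << 64) - 1


class RingStackRegs:
    def __init__(self):
        self.regs = [0] * 64
        self.sel = 56  # initial SEL, as stated later in this comment

    def set_one(self, adj: int, value: int) -> None:
        """Opcodes 0x40 ..= 0x6F: write REGS[(SEL+ADJ)&63].

        `value` is the full 64-bit register value; for the low-only or
        high-only opcodes the caller zeroes the other half first.
        """
        self.regs[(self.sel + adj) & 63] = value & MASK64
        if adj == 0:
            self.sel = (self.sel - 1) & 63  # post-decrement: a stack push

    def set_many(self, values) -> None:
        """Opcodes 0x70 ..= 0x7F: LENGTH is 2 plus the low 4 opcode bits."""
        length = len(values)
        if not 2 <= length <= 17:
            raise ValueError("LENGTH must be in 2 ..= 17")
        self.sel = (self.sel - length) & 63  # pre-decrement by LENGTH
        for k, value in enumerate(values, start=1):
            self.regs[(self.sel + k) & 63] = value & MASK64
```

After `set_many`, the first value sits at REGS[SEL+1], so fill opcodes that pre-increment SEL read the values back in the order they were written.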

"Sets the low/high 32 bits" means that the opcode is followed by a uint32le number to put in the corresponding low/high half of the REGS element (the other half is zeroed). "Sets all bits" means that the opcode is followed by one (opcodes 0x60 ..= 0x6F) or more (opcodes 0x70 ..= 0x7F) uint64le numbers.

Low 32 bits are interpreted as unsigned 16.16 fixed point when used as gradient stops (e.g. 0xC000 represents a gradient stop offset of 0.75). Future expansions may interpret the bits in other ways.
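
The 16.16 conversion is a division by 2^16. A tiny sketch (the function name is illustrative):

```python
def stop_offset(low32: int) -> float:
    """Interpret a register's low 32 bits as unsigned 16.16 fixed point."""
    return (low32 & 0xFFFFFFFF) / 65536.0


assert stop_offset(0xC000) == 0.75   # the example above
assert stop_offset(0x10000) == 1.0
```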

High 32 bits are interpreted as alpha-premultiplied RGBA colors. Alpha less than any of Red, Green or Blue has special meaning, as they would otherwise be invalid alpha-premultiplied colors. That special meaning is either a blend (Alpha is zero) or a 'discriminated transparent black' (Alpha is non-zero).

A blend is what FFV0 calls a 3-byte indirect color. G and B give 1-byte colors SRC0 and SRC1 and R is the BLEND (0x00 means all-SRC0, 0xFF means all-SRC1):

RESULTANT.RED = (((255-BLEND) * SRC0.RED) + (BLEND * SRC1.RED) + 128) / 255
Ditto for GREEN, BLUE and ALPHA
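
The blend formula, written out per channel. This sketch assumes the `/ 255` is integer (floor) division, which the `+ 128` rounding term suggests but the text does not state; the function names are made up and colors are plain 4-tuples.

```python
def blend_channel(blend: int, src0: int, src1: int) -> int:
    """One channel of the blend; all inputs are in 0 ..= 255."""
    return ((255 - blend) * src0 + blend * src1 + 128) // 255


def blend_color(blend: int, src0, src1):
    """Apply blend_channel to each of RED, GREEN, BLUE and ALPHA."""
    return tuple(blend_channel(blend, a, b) for a, b in zip(src0, src1))
```

With BLEND at 0x00 the result is exactly SRC0, and with BLEND at 0xFF it is exactly SRC1.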

1-byte colors are similar to but tweaked from FFV0. 0x00, 0x01 and 0x02 mean RGBA values 00:00:00:00, 80:80:80:80 and C0:C0:C0:C0. 0x03 ..= 0x7F mean base-5 opaque colors. 0x80 ..= 0xBF mean colors from the custom palette. 0xC0 ..= 0xFF (call the value c) take the color from REGS[(SEL+c)&63].

A 'discriminated transparent black' means that the paint is a no-op, in terms of modifying pixel colors, but having multiple 'transparent black' values can be useful for hit-testing: this shape is 'transparent black number 1', this other shape is 'transparent black number 2', etc.
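
The three cases for the high 32 bits can be told apart from the channel values alone. The sketch below takes the four channels as separate arguments because the byte order within the 32 bits is not spelled out in this comment; `classify_color` and its return strings are invented.

```python
def classify_color(r: int, g: int, b: int, a: int) -> str:
    """Classify a register's color half by the rules described above."""
    if a >= max(r, g, b):
        return "premultiplied"  # an ordinary alpha-premultiplied color
    if a == 0:
        return "blend"          # Alpha is zero and some channel exceeds it
    return "discriminated transparent black"
```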

SEL is initially set to 56, allowing easy read access to registers 0..=7 (initialized from the custom palette if given) and easy read/write access to registers 57..=63 (typically 'scratch' space).

64 Fill Opcodes

The opcode's low 4 bits give an ADJ value. These opcodes read from REGS[(SEL+ADJ)&63]. Gradients also read from later REGS, per the number of gradient stops. They also pre-increment SEL when ADJ is zero.

0x80 ..= 0x8F fills with a flat color.
0x90 ..= 0x9F fills with a linear gradient. It is followed by a one byte GRADIENT_ARGS and then a 3x1 matrix, per 'complexity' in the earlier comment.
0xA0 ..= 0xAF fills with a radial gradient. It is followed by a one byte GRADIENT_ARGS and then a 3x2 matrix, per 'complexity' in the earlier comment.
0xB0 ..= 0xBF is reserved, but the fallback is to fill with a flat color. It is followed by a natural number N and then N extra bytes.

For the GRADIENT_ARGS byte, the low 6 bits give the number of stops minus 2 (and 65 stops is invalid). The high 2 bits give the spread (how to extrapolate color stops outside the 0..1 stop offset nominal range).
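
A sketch of decoding the GRADIENT_ARGS byte. The spread is returned as a raw 2-bit number since this comment does not assign names to its values; `decode_gradient_args` is an illustrative name.

```python
def decode_gradient_args(byte: int):
    """Return (number_of_stops, spread) from a GRADIENT_ARGS byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("GRADIENT_ARGS is a single byte")
    nstops = (byte & 0x3F) + 2   # low 6 bits hold the stop count minus 2
    if nstops > 64:
        raise ValueError("65 stops is invalid")
    spread = byte >> 6           # high 2 bits
    return nstops, spread
```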

64 Reserved Opcodes

0xC0 ..= 0xDF are reserved but the fallback effect is a single LineTo. The opcode byte is followed by a natural number N and then N 'argument bytes'. Those bytes are then followed by a coordinate pair that is the LineTo argument (in the fallback case). If a future expansion employs these opcodes, the semantics are also expected to move the current point to this final coordinate pair.
0xE0 ..= 0xFF are followed by a natural number N and then N bytes of 'opcode arguments'. The fallback effect is a no-op.