google / iconvg

IconVG is a compact, binary format for simple vector graphics: icons, logos, glyphs and emoji.
Apache License 2.0

Proposal: File Format Versions 1, 2 and Beyond #4

Open nigeltao opened 3 years ago

nigeltao commented 3 years ago

Summary

I propose to:

Background

Since its inception in 2016, IconVG has always carried the caveat that "WARNING: THIS FORMAT IS EXPERIMENTAL AND SUBJECT TO INCOMPATIBLE CHANGES".

Issue #2 in this repository is about adding animation to IconVG graphics. Tweening would almost certainly involve transformations (in the "affine transformation" sense) and interpolation.

The original IconVG design took the entirety of the SVG path model, including elliptical arc segments. Unlike line_to, quad_to and cube_to, arc_to's parameterization is unique, not being a sequence of (x, y) coordinate pairs, and a boolean argument like large-arc-flag is impossible to interpolate smoothly.

Rasterization backends like Cairo and Skia also don't provide arc_to as a primitive, or if they do, not in the way that SVG parameterizes it. We usually approximate arcs as cubic splines.

Also recall that IconVG is a presentation format, not an authoring format, and it already isn't able to represent groups, strokes, text, etc 'natively'. Authoring tools like Illustrator or Inkscape, if they could export to IconVG, are expected to 'lower' e.g. stroked paths to more primitive operations (filled paths), the same way that they would 'flatten' layers if exporting to PNG. I'd expect such tools could also 'lower' arcs to cubic Béziers during export.

Thus, I'm considering removing arcs from the file format. This new version (File Format Version 1) would not be a superset of FFV 0 per se, but FFV 0 files could be converted in a straightforward way and the rasterizations would be equivalent. In essence, 'lowering' arcs becomes the responsibility of the authoring tools (which get more complicated) instead of the presentation tools (which get simpler).

Separately, the original Go implementation (the golang.org/x/exp/shiny/iconvg package in a separate repository) was released as an interim milestone of the unfinished 'Shiny' Go GUI project. IconVG hasn't had much adoption so far, as the only implementation was in Go and so not usable from e.g. C++, Dart or Python GUI programs. In recent weeks, this repository has gained a brand new C implementation, but we still don't yet have a vast back-catalogue of existing IconVG files to constrain us.

Bringing all of the above together, if I were ever to make an IconVG FFV 1, especially one that isn't a superset of FFV 0 (because arcs), then now is the time to do it.

This issue is a place to discuss that process and what other features to add or warts to remove as part of FFV 1.

File Format Changes

See the spec for context.

The major change is:

Minor clean-up changes are:

Implementations

Notably, any existing Go code (using the 'old' Go library) displaying existing (FFV 0) files will continue to work.

Timeline

FFV 1 should be finalized 'soon'. FFV 2 is more open ended and will require extensive prototyping.

Hixie commented 3 years ago

I would discourage the use of version numbers. They prevent a format from being forward-compatible. Better to define error handling behaviour for all possible error conditions (unknown op codes, etc) and then add features in a backwards-compatible manner, IMHO.

Hixie commented 3 years ago

(Breaking with FFV0 is fine, I'm just saying to avoid FFV2 being incompatible with FFV1. Consider for example how animated GIFs fall back to non-animated GIFs in legacy software, or how APNG is just PNG with extra data, so it similarly falls back to a non-animated version in legacy software, etc. Most successful formats follow this pattern.)

nigeltao commented 3 years ago

The intention is for FFV 2 to be a superset of FFV 1. It will just define new opcodes and new metadata chunks. I think it's perfectly feasible for FFV 1 decoders to simply ignore opcodes and metadata that they do not recognize.

I agree that the "APNG falls back to PNG" model is worth mimicking. I still think that it's potentially useful to be able to distinguish FFVs 1 and 2. For example, the memory allocation requirements for an animated WxH graphic will be higher than for a static WxH graphic. Some decoders might wish to do all their allocation up front (after decoding Width and Height, available early in the decode process). They might like to know (potential) animated-ness just from parsing a few opening bytes rather than having to go arbitrarily deep into the file.

Metadata chunks already explicitly list the chunk length in bytes. We'd have to ensure that any new (FFV 2+) opcodes also do so (so that you can skip over them). Thanks for the feedback.

nigeltao commented 3 years ago

FFVs 1 and 2 can equivalently be considered Small and Large profiles of the overall IconVG file format.

I forgot to mention... another distinction will be that FFV 1 only requires sequential access (not random access) to the IconVG file. Again, that might be useful to know up front rather than having to go arbitrarily deep into the file.

Hixie commented 3 years ago

Based on my recent experience implementing the spec in Dart, I have the following opinions:

Remove the A and a arc-related drawing opcodes.

No objection.

Change the first byte of the magic header from 0x89 to 0x8A, so that we can distinguish IconVG from PNG (from JPEG from WebP etc) just from the first byte of the file. https://en.wikipedia.org/wiki/List_of_file_signatures doesn't show any previous claims on 0x8A.

No objection, but anyone who is only looking at the first byte is doing themselves and their users a disservice. See also: https://mimesniff.spec.whatwg.org/

Add an explicit FFV number in the wire format. Specifically, change the fourth byte of the magic header from 0x47 (ASCII 'G') to 0x31 (ASCII '1') for FFV 1, 0x32 (ASCII '2') for FFV 2, etc.

I would recommend against this, as discussed above.

MID numbers must use the shortest possible encoding.

I would recommend against this, as it makes implementations more complicated and does not seem to solve any immediate issues. If the concern is being able to read the metadata section without a full decoder, parsing the metadata section is already pretty trivial. I don't think it's worth making that use case simpler at the cost of making a full decoder more complicated (since now it would need yet another way to decode numbers, this one just for metadata blocks).

Re-number the ViewBox and Suggested Palette MIDs (Metadata IDs) from 0 and 1 to 8 and 16 (which are represented on the wire as 0x10 and 0x20). Since metadata is presented in increasing MID order, the gaps allow future extensions to insert (optional) metadata chunks before these existing ones.

No objection. I'm not really sure why the order is required here though. The only benefit I see is that it makes catching duplicates easier, but in practice I found it useful to have out-of-band flags for both of the existing metadata blocks anyway (viewBox because in languages with write-once-only fields you only want to set the viewBox fields once so the default is set after reading metadata, not before; palette because I wanted to avoid copying into CREG if I didn't see a custom palette).

Prohibit encoded real numbers being NaNs. In the end state, the spec should no longer mention "undefined behavior".

No objection. Would you also prohibit +/- infinity?

Tighten restrictions on gradients: there must be at least two stops and the offsets must span from 0 to 1 inclusive.

No objection.
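As a rough illustration of the magic header changes quoted above, here is a minimal Python sketch of version sniffing from the first four bytes. It assumes the FFV 0 header is the byte string 0x89 'I' 'V' 'G' and that the second and third bytes stay 'I' and 'V' in later versions; `sniff_ffv` is a hypothetical helper name, not part of any IconVG library.

```python
# Assumption: FFV 0 files start with 0x89 'I' 'V' 'G'. The proposal changes the
# first byte to 0x8A and the fourth byte to an ASCII version digit.
FFV0_MAGIC = b"\x89IVG"


def sniff_ffv(header: bytes):
    """Return the file format version implied by the first 4 bytes, or None."""
    if len(header) < 4:
        return None
    if header[:4] == FFV0_MAGIC:
        return 0
    # Assumption: bytes 1 and 2 remain 'I' and 'V'; byte 3 is '1'..'9'.
    if header[0] == 0x8A and header[1:3] == b"IV" and 0x31 <= header[3] <= 0x39:
        return header[3] - 0x30  # ASCII '1' -> 1, '2' -> 2, ...
    return None
```

A decoder that wants to allocate up front could call this on the opening bytes and then decide whether to budget for animation state.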

If I could be allowed to make some suggestions of my own:

nigeltao commented 3 years ago

I'm not really sure why the order is required here though.

Having metadata chunks appear in strictly-increasing MID order means that I can guarantee that e.g. the ViewBox (MID=8) chunk is in the first N bytes for some value of N, if it's present at all. That's assuming that every earlier chunk (lower MID) has an upper bound on how long it can be.

It's not a must-have feature, but I think it's not onerous and it might be nice to be able to say "if you can give me the first 128 bytes of the IconVG file then I can definitely tell you its (explicit or implicit) viewbox".

Would you also prohibit +/- infinity?

LOD1 can meaningfully be set to +infinity, although I suppose 1e9 would be equivalent in practice.

Write the spec using RFC2119 language... I'd be happy to lend an editor's hand here if you would like.

I'd be happy to have your editor's hand... but I think that'd go best if you went to work after the spec gets 'upgraded' to at least FFV 1.

nigeltao commented 3 years ago

Remove the A and a arc-related drawing opcodes.

For the record, the https://github.com/google/iconvg/issues/18 thread also discusses dropping the smooth ops S/s/T/t and/or the relative ops l/t/q/s/c/a/m/h/v.

Hixie commented 3 years ago

That's assuming that every earlier chunk (lower MID) has an upper bound on how long it can be.

That's only true currently because it's MID 0, right? I don't see anything in the format that would prevent unknown metadata blocks from being arbitrarily large.

nigeltao commented 3 years ago

Preliminary thoughts on FFV 2. Very preliminary.

Collections

Let a single .ivg file contain multiple graphics. Users can open individual ones by name, e.g. "device/battery50" from material_icons.ivg.

MID 0 (in FFV 1 MID numbering) holds a map (wire format TBD) from string to FileSegment. FileSegment is a uint64le, packing a 40-bit file offset and a 24-bit segment length. In this context that FileSegment holds a (non-Collection) 'headless' IconVG graphic. Headless means that it skips the 4-byte magic header.
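A minimal sketch of packing and unpacking such a FileSegment in Python. The text above does not say which bits hold the offset and which hold the length, so this assumes the offset sits in the high 40 bits and the length in the low 24 bits; the function names are hypothetical.

```python
import struct

OFFSET_BITS = 40
LENGTH_BITS = 24


def pack_file_segment(offset: int, length: int) -> bytes:
    """Pack (offset, length) into 8 little-endian bytes."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 40 bits")
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError("length does not fit in 24 bits")
    # Assumption: offset in the high 40 bits, length in the low 24 bits.
    return struct.pack("<Q", (offset << LENGTH_BITS) | length)


def unpack_file_segment(data: bytes):
    """Inverse of pack_file_segment; reads the first 8 bytes of data."""
    (value,) = struct.unpack("<Q", data[:8])
    return value >> LENGTH_BITS, value & ((1 << LENGTH_BITS) - 1)
```

With 24 bits of length, a single segment tops out just under 16 MiB, and 40 bits of offset reaches files up to 1 TiB.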

Palette Names, Parameter Names

A new (optional) metadata chunk that gives names to CREG indices. For example, 0:"skin", 1:"hair", etc. Might include the reverse map too: {"hair": 1, "skin": 0}.

Also have another metadata chunk (call it "Parameter Names") that does this for NREG instead of CREG. For example, NREG[32] could conventionally be called t, an animation time parameter.

Also allow Suggested and Custom Parameters, which do to NREG what the Suggested and Custom Palette do to CREG.

The (human-readable) names are for use 'externally', by implementations or libraries that consume IconVG. 'Within' the IconVG itself, things are identified by an integer ID or by a FileSegment.

Hit Testing

Answers "what part of the graphic did I just click on"?

Add a new 6-bit HITTEST register and a new styling opcode to copy NSEL to HITTEST. Current value of HITTEST is passed to callbacks when exiting drawing mode (i.e. filling a path), augmenting the paint attributes (RGBA flat color, gradient, etc) that's already passed at the same time.

Compound Graphics

Let multiple graphics (within a single file) share common elements. Allow a collection to hold "foo" and "foo-with-bar-badge" graphics. Allow a collection to hold "qux_en", "qux_de", "qux_zh_Hant_HK" graphics that re-use a base "qux" graphic. Note that text in general is out of scope due to its enormous complexity. Authors/tools are expected to 'flatten' the "en", "de" etc glyphs as simple paths.

New opcode (or opcodes?) in styling mode to do a 'function call': play another headless IconVG graphic, again identified by a u40;u24 FileSegment. The 'function call' also specifies a scalar GA (global alpha) and a 6-elem GTM (global transformation matrix) to apply to the callee, taken from NREG[NSEL-7 .. NSEL-0]. The [i..j] syntax here means an inclusive-low exclusive-high range. These 7*float32 globals are 'popped' when the 'call' returns, like a SkCanvas save/restore pair. TBD: something something clip too?

The callee has their own register state (CREG, NREG, etc) which is copied from the caller (possibly 'rotated' by the caller's CSEL/NSEL so that the caller's NREG[NSEL+i] becomes the callee's NREG[i]) on 'function call entry' but not copied back on 'function call exit'. The max recursion depth is TBD, but finite, explicit, and probably small (around 2-4, maybe even 1 and authoring tools are expected to inline deeper calls??).

Transformation Matrix Support

Styling opcodes to manipulate NREG[NSEL-6 .. NSEL-0] as an affine transformation matrix (called 'self'):

Reserved Opcodes

These need to encode "skip the next N bytes if the (older) implementation doesn't support this opcode" somehow. This might be on a per-opcode basis, or perhaps an overall "SkipLT(N, V)" opcode to skip the next N bytes if the library doesn't support File Format Version V.

For drawing opcodes, we might also need to say whether unsupported opcodes should be replaced by line_to(x, y) or move_to(x, y) or a no-op. Or maybe a "SkipGE(N, V)" opcode, like "SkipLT(N, V)" but >= V instead of < V.

Or maybe a single "IfElseV(M, N, V)" opcode that:

Control Flow Opcodes

Add "JumpXX(N)" opcodes to skip the next N bytes if NREG[NSEL] XX NREG[CSEL], where XX are comparison operators: equal, not-equal, less-than, less-equal, etc.

Add an explicit "Return" opcode?? EOF (End-of-File) or End-of-FileSegment is still end of graphic. Might not be necessary if equivalent to an (unconditional?) jump to the end.

Arithmetic Opcodes

Crazy (??) idea: just embed an eBPF interpreter (constrained similar to what the Linux kernel does, e.g. runtime verification of no backwards branches) and let authors/tools write their own ease-in ease-out curves or generally go wild. One complication is that IconVG speaks float32 and eBPF speaks uint64. Perhaps have support (built-in 'syscalls') to convert between float32 and 48.16 fixed point??

Tweening

Like the 'function call' opcode, but with twice the number of args (TBD: is "twice" necessary if matrix lerping can be done by Transformation Matrix Support and Arithmetic Opcodes??). The two separate graphics are tweened according to a zero-to-one blend argument (in NREG[NSEL-15]??):

Animation

Animation comes from combining almost all of the above. User program passes t and other parameters (e.g. if various UI buttons are clicked), various FileSegment sub-graphics are programmatically transformed, composed, tweened or skipped.

We might also need a new metadata chunk for animation length and loopiness.

The following is hand-wavy, but the intention is for 'leaf nodes' (which don't make 'function calls', they're just a filled path) to be 'compilable' / uploadable to GPU-friendly formats and uniquely identified by their uint64 FileSegment. That compilation happens once, not once per animation frame. Rendering the scene at time t involves re-computing the alpha and transform for each leaf. This happens on the CPU, especially if eBPF is involved, but e.g. the pre-transformed geometry that was previously uploaded to the GPU stays unchanged.

Consider restricting nodes to hold either 'function call' ops or 'drawing mode' ops but not both: nodes are either a (pure) branch or a leaf.

TBD / Punted to FFV 3??

Still Out Of Scope

nigeltao commented 3 years ago

That's only true currently because it's MID 0, right? I don't see anything in the format that would prevent unknown metadata blocks from being arbitrarily large.

If it's MID 8, we could constrain every earlier MID to be e.g. at most 16 bytes long, which should be enough for a redirect-pointer if necessary.

Hixie commented 3 years ago

Preliminary thoughts on FFV 2. Very preliminary.

It's hard for me to provide feedback on these because I don't know what the problem domain is. I'm guessing from the list of features that it's substantially different from FFV0's problem domain, which seemed to be "format to allow the material icons to be rendered faithfully at any size from tiny files" (which explained the custom palette, the set of drawing features, gradients as a primitive, and the focus on small file sizes).

nigeltao commented 3 years ago

It's hard for me to provide feedback on these because I don't know what the problem domain is.

It's my attempt at solving https://github.com/flutter/flutter/issues/1831 and if I understand correctly, FFV 0 / FFV 1 isn't feature-rich enough (e.g. animation).

Hixie commented 3 years ago

I should make my work-in-progress doc for that effort public, but I think you may have seen it. It lists some of the criteria for what such a format would need to address. One of the highlights which seems relevant here is that the top priority is render speed, with file size being somewhat low on the list; ideally one should be able to get relatively close to just copying significant chunks of the raw data into a shader to draw most of the image. I don't know if the opcode-based approach of IconVG can achieve that.

Hixie commented 3 years ago

(By which I mean I literally don't know. There's an effort underway to provide arbitrary SPIR-V shader support for Flutter, and once that is landed I hope to experiment with it and see what kind of vector graphics renderer one can build directly into a shader.)

nigeltao commented 3 years ago

Add "JumpXX(N)" opcodes to skip the next N bytes if NREG[NSEL] XX NREG[CSEL], where XX are comparison operators: equal, not-equal, less-than, less-equal, etc.

Some more thinking out loud: if we had a "JumpLOD(H0, H1, N)" opcode, that skipped the next N bytes if the height-in-pixels H was outside the H0..H1 range, then we wouldn't need the LOD registers. Skipping N bytes in one motion would also be simpler and faster than decoding one opcode at a time until we're back in LOD range.
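A minimal Python sketch of what such a JumpLOD could do to a decoder's read position. Everything here is hypothetical: the half-open H0 <= H < H1 test, the rule that a jump past the end is an error, and the function name are assumptions made for illustration only.

```python
def jump_lod(pos: int, h0: float, h1: float, n: int,
             height_in_pixels: float, end: int) -> int:
    """Return the next read position after a hypothetical JumpLOD(H0, H1, N).

    pos is the position just after the opcode and its arguments; end is the
    length of the enclosing byte range.
    """
    # Assumption: the range is half-open, H0 <= H < H1.
    if h0 <= height_in_pixels < h1:
        return pos  # in range: keep decoding the next N bytes as usual
    target = pos + n
    if target > end:
        # Assumption: jumping past the end is invalid rather than clamped.
        raise ValueError("jump past the end of the data")
    return target  # out of range: skip the next N bytes in one motion
```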

Or maybe we add the JumpXX opcodes and also another one to set NREG[NSEL] = height_in_pixels...

BigBadaboom commented 3 years ago

Typo spotted?

8 and 16 (which are represented on the wire as 0x10 and 0x20)

Should be 0x08 and 0x10?

nigeltao commented 3 years ago

Should be 0x08 and 0x10?

No, it's 0x10 and 0x20. MIDs are encoded as Natural Numbers and the IconVG spec says "For a 1 byte encoding, the remaining 7 bits form an integer value in the range [0, 1<<7). For example, 0x28 encodes the value 0x14 or, in decimal, 20".
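The arithmetic can be checked with a short Python sketch of the 1 byte form only. It assumes the low bit of the byte is a tag that is 0 for the 1 byte encoding, which matches the spec's 0x28 example; the helper names are made up for this illustration.

```python
def encode_natural_1byte(value: int) -> int:
    """Encode a natural number in [0, 1<<7) as a single byte."""
    if not 0 <= value < (1 << 7):
        raise ValueError("value does not fit in the 1 byte encoding")
    return value << 1  # the 7 value bits sit above a low tag bit of 0


def decode_natural_1byte(byte: int) -> int:
    """Decode a single byte written by encode_natural_1byte."""
    if byte & 0x01:
        raise ValueError("low bit set: not a 1 byte encoding")
    return byte >> 1
```

So MIDs 8 and 16 become 0x10 and 0x20 on the wire, and 0x28 decodes to 20.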

nigeltao commented 3 years ago

Another update summarizing my current thinking, in case anyone's interested.

Goals

I still like the "mission statement" at the top of the main README file. "A compact, binary format for simple vector graphics: icons, logos, glyphs and emoji." Longer term, maybe animation or security would also gain an explicit mention.

Compactness is a goal, but it's not the only goal. The aim isn't compactness at any cost. "Just use gzipped SVG" might be competitive in terms of compactness, but a very different story from a security and implementation complexity perspective.

Simplicity is also a goal, but again, it's not the only goal. There's usually also a trade-off between simplicity and feature richness.

Changes

Dropping features from FFV 0

Future-proofing

New features

lifthrasiir commented 3 years ago

gain an opcode for "jump past the next N bytes".

Unless carefully specified, this would mean that jumping into the middle of another operation is possible. I don't think this is desirable for a number of reasons, including the security implications. I expect that the parsing cost is not very high, so I think this should be "decode but ignore next N instructions" instead.

nigeltao commented 3 years ago

"decode but ignore next N instructions"

Well, this requires deciding how long (in bytes) each reserved opcode will be. Specifying that today could be awkward if we want to eventually have some sort of scripting or general computing (to support animation), but we haven't concluded yet how that'll be implemented or represented on the wire.

nigeltao commented 3 years ago

Some more thinking out loud...

The way that gradients are encoded in the unused parts of alpha-premultiplied RGBA space is clever. But somebody (I forget who) once told me that a difference between programming and software engineering is whether "clever" is a compliment or a pejorative.

I can't find the link, but I do remember @Hixie saying at some point that this cleverness makes it hard, in the future, if we want to add different sorts of paints. For example, blend modes (color dodge), effects (blurs) or something something hit-testing.

@lifthrasiir also made the point in #31 that a lot of a gradient's description could be "opcode arguments" instead of being cleverly squeezed into the CREGs.

Perhaps we should split the paint ops (what's currently 0xE1 "exit drawing mode", but would probably be renamed as "fill" if we no longer have two modes) into two classes:

  1. "basic paint" with a flat color: no explicit args, but use CREG[CSEL]
  2. "special paint" (special = gradient for now, maybe others later): various explicit opcode args (number of stops, linear/radial, spread, some hand-wavy future-expansion capability) with stop color/offsets taken from CREG[CSEL-NSTOPS .. CSEL] and NREG[NSEL-NSTOPS .. NSEL].

"Explicit opcode arguments" means that a "special paint" opcode is followed by a number of extra bytes, the way that an L op is followed by extra bytes for coordinate pairs.

Afterwards, CREG only holds flat colors or gradient stop colors. NREG only holds gradient stop offsets. We could make it invalid to set a CREG to something that's not valid alpha-premul.

If we also encourage a 'stack' model per #31, so that assigning CSEL and NSEL specific values become less important than incrementing / decrementing them, then the 128 1-byte "Set CSEL/NSEL" opcodes could collapse to 2 2-byte opcodes, opening up a lot more opcode space...

Overall, changing how gradients are represented would make it a little harder to upgrade FFV 0 to FFV 1 automatically, if the graphic uses gradients, but it's probably still doable.

lifthrasiir commented 3 years ago

Well, this requires deciding how long (in bytes) each reserved opcode will be.

That's true, but it is not much different from putting the length information for any subsequent opcode (unless multiple such opcodes in a run are frequent). I think the "special paint" opcode you've mentioned is a good candidate to include the explicit length for example.


As noted by Hixie in #11, we need to explicitly decide whether two unrelated pieces of data can overlap. I prefer overlap to be impossible, mainly because it would be easier to control the interpretation than otherwise. If overlap is possible we risk diverging interpretations. Consider the following:

  a view when X is unsupported         a view when X is supported
+--------------------------------+   +--------------------------------+
| jump to P if X is unsupported  |   | jump to P if X is unsupported  |
+--------------------------------+   +--------------------------------+
: (not parsed)                   :   | opcode X                       |
:                                :   +--------------------------------+
:                                :   | arguments to X                 |
:                                :   |                                |
+--------------------------------< P >                                |
| opcode Y                       |   |                                |
+--------------------------------+   |                                |
| arguments to Y                 |   +--------------------------------+
|                                |   | opcode Z                       |
|                                |   +--------------------------------+

The opcode Y is overlapping with arguments to X in this example, and this desynchronization can result in wildly different interpretations or (more usually) an invalid image only when X is supported. Ideally we want this situation to be impossible at all. One alternative is the following:

+--------------------------------+
| opcode X (FFV 2)               |
| +----------------------------+ |
| | length of arguments to X   |----+
| +----------------------------+ |  |
| | arguments to X             | |  | byte length
| |                            | |  |
| |                            |<---+
| |                            | |
| +----------------------------+ |
+--------------------------------+
| ignore K opcodes if X is       |--+
| unsupported                    |  |
+--------------------------------+\ | # opcodes
| opcode Y and arguments (FFV 1) | \|
+--------------------------------+  +
| opcode Z and arguments (FFV 1) | /
+--------------------------------+/

All implementations since FFV 1 can determine the entire structure, but only those supporting X can execute the opcode X. No byte can be interpreted in multiple ways. This is not the only way to do that, but it seems that encoding the length of arguments right into all future opcodes is necessary.

Hixie commented 3 years ago

Another way to do this would be to split the opcode space by number of arguments. For example, opcodes 0x00 .. 0x1F have zero arguments, opcodes 0x20 .. 0x7F have 6 arguments, opcodes 0x80 .. 0xDF have 12 arguments, opcodes 0xE0 .. 0xFF have 16 arguments. Or whatever. Or equivalently, opcodes could be two bytes long, with one byte always coincidentally giving the length of arguments. The point is that you decouple the parsing from the interpreting, so that parsing is future-proof.
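A minimal Python sketch of that decoupling, using the example ranges above. It assumes each argument occupies exactly one byte, which the comment does not specify; the table and function names are hypothetical.

```python
# Example ranges from the comment above: (first opcode, last opcode, arg count).
# Assumption: every argument is exactly one byte long.
ARG_COUNTS = [
    (0x00, 0x1F, 0),
    (0x20, 0x7F, 6),
    (0x80, 0xDF, 12),
    (0xE0, 0xFF, 16),
]


def arg_count(opcode: int) -> int:
    """Look up how many arguments follow an opcode, known or unknown."""
    for lo, hi, count in ARG_COUNTS:
        if lo <= opcode <= hi:
            return count
    raise ValueError("opcode out of range")


def split_instructions(data: bytes):
    """Split a byte stream into (opcode, args) pairs without interpreting them."""
    out = []
    i = 0
    while i < len(data):
        opcode = data[i]
        n = arg_count(opcode)
        if i + 1 + n > len(data):
            raise ValueError("truncated instruction")
        out.append((opcode, data[i + 1:i + 1 + n]))
        i += 1 + n
    return out
```

An older decoder can run `split_instructions` over the whole stream and only then decide which opcodes it knows how to execute, so no byte is ever read both as an opcode and as an argument.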

nigeltao commented 3 years ago

Some more thoughts. They're not final, I just want to write down some ideas-in-progress before I forget.

Ring-Stack Registers

The first 128 opcodes (4+2+1 bits) set REG values

The next 54 (48 + 6) opcodes specify path geometry

The low 4 bits form a number RL0. If RL0 is zero then a natural number RL1 follows (in 1, 2 or 4 bytes) and the run length RL is set to (RL1 + 16). If RL0 is non-zero then RL is set to RL0. After the opcode (and after RL1 if present) are one or more absolute (not relative) coordinate pairs (a pair is (x, y)). Each coordinate is encoded in 1, 2 or 4 bytes the same as FFV 0.
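The run length rule can be sketched in Python as follows. This assumes the FFV 0 natural number encoding (low bit 0 for 1 byte, low two bits 01 for 2 bytes, 11 for 4 bytes); the opcode values used in any example input are placeholders, since only the low 4 bits matter here.

```python
def decode_natural(data: bytes, pos: int):
    """Decode a natural number at data[pos]; returns (value, new_pos)."""
    first = data[pos]
    if first & 0x01 == 0:
        size, shift = 1, 1  # 1 byte form, 7 value bits
    elif first & 0x02 == 0:
        size, shift = 2, 2  # 2 byte form, 14 value bits
    else:
        size, shift = 4, 2  # 4 byte form, 30 value bits
    if pos + size > len(data):
        raise ValueError("truncated natural number")
    return int.from_bytes(data[pos:pos + size], "little") >> shift, pos + size


def decode_run_length(data: bytes, pos: int):
    """data[pos] is a path geometry opcode; returns (run_length, new_pos)."""
    rl0 = data[pos] & 0x0F
    pos += 1
    if rl0 != 0:
        return rl0, pos  # short form: RL is 1..15
    rl1, pos = decode_natural(data, pos)
    return rl1 + 16, pos  # long form: RL is 16 or more
```

The +16 bias means the long form never duplicates a run length that the short form can already express.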

Processing the ellipse or parallelogram opcodes requires knowing the 'current point' to start from, also known as the 'pen location'. This is just the last coordinate pair of a LineTo, QuadTo, CubeTo or MoveTo op. For example, after 5 consecutive CubeTo operations, the current point is set to the last of the 15 coordinate pairs (5 * 3 = 15).

The next 10 opcodes are miscellaneous / reserved

The next 32 opcodes specify path fills

Fills close any in-progress path.

TBD: complexity 0 might be repurposed for hit-testing: filling rough paths with multiple invisible-but-different colors.

The next 4 opcodes specify control flow

The first three are followed by a natural number J and possibly further arguments.

It's invalid to jump past the end of the file or macro segment.

The next 4 opcodes specify sub-routines

They are typically followed by an 8 byte FileSegment (40 bit file offset, 24 bit file length) and possibly further arguments.

The macro opcodes 0xE4 and 0xE5 are invalid when already in a macro expansion. No recursion allowed.

All four opcodes are a single instruction for "jump past the next J instructions" accounting.

The last 24 opcodes are reserved

nigeltao commented 3 years ago

More thoughts...

FileSegments

FileSegments are tweaked. There's a uint64le flavor (an "Absolute FileSegment"):

There's also a uint32le flavor (an "Inline FileSegment"), just the low 32 bits. There are no redirects and the SegmentOffset is implicit: it immediately follows the uint32le.

IconVG files can be larger than 2 GiB. The redirect bit being set on an Absolute FileSegment means that the 31+24=55 middle bits are a file offset for another 16 bytes: uint64le SegmentOffset and uint64le SegmentLength.

Opcodes

54 Path Geometry Opcodes

The low 4 bits form a number RL0. If RL0 is zero then a natural number RL1 follows (in 1, 2 or 4 bytes) and the run length RL is set to (RL1 + 16). If RL0 is non-zero then RL is set to RL0. After the opcode (and after RL1 if present) are one or more absolute (not relative) coordinate pairs (a pair is (x, y)). Each coordinate is encoded in 1, 2 or 4 bytes the same as FFV 0 (tweaked by #33).

Processing the ellipse or parallelogram opcodes requires knowing the 'current point' to start from, also known as the 'pen location'. See Three Points (Two Opposing) Define an Ellipse. For example, after 5 consecutive CubeTo operations, the current point is set to the last of the 15 coordinate pairs (5 * 3 = 15).

2 Miscellaneous Opcodes

4 Jump / Return Opcodes

The first three are followed by a natural number J and possibly further arguments.

It's invalid to jump past the end of the file or sub-routine FileSegment.

4 Call Sub-routine Opcodes

If the opcode 0x01 bit is set, this is followed by an ATM (alpha and transform matrix). An ATM is a 1-byte alpha value and then a 3x2 affine transform matrix (each number encoded as if it was a coordinate) to apply (multiply) to the paints, geometry and transform matrices within that sub-routine. 'No ATM' is equivalent to an 0xFF alpha and identity transform matrix.

The ATM (or lack of it) is followed by a 4 byte Inline FileSegment (e.g. 'switch to scripting mode') or 8 byte Absolute FileSegment (e.g. 're-use shared paths and fills', 're-use shared scripts'), depending on the opcode 0x02 bit being off or on. An Inline FileSegment is followed by SegmentLength bytes.

These four opcodes are only valid when executing 'at the top level'. They're invalid if encountered when already in a sub-routine call.

64 Set Register Opcodes

64 ring-stack registers REGS, 64 bits each, and one SEL selector register. It's like the earlier comment in this issue, except the stack now grows downwards. A stack push decrements (not increments) SEL. ADJ adjustments are added (not subtracted).

For the first 48 opcodes, the low 4 bits give an ADJ value. These opcodes write to REGS[(SEL+ADJ)&63]. It also post-decrements SEL when ADJ is zero.

For the last 16 opcodes, let LENGTH equal 2 plus the opcode's low 4 bits. They pre-decrement SEL by LENGTH and then consume LENGTH uint64le values, storing them in REGS[SEL+1], REGS[SEL+2], ..., REGS[SEL+LENGTH], in that order.

"Sets the low/high 32 bits" means that the opcode is followed by a uint32le number to put in the corresponding low/high half of the REGS element (the other half is zeroed). "Sets all bits" means that the opcode is followed by one (opcodes 0x60 ..= 0x6F) or more (opcodes 0x70 ..= 0x7F) uint64le numbers.

Low 32 bits are interpreted as unsigned 16.16 fixed point when used as gradient stops (e.g. 0xC000 represents a gradient stop offset of 0.75). Future expansions may interpret the bits in other ways.
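A small Python sketch of the 16.16 conversion, for checking the 0xC000 example. The round-to-nearest choice and the clamp at the top of the range are assumptions; the text above only fixes the interpretation of the bits.

```python
def fixed_16_16_to_float(low32: int) -> float:
    """Interpret the low 32 bits as unsigned 16.16 fixed point."""
    return (low32 & 0xFFFFFFFF) / 65536.0


def float_to_fixed_16_16(x: float) -> int:
    """Convert a non-negative float to unsigned 16.16 fixed point."""
    if not 0.0 <= x < 65536.0:
        raise ValueError("value outside the unsigned 16.16 range")
    # Assumption: round to nearest, clamped to the largest 32-bit value.
    return min(int(round(x * 65536.0)), 0xFFFFFFFF)
```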

High 32 bits are interpreted as alpha-premultiplied RGBA colors. Alpha less than any of Red, Green or Blue has special meaning, as they would otherwise be invalid alpha-premultiplied colors. That special meaning is either a blend (Alpha is zero) or a 'discriminated transparent black' (Alpha is non-zero).

A blend is what FFV0 calls a 3-byte indirect color. G and B give 1-byte colors SRC0 and SRC1 and R is the BLEND (0x00 means all-SRC0, 0xFF means all-SRC1):

RESULTANT.RED = (((255-BLEND) * SRC0.RED) + (BLEND * SRC1.RED) + 128) / 255
Ditto for GREEN, BLUE and ALPHA
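The blend formula can be sketched in Python as below, assuming the division is integer (truncating) division and that SRC0 and SRC1 have already been resolved to 4-channel RGBA tuples; the function names are hypothetical.

```python
def blend_channel(blend: int, src0: int, src1: int) -> int:
    """Blend one 8-bit channel; blend 0x00 is all src0, 0xFF is all src1."""
    # Assumption: "/" in the formula is integer division; +128 rounds to nearest.
    return ((255 - blend) * src0 + blend * src1 + 128) // 255


def blend_color(blend: int, src0, src1):
    """Apply blend_channel to each of RED, GREEN, BLUE and ALPHA."""
    return tuple(blend_channel(blend, a, b) for a, b in zip(src0, src1))
```

The +128 term is what makes the two endpoints exact: BLEND of 0x00 returns SRC0 unchanged and 0xFF returns SRC1 unchanged.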

1-byte colors are similar to but tweaked from FFV0. 0x00, 0x01 and 0x02 mean RGBA values 00:00:00:00, 80:80:80:80 and C0:C0:C0:C0. 0x03 ..= 0x7F mean base-5 opaque colors. 0x80 ..= 0xBF mean from the custom palette. 0xC0 ..= 0xFF (call the value c) takes the color from REGS[(SEL+c)&63].

A 'discriminated transparent black' means that the paint is a no-op, in terms of modifying pixel colors, but having multiple 'transparent black' values can be useful for hit-testing: this shape is 'transparent black number 1', this other shape is 'transparent black number 2', etc.
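The three-way split of the high 32 bits can be sketched in Python like this. It takes the four channels as separate integers because the byte order within the 32 bits is not stated above; the string labels and the function name are made up for this illustration.

```python
def classify_paint(r: int, g: int, b: int, a: int) -> str:
    """Classify an RGBA value per the alpha-premultiplied rule described above."""
    if a >= max(r, g, b):
        return "color"  # a valid alpha-premultiplied color
    if a == 0:
        return "blend"  # R is the BLEND; G and B name 1-byte colors
    return "discriminated transparent black"  # no-op paint, useful for hit-testing
```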

SEL is initially set to 56, allowing easy read access to registers 0..=7 (initialized from the custom palette if given) and easy read/write access to registers 57..=63 (typically 'scratch' space).
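A minimal Python model of the ring-stack described above, covering only the two write forms and the initial SEL. It assumes SEL and every register index wrap modulo 64, including in the multi-value form where the text writes REGS[SEL+1] without a mask; the class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1


class RingStack:
    """64 registers of 64 bits each, plus a SEL selector that starts at 56."""

    def __init__(self):
        self.regs = [0] * 64
        self.sel = 56

    def set(self, adj: int, value: int) -> None:
        """Write REGS[(SEL+ADJ)&63]; post-decrement SEL when ADJ is zero."""
        if not 0 <= adj <= 15:
            raise ValueError("ADJ is a 4-bit value")
        self.regs[(self.sel + adj) & 63] = value & MASK64
        if adj == 0:
            self.sel = (self.sel - 1) & 63  # a push grows the stack downwards

    def set_many(self, values) -> None:
        """Pre-decrement SEL by LENGTH, then fill REGS[SEL+1 .. SEL+LENGTH]."""
        length = len(values)
        if not 2 <= length <= 17:
            raise ValueError("LENGTH is 2 plus a 4-bit value")
        self.sel = (self.sel - length) & 63
        for i, v in enumerate(values, start=1):
            # Assumption: indices wrap modulo 64 here too.
            self.regs[(self.sel + i) & 63] = v & MASK64
```

Starting from SEL = 56, one `set(0, x)` writes register 56 and leaves SEL at 55; a following `set_many([a, b, c])` leaves SEL at 52 with a, b, c in registers 53, 54 and 55.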

64 Fill Opcodes

The opcode's low 4 bits give an ADJ value. These opcodes read from REGS[(SEL+ADJ)&63]. Gradients also read from later REGS, per the number of gradient stops. It also pre-increments SEL when ADJ is zero.

For the GRADIENT_ARGS byte, the low 6 bits give the number of stops minus 2 (and 65 stops is invalid). The high 2 bits give the spread (how to extrapolate color stops outside the 0..1 stop offset nominal range).
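Decoding the GRADIENT_ARGS byte could look like the following Python sketch. The spread is returned as a raw 2-bit number because the mapping from those values to spread modes is not given above; the function name is hypothetical.

```python
def decode_gradient_args(byte: int):
    """Return (number_of_stops, spread) for a GRADIENT_ARGS byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("GRADIENT_ARGS is a single byte")
    num_stops = (byte & 0x3F) + 2  # low 6 bits hold the stop count minus 2
    if num_stops == 65:
        raise ValueError("65 stops is invalid")
    spread = byte >> 6  # high 2 bits, left uninterpreted here
    return num_stops, spread
```

That gives a valid range of 2 to 64 stops, consistent with the earlier requirement of at least two stops.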

64 Reserved Opcodes