The compilation of a format that ends with a union or a repeat will depend on the format that follows it, as this may influence the match tree used for lookahead, so initially we compiled each format into multiple decoders, one for each possible "next".
This pull request compiles each format to a single decoder instead, taking the union of all the "nexts". I think this is sound: if it's valid for F to be followed by A and valid for F to be followed by B then it should be valid for F to be followed by (A|B).
It's nice to create exactly one decoder per format however this still requires "whole program analysis" in the sense that a format cannot be compiled independently of how it is used, as you would hope a function or module could be.
Also the code feels slightly fragile given the way it has some subtle invariants on the decoder indices, that could probably be improved a little.
The compilation of a format that ends with a union or a repeat will depend on the format that follows it, as this may influence the match tree used for lookahead, so initially we compiled each format into multiple decoders, one for each possible "next".
This pull request compiles each format to a single decoder instead, taking the union of all the "nexts". I think this is sound: if it's valid for F to be followed by A and valid for F to be followed by B then it should be valid for F to be followed by (A|B).
It's nice to create exactly one decoder per format however this still requires "whole program analysis" in the sense that a format cannot be compiled independently of how it is used, as you would hope a function or module could be.
Also the code feels slightly fragile given the way it has some subtle invariants on the decoder indices, that could probably be improved a little.