hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0

Failure to match when a number is followed by a % #199

Open EnderShadow opened 7 months ago

EnderShadow commented 7 months ago

My asm syntax allows input like the minimal reproducer below, but it fails to compile with a "no match found" error.

#subruledef a
{
    s{x: u4} => x
}

#subruledef b
{
    %r{x: u4} => x
}

#ruledef
{
    {x: a} {y: b} => x @ y
}

s0 %r0

Changing the ruledef and the instruction to the following, with a comma between the two parameters, lets it compile successfully.

#ruledef
{
    {x: a}, {y: b} => x @ y
}

s0, %r0

The issue still occurs with --debug-no-optimize-matcher.

EnderShadow commented 7 months ago

Some additional information from my testing: if I inline subrule b from above, it compiles properly.

EnderShadow commented 7 months ago

An even smaller POC if it matters.

#subruledef b
{
    % => 0x0`0
}

#ruledef
{
    {x: u8} {y: b} => x @ y
}

0 %

hlorenzi commented 7 months ago

Hmm, this might be unsolvable in the current architecture. The reason s0 %r0 fails is that the rule s{x: u4} in subrule a starts parsing an expression after the s token, and the remaining 0 %r0 looks like a syntactically valid expression (the modulo operator % applied to the literal 0 and the variable r0). To avoid parser complexity, expression parsing is greedy. The solution, as you mention, is to introduce a separator token that is invalid inside an expression, such as a comma.

There is an exception, however, in the case of specifying everything in a single rule as s{x: u4} %r{y: u4}, where the parser will look ahead, and expression parsing can stop early. This means that extracting parameters into their own named subrules isn't exactly orthogonal in behavior, and the most powerful option is usually to specify everything monolithically. This can be annoying and might be worth improving in the future.
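
For reference, here is a minimal sketch of the monolithic form described above. It assumes the same x @ y output as the original reproducer and has not been checked against a specific customasm version, so treat it as an illustration rather than a verified fix.

```asm
#ruledef
{
    ; Both parameters live in a single rule, so the matcher can see the
    ; literal tokens `%` and `r` that follow {x} and stop parsing the
    ; expression for x early, instead of greedily reading `0 % r0`.
    s{x: u4} %r{y: u4} => x @ y
}

s0 %r0
```

The trade-off is that the s and %r operand forms can no longer be reused through the a and b subrules; every rule that needs them has to spell them out, or use a separator such as a comma as shown earlier in the thread.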