hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0
719 stars 56 forks source link

Apparently unable to handle parameter without preceeding whitespace #193

Closed soweli-Luna closed 10 months ago

soweli-Luna commented 10 months ago

for a bit of context, the arch i am working with allows you to set the condition mode separately from the opcode, and my plan was to parameterize the mode so that it could be set in an easily readable manner, such that jeq gets parsed as j{mode} where eq is the mode, but customasm is apparently unable to handle parameters without preceding whitespace like this, and passes jeq as the mode for some reason

the relevant portion of code:

#ruledef 
{
    opc {a: register}, {b: register} {mode: ALUmode} {c: register}, {simm11: s11} =>
    {
        0b000 @ a @ mode @ b @ c @ simm11
    }

    b{mode: ALUmode} {b: register}, {c: register}, {simm11: s11} =>   ;example: beq t1, t2, -10
    asm {
        opc zero, {b} {mode} {c}, {simm11}
    }

    j{mode: ALUmode} {a: register}, {b: register}, {addr} =>    ;example: jeq t1, t2, label
    asm {
        b {mode} {a}, {b}, {addr} - $
    }
}

jeq zero, zero, 50

image

MineRobber9000 commented 10 months ago

Minimal test case:

#ruledef register
{
        zero => 15`4
}

#ruledef mode
{
        eq => 15`4
}

#ruledef
{
        b{mode: mode} {register: register} => {
                mode @ register
        }

        j{mode: mode} {register: register} => asm {
                b{mode} {register}
        }
}

beq zero works as expected (FF), but when trying to match jeq zero, it throws an error, amongst other things trying to pass jeq as the mode, which obviously doesn't work. My best guess is that somewhere in the "glued parameter" handling, we forget that the glued token is not, in fact, part of the parameter, and end up trying to pass jeq as a mode to b{mode} when that's clearly not what the author intended.

At first I assumed some token-for-token substitution was at play, like in #172, but assigning to a temporary variable (which fixes that bug) creates Exciting New Errors:

customasm v0.13.4-8-g4c543f5 (2023-12-27, x86_64-unknown-linux-gnu)
assembling `tmp.asm`...
error: failed to resolve instruction
  --> tmp.asm:24:1:
22 | }
23 |
24 | jeq zero
   | ^^^^^^^^
25 |
   + note: within `#anonymous_ruledef_2`, rule 1
    --> tmp.asm:18:2:
  18 |   j{mode: mode} {register: register} => {
     |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     + note: match attempted: `b:temp zero`
      --> tmp.asm:20:9:
    20 |     asm { b{temp} {register} }
       |           ^^^^^^^^^^^^^^^^^^
       + error: no match found for instruction
        --> tmp.asm:20:9:
      18 |   j{mode: mode} {register: register} => {
      19 |     temp = mode
      20 |     asm { b{temp} {register} }
         |           ^^^^^^^^^^^^^^^^^^
      21 |   }
      22 | }

However, adding jeq to the mode subruledef does, in fact, fix the issue. I have no idea.

chrisgbk commented 10 months ago

Can confirm, and not affected by --debug-no-optimize-matcher

Adding whitespace does fix, so seems an extra character is being pulled in for this specific case.

MineRobber9000 commented 10 months ago

Actually, this happens regardless of whether the glued token is 1 character or multiple:

#subruledef arg
{
        bar => 0`4
        baz => 1`4
}

#ruledef
{
        foo{x: arg} => 0`4 @ x
        test{x: arg} => asm { foo{x} }
}

testbar ; error: failed to resolve instruction / note: within `#anonymous_ruledef_1`, rule 1 / note: match attempted: `footestbar` / error: no match found for instruction

This doesn't make any sense to me; in the foo{x: arg} rule, customasm has no issues determining what is foo and what is the argument x, but when we go to the test{x: arg} rule, suddenly we can't figure out where one ends and the other begins?

MineRobber9000 commented 10 months ago

Okay, I added some debug prints in certain spots and I think I've narrowed the issue to the token-for-token substitution. Specifically, testbar (or jeq in the original testcase) is treated as a single Identifier token, which is substituted wholesale into the asm block for {mode}, even though the only part of that token that matters to the mode argument is bar (or eq).

Notably, the parser does correctly understand the glued parameter (as proven by foobar/beq working correctly), but that understanding does not pass to the token substitution in the asm block.

hlorenzi commented 10 months ago

This is part of a bigger issue with the current architecture: we need to have the token-walker be able to split and reinterpret tokens at parameter boundaries. I'm trying to work out a solution for this.