hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0

Feature request: macro assembler #52

Closed tekknolagi closed 1 year ago

tekknolagi commented 4 years ago

First: awesome project! This is so helpful for writing terse assemblers.

@tchebb and I are writing a ruleset for the UM (ICFP 2006) and we would like to be able to define macros, for example:

#ruledef {
  loadi {reg}, {val} => 13`4 @ reg`3 @ val
  zero {reg} => loadi reg, 0
}

But this does not seem to be supported.

moonheart08 commented 4 years ago

For now, I recommend doing what GAS does: run the C preprocessor over the file, assuming that works. I've done that for an entirely custom assembler, where I could guarantee no syntax conflicts (it was for microcode), but I don't know if it'll agree with customasm.

tekknolagi commented 4 years ago

The only problem with that is that I don't think it will like customasm's #ruledef, etc, since they are not valid C preprocessor directives.

pol-rivero commented 4 years ago

I agree, macros (and other directives like #define and #ifdef) are the only thing this assembler needs to be perfect

hlorenzi commented 4 years ago

I think maybe moonheart08 is suggesting doing something like this?

#define LOADI_MACRO(r, v) 13`4 @ r`3 @ v

#ruledef
{
  loadi {reg}, {val} => LOADI_MACRO(reg, val)
  zero {reg} => LOADI_MACRO(reg, 0)
}

...and running the C preprocessor on the file (but I haven't tested this code).

I think the idiomatic way of doing this right now would be to just copy the expression and change what's needed by hand. Do you have a lot of code that would benefit from macros like those?

I could see some syntax like the following for the feature:

#ruledef
{
  loadi {reg}, {val} => 13`4 @ reg`3 @ val
  zero {reg} => resolve(loadi reg, 0)
}

...but I haven't really thought about the problems that could arise here. Currently instruction invocations are parsed and treated very differently to general expressions.

tekknolagi commented 4 years ago

So what I meant by "The only problem with that is that I don't think it will like customasm's #ruledef, etc, since they are not valid C preprocessor directives." is that this error happens:

/tmp/foo:3:2: error: invalid preprocessing directive #ruledef
 #ruledef

since #ruledef starts with a #.

I think a resolve() would look pretty cool. I took a brief stab at adding it as a function, but quickly found out that that would not work in the existing framework.

Perhaps it would be worth adding some kind of preprocessor as part of customasm, perhaps not.

tekknolagi commented 4 years ago

> Do you have a lot of code that would benefit from macros like those?

Not exactly. A friend and I are working on a hobby project. Part of the request is from a want for an easy way to do two assembly syntaxes -- think AT&T / Intel syntaxes. Another part is a general want for a way to compose instructions into "higher-order" instructions by macro substitution. Kind of a low-tech compiler.

rj45 commented 4 years ago

I run my code through the C preprocessor, cpp. After some experimentation this is the set of command line options I came up with:

cpp -x assembler-with-cpp -nostdinc -CC -undef -P test.s > test.asm

It seems to ignore #ruledef etc. I have these defines for each instruction format:

  #define ri(op, r, imm) \
    r`4 @ op`4 @ \
    imm`8

  #define i8(op, imm) \
    op`4 @ 0b0000 @ \
    imm`8

  #define rr3(op, rd, rs) \
    rd`4 @ 0b0010 @ \
    op`4 @ rs`4

  #define rr4(op, rd, rs) \
    rd`4 @ 0b0001 @ \
    op`4 @ rs`4

  #define r(op, rd) \
    rd`4 @ 0b0001 @ \
    0b0000 @ op`4

Not ideal though. When I saw #subruledef I got excited thinking I could chain rules not just in the parameter list, but also in the body, but diving into the code, I couldn't see a way to do that. That would be much better than cpp's somewhat clunky syntax though.

skicattx commented 3 years ago

+1 on some kind of #define-like preprocessor directive.

aleferri commented 3 years ago

If you want to know how I did something similar in casmeleon:

.inline MAKE_SIB    
.with ( base : Register, index : Register, scale : Ints ) -> {  
    .return (scale << 6) + ( index << 3 ) + base;     // scale[7:6], index[5:3], base[2:0]
}  
.inline SEGMENT_PREFIX
.with ( r : Segments ) -> {
    .return r;
}  
.inline MAKE_RM
.with ( r : Registers, d : Ints ) -> {
    .return ( 3 << 6 ) + ( r << 3 ) + d;
}
.opcode move {{ dest, mod ptr [ segm:base + index * scaled ] }}  
.with ( dest : Register, mod : x86Modifiers, ptr : PtrKeyword, segm : Segments, base : Register, index : Register, scaled : Ints ) -> { 
    .if scaled != 1 && scaled != 2 && scaled != 4 && scaled != 8 {  
        .error scaled, "Only 1 | 2 | 4 | 8 allowed as scale";   
    }  
    .out [ .expr SEGMENT_PREFIX(segm), 0x8B, .expr MAKE_RM( dest, + 0b100 ), .expr MAKE_SIB(base, index, scaled) ];  
}

hlorenzi commented 3 years ago

I've finally figured a sensible way for "calling" other instructions, so in v0.11.6 I'm introducing asm blocks! It's not exactly a full-fledged "macro assembler" as per the issue title, but your original problem can now be solved like this:

#ruledef
{
  loadi {reg}, {val} => 13`4 @ reg`3 @ val
  zero {reg} => asm { loadi reg, 0x0 }
}

Note the asm block is just like any other expression, so you can do things like concatenation:

#ruledef
{
  loadi {reg}, {val} => 13`4 @ reg`3 @ val
  zero_twice {reg} => asm { loadi reg, 0x0 } @ asm { loadi reg, 0x0 }
}

You can also specify multiple instructions inside a single block, one per line:

#ruledef
{
  loadi {reg}, {val} => 13`4 @ reg`3 @ val
  zero_thrice {reg} => asm
  {
    loadi reg, 0x0
    loadi reg, 0x0
    loadi reg, 0x0
  }
}

The asm block will resolve instructions as if you were invoking them as regular source code, so for example, it will take all #ruledefs into consideration, and it can't call #subruledefs directly.

I think the asm block should help with a lot of use cases, and you can kind of abuse it into a (weak?) macro feature.

tekknolagi commented 3 years ago

This is very cool! Thank you!

pol-rivero commented 3 years ago

It's nice to finally have macro support, good job! However, I don't know whether I'm using the asm block wrong or there's a bug with subruledefs. Your example assembles just fine, but this slightly modified version doesn't:

#subruledef REG {
    r0 => 0
}
#ruledef {
  loadi {reg: REG}, {val} => 13`4 @ reg`3 @ val
  zero {reg: REG} => asm { loadi reg, 0x0 }
}
zero r0

Output:

error: failed to resolve instruction
 --> temp.asm:8:1:
 6 |   zero {reg: REG} => asm { loadi reg, 0x0 } 
 7 | }
 8 | zero r0
   | ^^^^^^^  
 9 | 
 === error: no match for instruction found
 --> temp.asm:6:28:
 4 | #ruledef { 
 5 |   loadi {reg: REG}, {val} => 13`4 @ reg`3 @ val
 6 |   zero {reg: REG} => asm { loadi reg, 0x0 }
   |                            ^^^^^^^^^^^^^^    
 7 | } 
 8 | zero r0

rj45 commented 3 years ago

Edit: actually the fix is this:

#subruledef REG {
    r0 => 0
}
#ruledef {
  loadi {reg}, {val:s9} => 13`4 @ reg`3 @ val
  zero {reg:REG} => asm { loadi reg, 0x0 }
}
zero r0

Seems like there is potentially a bug with the subruledef in an asm block.

Also thanks for this! I will make good use of it :-)

pol-rivero commented 3 years ago

Yes, I also realised that removing the :REG fixes the issue, but in that case the original instruction (in this case loadi) would no longer work.

hlorenzi commented 3 years ago

Hmm, I knew I was gonna miss something in my quick hacking-away session! I've gotta think on how I'll go about passing syntax tokens into the inner asm block, as opposed to passing evaluated arguments. This might actually turn the feature into a non-hygienic macro system, so I'll need everyone's input when it's done. Everyone's feedback is always invaluable to me!

hlorenzi commented 3 years ago

Alright... I managed to do it. In v0.11.7, you can now have token substitutions in asm blocks! I'm not totally satisfied with the non-hygienics or the juggling that's going on in the code, but it should work! The syntax is:

#subruledef reg
{
    r0 => 0xaa
}

#ruledef
{
  loadi {r: reg}, {val} => r`8 @ val
  zero {r: reg} => asm { loadi {r}, 0x0 }
}

zero r0

The only difference here is that the argument is enclosed in braces { } inside the asm block, to trigger token substitution. It'll be replaced with the r0 token used in the invocation site.
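To illustrate the difference between passing evaluated arguments and substituting tokens, here is a minimal Python sketch. It is a toy model under stated assumptions, not customasm's actual implementation: `REG`, `resolve_loadi`, `expand_tokens`, `zero_by_tokens`, and `zero_by_value` are hypothetical names, and the resolver returns the evaluated operands rather than encoded bits.

```python
import re

# Hypothetical model of the `reg` subruledef: register token -> value.
REG = {"r0": 0xAA}


def resolve_loadi(reg_text, val):
    """Toy resolver for `loadi {r: reg}, {val}`.

    The first operand must be a register *token* known to the subrule,
    otherwise no rule matches. Returns the evaluated (register, value) pair.
    """
    if reg_text not in REG:
        raise LookupError(f"no match for instruction: loadi {reg_text}, {val:#x}")
    return (REG[reg_text], val)


def expand_tokens(body, args):
    """Replace every `{name}` in `body` with the token text from the call site."""
    return re.sub(r"\{(\w+)\}", lambda m: args[m.group(1)], body)


def zero_by_tokens(reg_token):
    """Model of `zero {r: reg} => asm { loadi {r}, 0x0 }` (token substitution)."""
    line = expand_tokens("loadi {r}, 0x0", {"r": reg_token})
    _mnemonic, operands = line.split(None, 1)
    reg_text, val_text = [part.strip() for part in operands.split(",")]
    return resolve_loadi(reg_text, int(val_text, 0))


def zero_by_value(reg_token):
    """Model of passing the already-evaluated argument into the inner block."""
    evaluated = REG[reg_token]  # the register token has become a plain number
    return resolve_loadi(str(evaluated), 0)
```

In this model `zero_by_tokens("r0")` resolves because the inner `loadi` sees the text `r0` again, while `zero_by_value("r0")` raises `LookupError` because the inner `loadi` sees `170`, which is not a register name. That is roughly the kind of mismatch reported for v0.11.6 earlier in the thread.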

pol-rivero commented 3 years ago

Thank you for your hard work! It works perfectly now! I have a couple of suggestions:

  1. You should add some mechanism to avoid cycles in macro calling. The following code will cause a stack overflow on v0.11.7:
    #ruledef {
    macro => asm { macro }
    }
    macro
  2. This is more of a syntax/aesthetic opinion; I don't know what the others will think about it.

For me it makes sense to use { } on the parameters that get passed to the instruction in the asm block, but currently they're only used if the parameter is part of a subruledef (not for immediates). I believe it makes more sense if all parameters use { }, both subruledefs and constants. So, for example, here's some code that currently works fine:

#subruledef REG {
    r0 => 0
}
#ruledef {
  loadi {reg: REG}, {val: u8} => 13`4 @ reg`3 @ val
  macro1 {reg: REG} => asm { loadi {reg}, 0x0 }
  macro2 {val: u8}  => asm { loadi r0, val }
}
macro1 r0
macro2 123

But I think it makes more sense this way:

#subruledef REG {
    r0 => 0
}
#ruledef {
  loadi {reg: REG}, {val: u8} => 13`4 @ reg`3 @ val
  macro1 {reg: REG} => asm { loadi {reg}, 0x0 }
  macro2 {val: u8}  => asm { loadi r0, {val} }       ; Notice the { } around val
}
macro1 r0
macro2 123

What do you think?

moonheart08 commented 3 years ago

An easy way to forbid cycles is to have a maximum substitution depth, which also handles a bunch of other related issues, so it's what I'd probably go with.
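A maximum substitution depth could be sketched as follows. This is a simplified Python model and not how customasm is implemented: macros are reduced to a mapping from a name to the list of instruction names in its body, and `MAX_DEPTH`, `ExpansionDepthError`, and `expand` are hypothetical names chosen for illustration.

```python
MAX_DEPTH = 64  # hypothetical limit; a real assembler might make this configurable


class ExpansionDepthError(Exception):
    """Raised when macro expansion nests deeper than MAX_DEPTH."""


def expand(name, macros, depth=0):
    """Expand `name` into a flat list of primitive instruction names.

    `macros` maps a macro name to the instruction names in its body.
    Any name that is not a key of `macros` is treated as a primitive.
    """
    if depth > MAX_DEPTH:
        raise ExpansionDepthError(
            f"expansion of '{name}' exceeded the maximum depth of {MAX_DEPTH}"
        )
    if name not in macros:
        return [name]
    result = []
    for inner in macros[name]:
        result.extend(expand(inner, macros, depth + 1))
    return result
```

With this limit, a self-referential definition such as `{"macro": ["macro"]}` fails with a clear error instead of overflowing the stack, while finite nesting below the limit still expands normally.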

aleferri commented 3 years ago

Why can't you forbid the user to expand not yet defined macros or recursive macros? This would guarantee termination.

moonheart08 commented 3 years ago

> Why can't you forbid the user to expand not yet defined macros or recursive macros? This would guarantee termination.

Recursive behavior and deep substitution have their genuine uses, so forbidding them would be a bit heavy-handed to start.

aleferri commented 3 years ago

>> Why can't you forbid the user to expand not yet defined macros or recursive macros? This would guarantee termination.

> Recursive behavior and deep substitution have their genuine uses, so forbidding them would be a bit heavy-handed to start.

While true, when I had to make a similar decision I took the easy route and disallowed recursion of any kind, as I would otherwise have ended up with another full-blown Turing-complete language. If deep recursion is needed, maybe it would be better to use the assembler as a library and set it up from a real, well-maintained programming language. Of course, hlorenzi is free to tackle the problem in another way in his own macro language.
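The "define before use" restriction can be sketched in Python as below. This is an illustrative model only; it is not taken from casmeleon or customasm, and `MacroTable`, `define`, and `expand` are hypothetical names. Because a macro body may only mention primitives or macros that already exist, a macro can never refer to itself or to a later definition, so expansion always terminates.

```python
class MacroTable:
    """Toy macro table that rejects forward and recursive references."""

    def __init__(self, primitives):
        self.primitives = set(primitives)
        self.macros = {}

    def define(self, name, body):
        """Register `name`, whose body is a list of instruction names."""
        if name in self.primitives or name in self.macros:
            raise ValueError(f"'{name}' is already defined")
        for inner in body:
            # `name` itself is not registered yet, so self-reference fails here too.
            if inner not in self.primitives and inner not in self.macros:
                raise ValueError(f"'{name}' uses '{inner}' before it is defined")
        self.macros[name] = list(body)

    def expand(self, name):
        """Expand `name` into primitives; terminates because references only point backwards."""
        if name in self.primitives:
            return [name]
        result = []
        for inner in self.macros[name]:
            result.extend(self.expand(inner))
        return result
```

The trade-off is the one raised in the thread: this guarantees termination without any depth counter, but it also rules out every recursive macro, including ones that would have terminated.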

hlorenzi commented 3 years ago

@p-rivero If we use { } for immediate arguments, would they also be token-substituted, or still passed by value? Passing by value would be more hygienic, which is what the non-enclosed version does. Do you just wish the syntax would be more consistent?

pol-rivero commented 3 years ago

@hlorenzi Yes, it's just a syntax suggestion. I still haven't had time to test it much, but it seems to me that it works just fine being passed by value. (Also, this is just my personal opinion. If I'm the only one who thinks this, then there's no reason to change the syntax.)

rj45 commented 3 years ago

You know what, I don't like what I said, I hope you don't mind if I take it back.

For my use cases, it doesn't matter to me if the macros are hygienic or not because I am not maintaining thousands of lines of them. But it seems like that's important and I can totally respect that.

I like the implementation; it is better than the C preprocessor I have been using, and I will certainly switch. Simple and consistent syntax is a worthy goal, but the current syntax already fits what I need: I can reduce duplication, I can have a set of macros for each instruction format, I can implement pseudoinstructions in terms of other instructions, and I can verify the assertions just a few times rather than all over the place.

It will clean up my code quite a lot and I am pretty excited about that, so thank you! Thank you for spending your free time making such an awesome and useful tool!