hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0

"Global" Variables and Mode switching Instructions #76

Open ProxyPlayerHD opened 3 years ago

ProxyPlayerHD commented 3 years ago

bit of a confusing title, sorry. but i'm not sure how to name this feature request.

basically this idea comes from me thinking about making an Assembler for the 65816. the thing with that CPU is that it has different "modes" it can run in, and the exact size of instructions depends directly on those modes.

Assemblers made specifically for the 65816 have special directives that tell the Assembler which mode the CPU is currently in. it would be amazing if such a feature existed for CustomASM as well. (though not as directives but as special instructions)

as an example, the 65816 allows the A Register to be either 8 or 16 bits wide, which affects the size of Instructions that use the Immediate Addressing Mode, like LDA.

        ; Accumulator is in 8 bit mode
LDA #0x69   ; This assembles into a 2 byte instruction (0xA9 0x69)

        ; Accumulator is in 16 bit mode
LDA #0x69   ; This now assembles into a 3 byte instruction (0xA9 0x69 0x00)
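
To make the size difference concrete, here is a minimal Python sketch (not CustomASM syntax) of the encoding described above. The function name `encode_lda_immediate` and the `accumulator_bits` parameter are made up for illustration; only the opcode 0xA9 and the byte sequences come from the example.

```python
def encode_lda_immediate(value: int, accumulator_bits: int) -> bytes:
    """Encode 65816 `LDA #imm` (opcode 0xA9) for an assumed accumulator width.

    `accumulator_bits` stands in for the "mode" the assembler has been told
    about; the assembler cannot detect it on its own.
    """
    if accumulator_bits not in (8, 16):
        raise ValueError("accumulator width must be 8 or 16 bits")
    if not 0 <= value < (1 << accumulator_bits):
        raise ValueError(
            f"immediate {value:#x} does not fit in {accumulator_bits} bits"
        )
    # The 65816 is little-endian, so the low byte of the operand comes first.
    operand = value.to_bytes(accumulator_bits // 8, "little")
    return bytes([0xA9]) + operand


print(encode_lda_immediate(0x69, 8).hex(" "))   # a9 69
print(encode_lda_immediate(0x69, 16).hex(" "))  # a9 69 00
```

The same source line produces a different length depending only on state that lives outside the instruction itself, which is what the feature request is about.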

and the idea i had to implement this is to allow some form of "global" variables that can be declared and used in asserts to check for specific values/bits. they should probably be limited to the #ruledef (and any #subruledef inside) they were declared in, to avoid the issues that real global variables normally cause...

as a concept/mockup:

#ruledef main
{
    global MODE_A`1 = 1 ; Initialize variable with the correct Mode

    M8          => MODE_A = 1
    M16         => MODE_A = 0

    LDA #{value}        => {assert(MODE_A == 1), 0xA9 @ value`8}
    LDA #{value}        => {assert(MODE_A == 0), 0xA9 @ value[7:0] @ value[15:8]}
}

so, how difficult do you think this would be to implement? if it's even worth it or possible. and obviously if you have a better or more refined idea to get functionality like this i'll be glad to hear about it.

hlorenzi commented 3 years ago

My original idea for this was to have some #enable and #disable directives that acted on ruledefs. You'd be able to organize your rules in different blocks that could be turned on and off on demand. Perhaps I could go as far as having something like this:

#modedef m8
{
  rulesCommon, rules8 ; list names of ruledefs to be enabled in this mode
}

#modedef m16
{
  rulesCommon, rules16
}

#mode m8 ; change modes, and enable corresponding ruledefs

Do you think this would fit your use-case?

ProxyPlayerHD commented 3 years ago

that would work fine for boolean modes/switches, but integer ones would be a bit more difficult.

check out this page on the AS65 Assembler's directives; it has the kind of "mode switches" that i'm asking for, .LONGA and .LONGI to be specific. in CustomASM's case it would all be in the #mode directive.

what would be a nice extra is directives like .DPAGE, which do require an integer as their "mode". basically on the 6502 you got the Zeropage, which is at address 0x0000 - 0x00FF and has its own addressing mode that only uses 1 address byte (LL, with the high byte implied to be 00) instead of 2 (LL HH). this is easy to optimize in the assembler, as you just need to check if the address in an instruction (like LDA 0x0069) has the upper 8 bits = 0x00. if yes, use the Zeropage addressing mode; if not, use the absolute addressing mode.

on the 65816 you got the direct page, which is the same concept, but instead of being limited to 0x0000-0x00FF its upper 8 bits are controlled by a register. it's impossible for the assembler to know what the value of that register is at any time, so that directive just tells the assembler what it should expect that register to contain, so it can optimize instructions again like with the Zeropage (but in this case by comparing the upper 8 bits of the address with the value given by the directive).
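
The address check described above can be sketched in a few lines of Python. This is only an illustration under the thread's simplification that the upper 8 bits of the address are compared against a value supplied by the programmer; `encode_lda_address` and `direct_page_high` are hypothetical names, and the opcodes 0xA5 (direct page) and 0xAD (absolute) are the standard `LDA` encodings.

```python
def encode_lda_address(addr: int, direct_page_high: int = 0x00) -> bytes:
    """Pick direct-page (0xA5) or absolute (0xAD) `LDA` for a 16-bit address.

    `direct_page_high` is what the programmer promises the upper 8 bits of
    the direct page register hold; the assembler cannot verify it.
    The default of 0x00 reproduces the 6502 Zeropage case.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError(f"address {addr:#x} is not a 16-bit value")
    if not 0 <= direct_page_high <= 0xFF:
        raise ValueError("direct page value must fit in 8 bits")
    if (addr >> 8) == direct_page_high:
        # Short form: only the low byte of the address is emitted.
        return bytes([0xA5, addr & 0xFF])
    # Long form: full address, low byte first (little-endian).
    return bytes([0xAD]) + addr.to_bytes(2, "little")


print(encode_lda_address(0x0069).hex(" "))                         # a5 69
print(encode_lda_address(0x8101).hex(" "))                         # ad 01 81
print(encode_lda_address(0x8101, direct_page_high=0x81).hex(" "))  # a5 01
```

Unlike an 8/16-bit switch, the second argument is a full integer, which is why a purely boolean mode mechanism would not be enough for this case.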

(I'm sorry if this is too much to ask for).

anyways, is it still possible to have regular #ruledefs outside the #modedefs, ones that are always available no matter what mode is active? because only a few selected instructions are affected by the modes, not all of them.

parasyte commented 3 years ago

As requested, here is an alternate implementation idea.

There are some architectures which allow "switching" between specific processor modes. Some notable examples are ARM/Thumb and 65c816 (SNES) accumulator and index register widths. I also need this feature for an Atari Jaguar RISC assembler; the two RISC CPUs in this platform have very slightly different instruction sets. Switching between GPU and DSP mode would allow the assembler to output GPU-only or DSP-only instructions, while rejecting instructions for the opposite mode.

Here are some example assembly sources for each of the architectures mentioned above. Syntax is open to bikeshedding.

; SNES 65c816 assembler

#bits 8

; The #context directive is like a Rust enum
#context register_width {
    ax_8,
    a_16,
    x_16,
    ax_16,
}

#subruledef register_a {
    {value: u16} => {
        if register_width == a_16 || register_width == ax_16 {
            value
        } else {
            assert(value <= 0xff)
            value`8
        }
    }
}

#ruledef snes_65c816 {
    lda #{value: register_a} => 0xa9 @ value
}

; Accumulator and index registers default to 8-bit (first enum variant: ax_8)
; These instructions are 2-bytes each.
lda #0x00
; lda #0xffff ; ERROR: This should fail because the value is too large

; Set the context to enable 16-bit accumulator.
#setcontext register_width = a_16

; These instructions are 3-bytes each.
lda #0x00
lda #0xffff ; Now it works

; Atari Jaguar RISC (GPU and DSP) assembler

#bits 16

; The #context directive is like a Rust enum
#context cpu_mode {
    gpu,
    dsp,
}

#subruledef gpr {
    r{reg: u5} => reg
}

#ruledef jaguar_risc {
    ; These instructions exist on both architectures
    add {reg1: gpr}, {reg2: gpr} => 0b000000 @ reg1 @ reg2
    nop                          => 0b111001 @ 0[9:0]
    {}                           => {
        if cpu_mode == gpu {
            ; These instructions only exist on the GPU
            pack    {reg2: gpr}          => 0b111111 @ 0b00000 @ reg2
        } else {
            ; These instructions only exist on the DSP
            addqmod {n: u5}, {reg2: gpr} => 0b111111 @ n @ reg2
        }
    }
}

; Most instructions work on either CPU mode (default to GPU; first enum variant)
add r0, r1
nop

; Instruction matching is based on the CPU mode
pack r1 ; Works in GPU mode
; addqmod 7, r1 ; ERROR: Only works in DSP mode

; Set the context to enable DSP instructions
#setcontext cpu_mode = dsp

; pack r1 ; ERROR: Only works in GPU mode
addqmod 7, r1 ; Works in DSP mode
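
For readers who want to check the expected behaviour of the Jaguar sketch above, here is a small Python model of it. It is not CustomASM and not a proposed implementation; `assemble`, `word`, `COMMON` and `MODE_ONLY` are hypothetical names, and the bit patterns are copied from the sketch rather than from the Jaguar documentation.

```python
# Mnemonics accepted in every mode, and the extras each mode enables.
COMMON = {"add", "nop"}
MODE_ONLY = {"gpu": {"pack"}, "dsp": {"addqmod"}}


def word(opcode: int, field1: int, field2: int) -> int:
    """Pack a 6-bit opcode and two 5-bit fields into one 16-bit word."""
    for field in (field1, field2):
        if not 0 <= field <= 0b11111:
            raise ValueError(f"field {field} does not fit in 5 bits")
    return (opcode << 10) | (field1 << 5) | field2


def assemble(mnemonic: str, operands: tuple, cpu_mode: str) -> int:
    """Encode one instruction, rejecting mnemonics the mode does not enable."""
    if cpu_mode not in MODE_ONLY:
        raise ValueError(f"unknown cpu mode {cpu_mode!r}")
    if mnemonic not in COMMON | MODE_ONLY[cpu_mode]:
        raise ValueError(f"{mnemonic} is not available in {cpu_mode} mode")
    if mnemonic == "add":
        reg1, reg2 = operands
        return word(0b000000, reg1, reg2)
    if mnemonic == "nop":
        return word(0b111001, 0, 0)
    if mnemonic == "pack":
        (reg2,) = operands
        return word(0b111111, 0b00000, reg2)
    # Only "addqmod" can reach this point.
    n, reg2 = operands
    return word(0b111111, n, reg2)


print(hex(assemble("pack", (1,), "gpu")))       # 0xfc01
print(hex(assemble("addqmod", (7, 1), "dsp")))  # 0xfce1
```

Calling `assemble("addqmod", (7, 1), "gpu")` raises `ValueError`, which mirrors the commented-out error lines in the assembly source.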

The ARM/Thumb example is far too complex to sketch here. It's probably easiest for asm files to just #include the ruleset for the CPU mode (ARM or Thumb) that will be accepted for the whole compilation unit.

parasyte commented 3 years ago

This thread already has some great ideas. So rather than implementing a whole if-then-else block, I think using assert is a perfect replacement (it works fine for boolean expressions like the ones in my post above).

The #modedef and #mode proposal should also cover both use cases that I defined. It doesn't cover the .DPAGE use case for optimizations, though. For that you really do need more than boolean expressions. Here is a sketch for that use case, using the #setcontext and #context proposal along with assert:

; SNES 65c816 assembler

#bits 8

; The #context directive is like a Rust enum
#context direct_page {
    bank(u8),
}

#ruledef snes_65c816 {
    lda {addr: u16} => {
        assert(direct_page == bank(addr >> 8))
        0xa5 @ addr`8
    }
    lda {addr: u16} => {
        assert(direct_page != bank(addr >> 8))
        0xad @ le(addr)
    }
}

; The `direct_page` context defaults to `bank(0)`.
lda $0x0001 ; Output: 0xa5 0x01
lda $0x8101 ; Output: 0xad 0x01 0x81

; The context can be changed as the remainder of the asm file is parsed.
#setcontext direct_page = bank(0x81)
lda $0x0001 ; Output: 0xad 0x01 0x00
lda $0x8101 ; Output: 0xa5 0x01

This optimization also works on 6502, so the NES example assembler could also benefit from it, though there the zero page is always at page zero, so it is not configurable.

ProxyPlayerHD commented 2 years ago

just to check, has there been any development on this idea, or any decision on whether it will get added to the assembler at all?