hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0

Alignment feature -- Feature request #4

Closed milanvidakovic closed 6 years ago

milanvidakovic commented 6 years ago

Hi,

I am very pleased with your program. However, I need advice on the following matter.

If I set #align to 8 bits and need all my labels assembled to addresses divisible by 2 or 4 (as on some ARM or MIPS processors), how can I achieve this? For example:

label1:
    #d8 1

label2:
    #d8 2

If label1 starts at address 0x10, then label2 would start at 0x11, but I would like it to be at address 0x12 (an even address, not an odd one).

So far, I was only able to do it manually, by inserting #d8 0 tokens to align the label to the proper address. Is there a way to achieve this automatically using the latest release?

If I align everything to 16 bits, then all my addresses are half the real addresses, because everything is then counted in words, not bytes (and I cannot actually use #d8). I need to align everything to 8 bits.

Best regards, Milan

hlorenzi commented 6 years ago

I'm afraid it's not possible with the current release. Do you believe the introduction of a new directive for alignment would be sufficient here? For example:

label1:
    #d8 1

#align 2
label2:
    #d8 2

(This reuses the align directive name, but I think it's the older one that was a bad choice)

This directive would add padding until the address was a multiple of the given size.
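The padding rule described here can be sketched in a few lines of Python. This only illustrates the arithmetic and is not customasm's actual implementation; the helper name `padding_to_align` is hypothetical, and addresses are assumed to be counted in bytes.

```python
def padding_to_align(address: int, alignment: int) -> int:
    """Return how many padding bytes are needed so that
    address + padding is a multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # Python's % always yields a non-negative result for a positive modulus.
    return (-address) % alignment


# label1 sits at 0x10 and holds one byte, so the next free address is 0x11.
next_free = 0x10 + 1
pad = padding_to_align(next_free, 2)
label2 = next_free + pad
```

With the values from the example, one padding byte is added and label2 lands on 0x12. An address that is already aligned needs no padding.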

milanvidakovic commented 6 years ago

Hi,

Thanks for the quick response. I would suggest a new keyword: #addr_align which could be used to define the address alignment.

I even tried to add that keyword to your code, but I still don't quite understand everything in the code and I am not experienced with Rust, so I was not successful.

I have tried something like this:

#align 8
#addr_align 2

I managed to correct label addresses during parsing (mov r0, label1), but I failed at dumping the label content bits (the content of the #d8) into the output file (there I had to insert additional zero bits to align the label content to an address divisible by 2).

Best regards, Milan

hlorenzi commented 6 years ago

Just to be sure I fully understand your suggestion, my idea is that you should use the new align directive explicitly every time you want an aligned address:

label1:
    #d8 0x1

#align 2 ; <-- here
label2:
    #d8 0x2

#align 2 ; <-- here
label3:
    #d8 0x3

#align 2 ; <-- here
label4:
    #d8 0x4

If you don't specify it between any of the labels above, it would result in no padding being added (i.e. as it is now -- still only using 8 bits between two labels). Would this be too cumbersome or undesirable?
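To make the effect of an explicit align concrete, here is a small Python sketch that assigns addresses to labels. It is a toy model under stated assumptions, not the assembler's real code: the function `lay_out` and its item tuples are hypothetical, sizes are in bytes, and the start address 0x10 is taken from the earlier example.

```python
def lay_out(items, start=0):
    """Assign byte addresses to labels.

    items is a list of (kind, value) tuples:
      ("label", name)  records the current address under name
      ("data", nbytes) advances the address by nbytes
      ("align", n)     pads up to the next multiple of n
    """
    address = start
    labels = {}
    for kind, value in items:
        if kind == "label":
            labels[value] = address
        elif kind == "data":
            address += value
        elif kind == "align":
            address += (-address) % value
        else:
            raise ValueError(f"unknown item kind: {kind}")
    return labels


layout = lay_out(
    [
        ("label", "label1"), ("data", 1),
        ("align", 2),                       # explicit align before label2
        ("label", "label2"), ("data", 1),
        ("label", "label3"), ("data", 1),   # no align before label3
    ],
    start=0x10,
)
```

In this model label2 is padded to 0x12, while label3, which has no align in front of it, follows directly at the odd address 0x13.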

milanvidakovic commented 6 years ago

Hello,

Thank you for the suggestion. It is OK for me to do it this way. Just to make sure: can I put the #align 2 above any label, including code? For example:

jmp test

#align 2
test:
    mov r0, 5

I expect that the code at the test label will also be aligned to 2 bytes.

Best regards, Milan

hlorenzi commented 6 years ago

Yes, that would be possible!

But regarding your other idea, #addr_align: does that go inside the #cpudef? Would it automatically try to align labels every time you defined one? I'm not familiar with how ARM assemblers usually handle this.

What problems have you encountered when trying to use a 16-bit #cpudef? If you need the 8-bit address equivalent, you could just multiply it by 2:

ld r5, {addr} -> 0x55 @ (addr * 2)[7:0]
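As a rough illustration of what that rule computes, the Python sketch below mimics the encoding. It assumes that `@` concatenates bit fields and that `[7:0]` selects bits 7 down to 0 inclusive; the function names `bit_slice` and `encode_ld_r5` are hypothetical, and the 0x55 opcode is simply the one from the example.

```python
def bit_slice(value: int, high: int, low: int) -> int:
    """Extract bits high..low (inclusive) of value, like value[high:low]."""
    if high < low:
        raise ValueError("high must be >= low")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def encode_ld_r5(word_addr: int) -> int:
    """Model of: 0x55 @ (addr * 2)[7:0]

    The 16-bit word address is doubled to get the byte address,
    and only its low 8 bits are kept after the opcode byte.
    """
    byte_addr = word_addr * 2
    return (0x55 << 8) | bit_slice(byte_addr, 7, 0)


encoded = encode_ld_r5(0x12)  # byte address 0x24
```

Note that the slice silently drops the upper bits: a word address of 0x90 gives a byte address of 0x120, of which only 0x20 is encoded.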

And I believe you can, in fact, use the #d8 directive -- you'd just need to add padding elements before the next label definition:

data:
    #d8 0x12, 0x00

label:
    ld r5, data

milanvidakovic commented 6 years ago

Hi, yes, that was my idea, because all labels and all jumps and calls should end up on an even address (currently, the CPU is 16-bit), or on an address aligned to 4 bytes (when my CPU "grows" to 32 bits). That was the idea behind the #addr_align keyword in the #cpudef section.

The problem I encountered with the 16-bit align is that I need both 16-bit and 8-bit access to the memory. For example, I have: ld r0, 100 and ld.b r0, 101

The first is 16 bits wide and loads the 16-bit value from address 100 (word-aligned, should always be even). The second is 8 bits wide and loads just 8 bits from address 101, and its address can be either even or odd (ld.b r0, 100 or ld.b r0, 101). That is the problem.

The ld.b instruction should accept an address that can be either even or odd, while the 16-bit version of that instruction (ld r0, 100) must always receive an even address. My CPU always fetches data from memory in 16-bit chunks. If the address is odd, it raises an exception (trap).
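The access rules above can be modelled in a short Python sketch. It is only an illustration of the constraint, not a description of the real CPU: the names `load8`, `load16` and `MisalignedAccess` are hypothetical, and the little-endian byte order is an assumption, since the byte order is not stated here.

```python
class MisalignedAccess(Exception):
    """Stands in for the trap raised on an odd 16-bit access."""


def load8(memory: bytes, address: int) -> int:
    """Model of ld.b: any address, even or odd, is allowed."""
    return memory[address]


def load16(memory: bytes, address: int) -> int:
    """Model of ld: the address must be even, otherwise trap."""
    if address % 2 != 0:
        raise MisalignedAccess(f"odd address {address} for 16-bit load")
    # Assumption: little-endian byte order (not stated in the thread).
    return memory[address] | (memory[address + 1] << 8)


memory = bytes(range(256))
byte_value = load8(memory, 101)    # odd address is fine for 8-bit access
word_value = load16(memory, 100)   # even address is fine for 16-bit access
```

Calling `load16(memory, 101)` in this model raises `MisalignedAccess`, which is why labels used with the 16-bit form need to sit on even byte addresses.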

I hope that I have clarified what I need. Best regards, Milan

hlorenzi commented 6 years ago

I've added both the #align and the #labelalign directives! They're also covered in the docs. Let me know what you think.

milanvidakovic commented 6 years ago

Hi, Thank you, it works perfectly!

However, I now have problems with the #tokendef. If I have:

#cpudef
{
    #bits 8
    #labelalign 2

    #tokendef reg
    {
        r0 = 0
        r1 = 1
        r2 = 2
        r3 = 3
        r4 = 4
        r5 = 5
        r6 = 6
        r7 = 7
        sp = 8
        h = 9
    }

    mov {dest: reg}, {src: reg} -> src[3:0] @ dest[3:0] @ 4'0x0 @ 4'0x1
    mov {dest: reg}, {value} -> 4'0x0 @ dest[3:0] @ 4'0x1 @ 4'0x1 @ value[15:0]
}

Then, this code reports an error:

mov r0, r1
mov r0, text1
text1:
    #d8 1

error: no match for instruction found
1 | mov r0, r1
2 | mov r0, text1
  | ^^^^^^^^^^^^^

I guess that, to your parser, those two rules appear identical:

mov {dest: reg}, {src: reg} -> src[3:0] @ dest[3:0] @ 4'0x0 @ 4'0x1
mov {dest: reg}, {value} -> 4'0x0 @ dest[3:0] @ 4'0x1 @ 4'0x1 @ value[15:0]

For my CPU, however, they are not: mov r0, r1 puts the content of r1 into r0, while mov r0, text1 puts the address of the text1 label into r0.

Best regards, Milan

milanvidakovic commented 6 years ago

Thanks for the latest release! It works perfectly! Best regards, Milan