hlorenzi / customasm

💻 An assembler for custom, user-defined instruction sets! https://hlorenzi.github.io/customasm/web/
Apache License 2.0
720 stars, 56 forks

Error with misaligned string ends. #134

Closed TChapman500 closed 1 year ago

TChapman500 commented 2 years ago

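If we have a 16-bit address boundary and, say, an ASCII or ANSI null-terminated string, then strings with an even number of characters (not counting the null terminator) will not assemble. With the terminator, the total is an odd number of bytes, which does not fill a whole number of 16-bit words.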

This 13-character string works:

#bits 16
hello_world:
    #d "Hello, World!\0"
; Something after this

But this 12-character string does not work:

#bits 16
hello_world:
    #d "Hello, World\0"
; Something after this

The error I get is "position is not aligned to an address boundary (8 bits short)". Perhaps when the end of a string would misalign whatever comes after it (including a string length constant), the assembler should simply zero-fill the remaining space. That way, whatever follows is properly aligned and a string length constant can still be obtained.
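For readers who want to check the arithmetic behind the "8 bits short" message, here is a minimal Python sketch. It is not customasm code; the helper name `bits_short` is hypothetical, and it assumes each character occupies 8 bits.

```python
def bits_short(text: str, word_bits: int, char_bits: int = 8) -> int:
    """Return how many fill bits are needed to reach the next word boundary.

    Hypothetical helper for illustration; assumes one char_bits-wide unit
    per ASCII character, including the trailing null terminator.
    """
    total_bits = len(text.encode("ascii")) * char_bits
    # Python's modulo of a negative number yields the distance to the
    # next multiple of word_bits (0 if already aligned).
    return -total_bits % word_bits


# 13 characters + terminator = 14 bytes = 112 bits: already aligned.
print(bits_short("Hello, World!\0", 16))  # 0

# 12 characters + terminator = 13 bytes = 104 bits: 8 bits short.
print(bits_short("Hello, World\0", 16))  # 8
```

The second result matches the shortfall reported in the error message, which is the amount of zero-fill the proposal would add.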

hlorenzi commented 2 years ago

That's a good idea! I hadn't even realized that would be a problem.

Any possibility that the user would want this (broken) behavior instead?

TChapman500 commented 2 years ago

I don't believe that there would be any reason for anyone to want this type of behavior.

parasyte commented 2 years ago

For ISAs with a minimum addressable data size that is not 8 bits, this suggestion sounds reasonable. I have found that #bits is usually not the right tool, however.

I am working with MIPS at the moment, where all instructions are 32 bits wide. But the minimum addressable data size is 8 bits. For this architecture, #bits 32 is incorrect. That said, I would still like a way, in general, to pad some data types (like strings) up to a specific alignment. Currently, I have to do this manually:

hello:
#align 32
#d "Hello, world!\0"

version:
#align 32
#d "Version: 1.0.0\0"

It is also still useful to not align after strings, in cases where you want to tightly pack the data.

It's easy to forget to include an alignment directive following data types with sizes that are not divisible by the minimum instruction width. For that reason, I think adding a new directive to specify instruction width (not data width) will be useful, but I don't think even that solves this particular case (which involves data).

Maybe a new directive to both align and output data is needed? Or an extension to the #d directive which adds a padding parameter?
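To make the "align and output data" idea concrete, here is a rough Python sketch of what such padding could compute. It only illustrates the proposal and is not an existing customasm feature; `pad_to_alignment` and its parameters are hypothetical names.

```python
def pad_to_alignment(data: bytes, align_bits: int, fill: int = 0) -> bytes:
    """Append fill bytes until len(data) is a multiple of the alignment.

    Hypothetical helper: sketches what a padding parameter on a data
    directive might do. Assumes byte-addressable data (8-bit units).
    """
    if align_bits <= 0 or align_bits % 8 != 0:
        raise ValueError("align_bits must be a positive multiple of 8")
    align_bytes = align_bits // 8
    missing = -len(data) % align_bytes
    return data + bytes([fill]) * missing


hello = b"Hello, world!\0"      # 14 bytes
version = b"Version: 1.0.0\0"   # 15 bytes

# Padded to 32 bits, each string occupies 16 bytes.
print(len(pad_to_alignment(hello, 32)), len(pad_to_alignment(version, 32)))

# Tightly packed, the same data takes 29 bytes with no fill.
print(len(hello + version))
```

Keeping the padding opt-in per data item, rather than automatic, would preserve the tightly packed case mentioned above.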

hlorenzi commented 2 years ago

I forgot to mention there's an undocumented #labelalign directive, which might come in handy here? When you specify it, the assembler will automatically align every global label you define. So for example:

#bits 8
#labelalign 32

#ruledef
{
    hlt => 0x33
}

hello:
    #d "Hello"

code:
    hlt

This would be interpreted as if there were an #align 32 directive right before each of hello: and code:, so code will actually start at address 0x8. (Non-global labels are not affected, though.)
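The 0x8 figure can be reproduced with a small Python model of the layout. This is only a sketch of the behavior described above, not the assembler's implementation; `align_up` and `layout` are hypothetical names, and the model assumes 8-bit addressing as set by #bits 8.

```python
def align_up(position_bits: int, align_bits: int) -> int:
    """Round a bit position up to the next multiple of align_bits."""
    return position_bits + (-position_bits % align_bits)


def layout(blocks, label_align_bits: int, addr_unit_bits: int = 8) -> dict:
    """Assign an address to each global label, aligning before every label.

    blocks is a list of (label, size_in_bits) pairs in source order.
    Hypothetical model of an implicit alignment before each global label.
    """
    position = 0
    addresses = {}
    for label, size_bits in blocks:
        position = align_up(position, label_align_bits)
        addresses[label] = position // addr_unit_bits
        position += size_bits
    return addresses


# "Hello" is 5 bytes (40 bits); hlt assembles to one byte (0x33).
addresses = layout([("hello", 5 * 8), ("code", 8)], label_align_bits=32)
print({name: hex(addr) for name, addr in addresses.items()})
# {'hello': '0x0', 'code': '0x8'}
```

The 40 bits of string data are rounded up to 64 bits before code is placed, which is where the three bytes of padding come from.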

Does this help the issue everyone is having here? I should add it to the documentation. 😅

parasyte commented 2 years ago

I believe that addresses the specific issue I brought up in this thread. It looks like the label alignment can be temporarily made smaller for packing structs, also.