mikeakohn / naken_asm

Assembler for MSP430, dsPIC, ARM, MIPS, 65xx, 68000, 8051/8052, Atmel AVR8, and others.
http://www.mikekohn.net/micro/naken_asm.php
GNU General Public License v3.0

EQU does not like labels terminating with colon #45

Open fdx1601 opened 6 years ago

fdx1601 commented 6 years ago

Hi!

The following lines show some examples of using the EQU pseudo-instruction. The implementation in naken_asm does not support a terminating colon in the label name for this instruction. It also seems that the directive notation ".equ" is not supported at all. Maybe this is a documentation glitch and it isn't meant to be used this way.

Terminating the labels with a colon certainly is a matter of preference. I use it a lot in PIC programming, since labels terminated with colon are rendered bold in the source code editor, which helps a lot.

Since colons are supported for the other directives, I think it wouldn't hurt to make it work with EQU too.

                .65816

FOO1:           equ    %00001111       ; fails to compile
FOO2            equ    %11110000       ; compiles
FOO3:           .db    1, 2, 3         ; compiles
FOO4            .equ    %11110000      ; fails to compile (listed in directives)

Regards.

hosewiejacke commented 6 years ago

In my opinion a label is a named chunk of code or data. 'equ' is neither. I'd prefer not to allow a colon before 'equ' thus making it look like a label.

mikeakohn commented 6 years ago

Is FOO1 actually valid syntax? All labels in naken_asm must have the terminating ":", but equ is supposed to define a token to be a specific thing. Labels can't be assigned values; the assembler assigns their values based on their address.

For FOO4, yeah I think this is a documentation glitch. I wasn't going to support .equ. I could if that's what's really wanted, but because of the way the tokenizer deals with "." it could cause a little performance hit. I've been tempted to change that in the tokenizer but it would require quite a big change around naken_asm.

Do you want .equ to be supported?

fdx1601 commented 6 years ago

> is FOO1 actually valid syntax?

I don't know if there is a standard, but MPASM is happy with it and I think they follow a pretty consistent definition on what labels are and how to use them:

MPASM User Guide (http://ww1.microchip.com/downloads/en/devicedoc/MPASM_&_MPLINK_33014h.pdf)

1.7.1.1 LABELS (page 33)

    A label is used to represent a line or group of code, or a constant value. It is 
    needed for branching instructions.

    Labels should start in column 1. They may be followed by a colon (:), space tab or the
    end of line. Labels must begin with an alpha character or an under bar (_) and may
    contain alphanumeric characters, the under bar and the question mark.

4.27    equ - DEFINE AN ASSEMBLER CONSTANT (page 81)

    4.27.1  Syntax

    label equ expression

    4.27.2  Description

    The value of expression is assigned to label

Regards.

fdx1601 commented 6 years ago

> I wasn't going to support .equ

I figured out the following pattern, where "EQU" is treated differently from "ORG":

; compiles 
                    .65816
FOO1                equ     $C000
START:              org     $1000
                    jmp     START

; compiles
                    .65816
FOO1                equ     $C000
START:              .org    $1000
                    jmp     START

; does not compile
                    .65816
FOO1                .equ    $C000
START:              .org    $1000
                    jmp     START

As far as consistency goes, I think it would be nice to allow the following syntax:

                    .65816
FOO1[:]             [.]equ  $C000
START[:]            [.]org  $1000
                    jmp     START

Regards.
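The `[:]` and `[.]` notation above amounts to two small normalization steps: drop one optional trailing colon from the name, and ignore one optional leading dot on the directive. A minimal sketch in C of those two steps follows. The helper names `strip_label_colon` and `is_directive` are hypothetical and are not taken from the naken_asm source; how the real tokenizer handles "." is not shown in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `name` into `out` without one trailing colon.
   Returns 1 if a colon was removed, 0 otherwise. Long names are truncated. */
static int strip_label_colon(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL && out_size > 0);

    size_t len = strlen(name);
    int had_colon = (len > 0 && name[len - 1] == ':');

    if (had_colon) { len--; }
    if (len >= out_size) { len = out_size - 1; }

    memcpy(out, name, len);
    out[len] = '\0';
    return had_colon;
}

/* Hypothetical helper: case-insensitive comparison of `token` against
   `keyword`, ignoring one leading dot on the token. With keyword "equ",
   the tokens "equ", "EQU" and ".equ" all match. */
static int is_directive(const char *token, const char *keyword)
{
    assert(token != NULL && keyword != NULL);

    if (*token == '.') { token++; }

    while (*token != '\0' && *keyword != '\0')
    {
        if (tolower((unsigned char)*token) != tolower((unsigned char)*keyword))
        {
            return 0;
        }
        token++;
        keyword++;
    }

    /* Both strings must end at the same point for a full match. */
    return *token == '\0' && *keyword == '\0';
}
```

With helpers like these, `FOO1:` and `FOO1` would yield the same name, and `equ`, `EQU` and `.equ` would select the same directive handler.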

mikeakohn commented 6 years ago

I think I was mostly following nasm syntax, plus anything else that seemed to make sense and would let naken_asm assemble other assemblers' code without changes. I'm not sure I'm a fan of that syntax though....

fdx1601 commented 6 years ago

While browsing through the source I noticed that there is actually an existing implementation of .equ, which seems to be equivalent to .set and .def.

                    .equ    FOO1 = $1000
                    .def    FOO2 = $1000
                    .set    FOO3 = $1000

Well, every tool has its own quirks, and there is nothing wrong with that. Your assembler is the most versatile I know of and is certainly a great help to a bunch of people like us who love to do some work in assembly.

As I could 'smell' the part of code where it does do the colon thing, I might be tempted to hack around that place a bit to get it working ... if I find the time.

Regards.

mikeakohn commented 6 years ago

I saw that .def also while I was sifting through that area of the code and was kind of horrified by it. I'm not sure why I added it, maybe to make someone's include file work.

The .set thing works a bit differently than .equ ... I added that for someone so they could add things to the symbol table and modify them later if they needed to.

I'm not sure I like the MPASM syntax still... and I prefer to not possibly put some performance impact in there for it. Is there a reason you want that?

fdx1601 commented 6 years ago

> I'm not sure I like the MPASM syntax still

Which part of the syntax are you concerned about? The optional colon for labels?

While browsing several assembler sources (other than MPASM) I noticed that a lot of them don't use colons at all. I would tend to say the majority don't require terminating colons for label definitions.

Making them an optional element would therefore make total sense in order to support third-party sources.

Concerning the risk of a possible performance impact, I think that one or two more IFs won't make much of a difference. My own sources are a couple of thousand lines (at most) and compile almost instantly. The upload to the MPU, on the other hand, takes several tens of seconds, which is the real bottleneck.

To sum it up: I think it would be worth the effort.

Also, couldn't this change help simplify code parsing and speed it up in the end?

Regards.

fdx1601 commented 5 years ago

One more try ;-)

Doesn't this code look nice?

CHROUT:     EQU $FFD2

            LDA #65
            JSR CHROUT
            JMP DONE
            NOP
DONE:       RTS

In this case it could be considered a label to an external address.

mikeakohn commented 5 years ago

I think what @hosewiejacke said makes the most sense and is how naken_asm behaves. When a label is inserted (a word with a colon at the end) the assembler will insert that word into the symbol table mapping it to an address. The equ directive has different syntax: equ .

Unfortunately naken_asm is treating them as macros. I think they probably should at least be run through the "eval()" code, but I'm scared to change that and possibly break code.
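The difference between treating a definition as a macro and evaluating it first can be illustrated with an analogy in C. This is only an analogy: it assumes that "treated as macros" means textual substitution, and it says nothing about what naken_asm actually produces for a given source file. The names `FOO_AS_MACRO` and `FOO_AS_VALUE` are made up for the example.

```c
#include <assert.h>

/* Textual substitution: the preprocessor pastes "1 + 2" into every use. */
#define FOO_AS_MACRO 1 + 2

/* Evaluated constant: the compiler computes 3 once and stores the value. */
enum { FOO_AS_VALUE = 1 + 2 };

/* Expands to 1 + 2 * 3, so operator precedence changes the result. */
static int macro_times_three(void)
{
    return FOO_AS_MACRO * 3;
}

/* Computes 3 * 3, because the definition was evaluated beforehand. */
static int value_times_three(void)
{
    return FOO_AS_VALUE * 3;
}

/* Usage example: the two definitions look alike but behave differently. */
static void check_difference(void)
{
    assert(macro_times_three() == 7);
    assert(value_times_three() == 9);
}
```

If equ definitions behave like the first form, existing sources may rely on that behaviour, which would be one reason to be careful about changing it.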

fdx1601 commented 5 years ago

Let's see ...

I considered @hosewiejacke's post and understand his point too, and for this reason I chose the example above.

Let's look at it in more detail:

> In my opinion a label is a named chunk of code or data. 'equ' is neither. I'd prefer not to allow a colon before 'equ' thus making it look like a label.

I think a label is not a chunk but, first and foremost, an address. Being an address, it could point to a location in code, a chunk of data, or any other address value that is assigned to it.

From this perspective, both examples below are virtually the same:

            .org    $ffd2
FOO:        

and

FOO2:       equ     $ffd2

Looking at the cases below, we see that they are all totally consistent in the way of using them:

FOO1:       equ     32              ; FOO1 = $0020
FOO2:       equ     $ffd2           ; FOO2 = $ffd2

            .org    $0000

FOO3:       NOP                     ; FOO3 = $0000
            RTS

FOO4:       .asciiz "some text"     ; FOO4 = $0002

            LDA     #FOO1           ; converting address to 8-bit immediate

            JSR     FOO2            ; jumping to address represented by FOO2

            LDX     #<FOO2          ; converting address to 8-bit immediate extracting LO byte
            LDY     #>FOO2          ; converting address to 8-bit immediate extracting HI byte
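The `#<` and `#>` operators in the last two lines take the low and high byte of a 16-bit value. As a small sketch in C, the same split can be written with a mask and a shift. The function names `lo_byte` and `hi_byte` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Low byte of a 16-bit address, as selected by "#<" in the listing. */
static unsigned lo_byte(unsigned address)
{
    return address & 0xFFu;
}

/* High byte of a 16-bit address, as selected by "#>" in the listing. */
static unsigned hi_byte(unsigned address)
{
    return (address >> 8) & 0xFFu;
}

/* Usage example with the value used for FOO2 above. */
static void check_lo_hi(void)
{
    assert(lo_byte(0xFFD2u) == 0xD2u);
    assert(hi_byte(0xFFD2u) == 0xFFu);
}
```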

Examples given to MPASM:

FOO1:       EQU "FOO"       ; not allowed: Illegal argument (expected single character)
FOO2:       EQU H'FFD2'     ; allowed; can be used from now on like any other label

BTW, I totally understand your concern about breaking existing code, but that is a separate aspect and has nothing to do with what counts as valid usage of EQU.

Looking at the code I thought it could be possible to transfer some of the logic of EQU parsing into the part where labels are handled. Just to give a rough idea:

    // in file common/assembler.c:781

    if (token_type == TOKEN_LABEL)
    {
        int param_count_temp;
        if (macros_lookup(&asm_context->macros, token, &param_count_temp) != NULL)
        {
            print_already_defined(asm_context, token);
            return -1;
        }

        if (symbols_append(&asm_context->symbols, token, asm_context->address / asm_context->bytes_per_address) == -1)
        {
            return -1;
        }

        // EQU handling from: if (token_type == TOKEN_STRING)

        int start_address = asm_context->address;
        char token2[TOKENLEN];
        int token_type2;

        token_type2 = tokens_get(asm_context, token2, TOKENLEN);

        if (strcasecmp(token2, "equ") == 0)
        {
            ...
        }
    }        

I admit I still have trouble understanding this part of the code, and how symbols and macros are organized, well enough to give it a try. It doesn't seem to be a five-minute thing ;-)
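To make the idea concrete without depending on naken_asm internals, here is a self-contained toy version in C. Everything in it is hypothetical: `struct toy_symbols`, `toy_append`, `toy_define_label` and `toy_lookup` are invented for this sketch and do not mirror the real `symbols_append` or `tokens_get` signatures. One difference from the rough idea above is that the toy version decides which value to bind before adding the symbol, so a label followed by equ is never first recorded with the current address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

#define MAX_SYMBOLS 16
#define NAME_LEN 32

struct toy_symbols
{
    char names[MAX_SYMBOLS][NAME_LEN];
    long values[MAX_SYMBOLS];
    int count;
};

/* Returns 0 on success, -1 if the table is full or the name exists. */
static int toy_append(struct toy_symbols *t, const char *name, long value)
{
    assert(t != NULL && name != NULL);

    if (t->count >= MAX_SYMBOLS) { return -1; }

    for (int i = 0; i < t->count; i++)
    {
        if (strcmp(t->names[i], name) == 0) { return -1; }
    }

    strncpy(t->names[t->count], name, NAME_LEN - 1);
    t->names[t->count][NAME_LEN - 1] = '\0';
    t->values[t->count] = value;
    t->count++;
    return 0;
}

/* Handles "label: [equ operand]". `next_token` is the token after the
   label (or NULL), and `operand` is a plain numeric literal in C notation
   such as "0xFFD2". A real assembler would evaluate an expression here. */
static int toy_define_label(struct toy_symbols *t, const char *label,
                            const char *next_token, const char *operand,
                            long current_address)
{
    if (next_token != NULL && strcasecmp(next_token, "equ") == 0)
    {
        if (operand == NULL) { return -1; }

        char *end;
        long value = strtol(operand, &end, 0);
        if (end == operand || *end != '\0') { return -1; }

        /* Label followed by equ: bind the name to the given value. */
        return toy_append(t, label, value);
    }

    /* Plain label: bind the name to the current address. */
    return toy_append(t, label, current_address);
}

/* Returns the stored value, or `missing` if the name is not defined. */
static long toy_lookup(const struct toy_symbols *t, const char *name,
                       long missing)
{
    for (int i = 0; i < t->count; i++)
    {
        if (strcmp(t->names[i], name) == 0) { return t->values[i]; }
    }
    return missing;
}
```

In this toy model, `CHROUT: EQU $FFD2` and `DONE: RTS` both end up in the same table; the first is bound to the given value and the second to the address at which it appears.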