z00m128 / sjasmplus

Command-line cross-compiler of assembly language for Z80 CPU.
http://z00m128.github.io/sjasmplus/
BSD 3-Clause "New" or "Revised" License

Feature suggestion: Pseudo-op to know the "size" of a label #16

Open: sdsnatcher opened this issue 5 years ago

sdsnatcher commented 5 years ago

There are many places where we need to know how many bytes are used under a label: for example, when we copy a routine with LDIR, or need to loop over a string.

Normally, we have to add extra labels to mark where the end is, and compute endlabel-LABEL to get the size. But this ends up in the creation of a plethora of nearly useless labels that only clutter the symfiles and the debuggers.

It would be much easier if we had a pseudo-op like SIZEOF(label) that would count how many bytes that label has until the next label at the same or higher level is found.

For example:

FOO:
    ld  a,3
    call    BAR
    jr  c,.skip
    ld  a,1
    ret

.skip:  ld  a,2
    ret

BAR:
    cp  5
    ret

INSTRAM:
    ld  hl,FOO
    ld  de,MYRAM
    ld  bc,SIZEOF(FOO)
    ret

In the example above, SIZEOF(FOO) would return 13 (FOO plus its local .skip part, up to BAR). This allows an easy LDIR to another location without hassle.
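To make the proposed rule concrete, here is a minimal Python model of it. It is not sjasmplus code: the function name, the (name, level, address) tuples and the $8000 start address are assumptions for illustration. Level 0 stands for a global label and level 1 for a local .label; the addresses follow the standard Z80 encodings (ld a,n and cp n are 2 bytes, call nn and ld rr,nn are 3, jr c,e is 2, ret is 1).

```python
from typing import Dict, List, Tuple

# (name, level, address) in source order; level 0 = global, 1 = local ".label".
Label = Tuple[str, int, int]


def sizeof_by_level(labels: List[Label], end_address: int) -> Dict[str, int]:
    """Hypothetical model of the proposed SIZEOF(label): the byte count from a
    label up to the next label (in source order) at the same or a higher level.
    A label that nothing ends is closed by end_address (end of assembled code)."""
    sizes: Dict[str, int] = {}
    for i, (name, level, address) in enumerate(labels):
        end = end_address
        for _, next_level, next_address in labels[i + 1:]:
            if next_level <= level:  # same or higher level ends the label
                end = next_address
                break
        sizes[name] = end - address
    return sizes


# The FOO/BAR/INSTRAM example, assuming it is assembled from $8000.
foo_example = [
    ("FOO", 0, 0x8000),       # ld a,3 / call BAR / jr c,.skip / ld a,1 / ret: 10 bytes
    ("FOO.skip", 1, 0x800A),  # ld a,2 / ret: 3 bytes
    ("BAR", 0, 0x800D),       # cp 5 / ret: 3 bytes
    ("INSTRAM", 0, 0x8010),   # ld hl,nn / ld de,nn / ld bc,nn / ret: 10 bytes
]

sizes = sizeof_by_level(foo_example, end_address=0x801A)
print(sizes["FOO"])  # 13
```

Because .skip is a local (lower-level) label, it does not end FOO, which is why SIZEOF(FOO) covers both parts of the routine.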

It would be useful for strings too:

CHKCHARS:
    ld  a,(MYCHAR)
    ld  hl,.charlist1
    ld  bc,SIZEOF(.charlist1)
    cpir
    ld  a,1
    ret z
    ld  a,(MYCHAR)
    ld  hl,.charlist2
    ld  bc,SIZEOF(.charlist2)
    cpir
    ld  a,2
    ret z
    ld  a,(MYCHAR)
    ld  hl,.charlist3
    ld  bc,SIZEOF(.charlist3)
    cpir
    ld  a,3
    ret

.charlist1: db  "ABCD"
.charlist2: db  "EFG"
.charlist3: db  "HIHKLMNO"

BAR:
    call    PRTCHAR
    or  a
    ret z
    ld  a,9
    ret

ped7g commented 5 years ago

> But this ends up in the creation of a plethora of nearly useless labels that only clutter the symfiles and the debuggers.

vs

> It would be much easier if we had a pseudo-op like SIZEOF(label) that would count how many bytes that label has until the next label at the same or higher level is found.

This seems self-contradictory: in the first example SIZEOF(FOO) == (BAR-FOO), so you can't get rid of the BAR label, because SIZEOF(..) defined like this depends on it.

To unclutter the symfile, it would make more sense for SIZEOF not to rely on the next label. How about adding a marker :: into the source code at the point where you want the "end of label" to happen? I.e.:

FOO:
    ld  a,3
    call    BAR
    jr  c,.skip
    ld  a,1
    ret

.skip:  ld  a,2
    ret

BAR:
    cp  5
    ret
        ::

INSTRAM:
    ld  hl,FOO
    ld  de,MYRAM
    ld  bc,SIZEOF(FOO)
    ret

Then SIZEOF(FOO) would be equal to INSTRAM-FOO and SIZEOF(BAR) would be INSTRAM-BAR (i.e. the :: would end all labels defined since the previous :: marker).
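Here is a minimal Python model of this marker-based variant (an illustration only, not sjasmplus code; the event tuples, the function name and the $8000 start address are assumptions). Each :: ends every label opened since the previous ::, other labels never end a label, and a label with no :: after it gets no size.

```python
from typing import Dict, List, Optional, Tuple

# Source-order events: ("label", name, address) or ("marker", "::", address).
Event = Tuple[str, str, int]


def sizeof_by_marker(events: List[Event]) -> Dict[str, Optional[int]]:
    """Hypothetical model: a '::' marker ends all labels opened since the
    previous marker; labels are never ended by other labels."""
    sizes: Dict[str, Optional[int]] = {}
    open_labels: List[Tuple[str, int]] = []
    for kind, name, address in events:
        if kind == "label":
            open_labels.append((name, address))
        elif kind == "marker":
            for label_name, start in open_labels:
                sizes[label_name] = address - start
            open_labels.clear()
    for label_name, _ in open_labels:
        sizes[label_name] = None  # no "::" follows, so the size is undefined
    return sizes


# The modified FOO/BAR example, assuming it is assembled from $8000.
events = [
    ("label", "FOO", 0x8000),
    ("label", "FOO.skip", 0x800A),
    ("label", "BAR", 0x800D),
    ("marker", "::", 0x8010),  # right after BAR's ret
    ("label", "INSTRAM", 0x8010),
]
sizes = sizeof_by_marker(events)
print(sizes["FOO"], sizes["BAR"])  # 16 3
```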

In the current version, :: parses as the : instruction delimiter twice, effectively creating an "empty instruction" (no error, no warning, no change to the binary output). It can also be added after an instruction on the same line, i.e. in the example above the marker can be written as ret :: if one prefers that.

For the strings, this may then look like:

.charlist1: db  "AB", 27, 1   ; multi-line string with extra chars
            db  "CD"    ::
.charlist2: db  "EFG"   ::
.charlist3: db  "HIHKLMNO"  ::

Another option is to make SIZEOF count until the end of the line, which would make it useless for copying code: only single-line strings/defb blocks would be meaningful.

Defined as (next_label - label), it doesn't appeal to me personally enough that I would want to work on it. What's your feeling/opinion about such a modification?

ped7g commented 5 years ago

Feedback from Busy: if defined by ::, it will not work in a "nested" way, while the original proposal, thanks to the label depth, does work in a nested way, i.e.

helloworld:
.hello db "hello"
.world db "world"

would have 10 == sizeof(helloworld) && 5 == sizeof(.hello) && 5 == sizeof(.world)

I personally don't mind a non-nested version of sizeof, but this is surely an interesting point.

ped7g commented 5 years ago

And one more question. If the definition is (from the original post):

> would count how many bytes that label has until the next label at the same or higher level is found

Does it mean "next" in source order, or "next" by memory address? I.e. what should sizeof do with this:

        org     $8000
lab1:   db      1, 2, 3, 4, 5, 6, 7, 8, 9, 10
lab2:   ret
lab1X:  equ     $8004
        ld      a,SIZEOF(lab1) ; is this 4 or 10?

EDIT: it must be the "source" way, i.e. ld a,10. With the "address" way, things could break by accident when somebody writes initHitPoints equ $8004 in a different part of the source, without realizing it also affects sizeof(lab1). So this question is resolved.
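The difference between the two readings can be shown with a small Python model (hypothetical helper names; symbols are (name, value, is_equ) tuples in source order, and end_address is an assumed end of the block for the last code label). The source-order version skips EQU symbols, so lab1X cannot shorten lab1; the rejected address-order version picks whatever symbol value comes next in memory.

```python
from typing import List, Tuple

# (name, value, is_equ) in source order.
Symbol = Tuple[str, int, bool]


def sizeof_source_order(symbols: List[Symbol], name: str, end_address: int) -> int:
    """Distance to the next code label in source order; EQU symbols are ignored."""
    code_labels = [(n, v) for n, v, is_equ in symbols if not is_equ]
    names = [n for n, _ in code_labels]
    i = names.index(name)
    start = code_labels[i][1]
    end = code_labels[i + 1][1] if i + 1 < len(code_labels) else end_address
    return end - start


def sizeof_address_order(symbols: List[Symbol], name: str, end_address: int) -> int:
    """The rejected variant: distance to the next higher symbol value of any kind."""
    start = next(v for n, v, _ in symbols if n == name)
    higher = [v for _, v, _ in symbols if v > start]
    return min(higher, default=end_address) - start


symbols = [
    ("lab1", 0x8000, False),   # db 1, 2, ..., 10: 10 bytes
    ("lab2", 0x800A, False),   # ret
    ("lab1X", 0x8004, True),   # equ $8004
]
print(sizeof_source_order(symbols, "lab1", 0x800B))   # 10
print(sizeof_address_order(symbols, "lab1", 0x800B))  # 4
```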

maziac commented 5 years ago

Hi, one thought from my side.

In general I like the idea of a size operator. It can be quite handy for strings or LDIR, and I have also used the pattern size = end_label-start_label a lot.

But I also use another pattern quite often to check for boundary overwrites:

lab1:
  db 1, 2, 3, 4, 5
  db 0 ; WPMEM

Admittedly, this is very specific to my own usage: z80-debug (a VS Code extension) looks for some keywords in the comment part of the list file, e.g. "WPMEM". (Please look for "WPMEM" in https://github.com/maziac/z80-debug/blob/master/documentation/Usage.md)

For each "WPMEM" found, a watchpoint is automatically added that stops execution whenever that memory location is read or written.

This way I "waste" one byte of memory for the benefit of easily finding any out-of-bounds access to label "lab1".

Long story short: For this:

lab1:
  db 1, 2, 3, 4, 5
  db 0 ; WPMEM
nextlabel:

and the original proposal, "size" would be one byte too big (6 instead of 5), as it would also include my "wasted" guard byte.

So I would support the proposal with the double "::".

ped7g commented 5 years ago

The :: can also be extended with :.: to "end" the counting at local-label level...

But overall this feels to me like it is getting too complicated. I'm really afraid that in assembly these things are sort of too high-level, adding extra syntax complexity for a very small benefit while writing the code, and never fitting all use cases well.

Then again, using SIZEOF(label) sounds less error-prone when you, for example, rearrange several strings: in the classic way you must also fix all the length (nextlabel-previouslabel) calculations, while with SIZEOF(..) you can move the definitions up/down without worry. So I'm not strictly against it; I just have a difficult time seeing which behaviour will work best to prevent further confusion and unexpected problems. (Regarding the previous source/memory question, the "obvious" answer is source-only, as you can get a symbol with a particular value as part of a math expression without realizing it points between regular memory labels... but it took me a whole day to realize the address-based approach is completely broken :) ... well, better late than never.)

ped7g commented 5 years ago

But I think this is actually getting somewhere, as :: and :.: can extend the original proposal (you can simply use only labels if you don't like those extra operators, and in normal asm sources the chance that somebody has :: by accident is basically zero; if somebody does, they should clean up their source code, or use comments for stylistic ASCII-art reasons... :)

I.e. for this source:

L1:     db  "abc"
.locL1  db  "d"
L2:
.locL1: ds  10 :.:
        db  0       ; WPMEM
.locL2: ds  5 ::
        db  0       ; WPMEM

the results would be:

4 == SIZEOF(L1)
1 == SIZEOF(L1.locL1)
16 == SIZEOF(L2)
10 == SIZEOF(L2.locL1)
5 == SIZEOF(L2.locL2)
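These rules can be checked with a short Python model (illustrative only; the event format and function name are assumptions, and addresses start at 0). A label ends at the first later label or marker whose level is the same or higher: :: counts as a global-level (0) end marker, :.: as a local-level (1) one.

```python
from typing import Dict, List, Optional, Tuple

# (kind, name, level, address) in source order; kind is "label" or "marker".
# Level 0 = global label or "::", level 1 = local label or ":.:".
Event = Tuple[str, str, int, int]


def sizeof_combined(events: List[Event]) -> Dict[str, Optional[int]]:
    """A label ends at the next label or marker at the same or a higher level.
    Labels that nothing ends get None (their size is undefined here)."""
    sizes: Dict[str, Optional[int]] = {}
    for i, (kind, name, level, address) in enumerate(events):
        if kind != "label":
            continue
        size: Optional[int] = None
        for _, _, next_level, next_address in events[i + 1:]:
            if next_level <= level:
                size = next_address - address
                break
        sizes[name] = size
    return sizes


events = [
    ("label", "L1", 0, 0),         # db "abc"
    ("label", "L1.locL1", 1, 3),   # db "d"
    ("label", "L2", 0, 4),
    ("label", "L2.locL1", 1, 4),   # ds 10
    ("marker", ":.:", 1, 14),      # then db 0 (WPMEM guard)
    ("label", "L2.locL2", 1, 15),  # ds 5
    ("marker", "::", 0, 20),       # then db 0 (WPMEM guard)
]
for label, size in sizeof_combined(events).items():
    print(label, size)
```

Note that :.: ends only L2.locL1, while :: ends both L2 and L2.locL2, which is how the global L2 still spans its first guard byte.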

It seems a bit hairy to explain, but anyone who wants to code in assembly has probably already seen worse...

edit: for completeness of the design, a note: module/endmodule will work as :: automatically. Modules are about encapsulation, so things like sizeof shouldn't leak across them; that doesn't make sense (to me at least). edit2: although sizeof(module_name) sounds quite interesting, that's even above the :: level.

edit3: org/disp will probably also work as :: (org highly likely, as it allows going backwards; disp maybe not, because it can be used to prepare code which will later be relocated by some ldir, where such a sizeof may be useful... needs some more research and use cases to confirm).

maziac commented 5 years ago

Makes sense to me. Would be a nice feature if it works as you explained. The example would very well fit my coding style.

ped7g commented 5 years ago

A few more notes about a possible implementation: source-based deduction should probably also be applied to macro expansions/includes, treating those as a non-label instruction (ignoring any local or global labels defined by macros or inside the included file).

But then, if org was used inside an included file, it should probably invalidate any "counting" label from the upper file, i.e.:

Label:
    include "other_code_with_org.i.asm" ::
    ld bc,SIZEOF(Label)  ; <-- error, can't compute size of Label
Label2:
    include "trivial_code_with_labels.i.asm" ::
    ld bc,SIZEOF(Label2) ; OK, measuring total size of "trivial code", ignoring the labels inside it
    ; the labels inside the include have their own sizeof, ending at EOF at latest
Label3:
    ten_byte_macro_defining_global_label_after_five_bytes ::
    ld bc,SIZEOF(Label3) ; OK, bc = 10 (ignoring the global label defined inside macro)

Con: this makes it impossible to work with "counting" labels inside a macro, like writing a macro for "::", but I can't imagine a very meaningful usage of that; keeping everything "source-based"/"visual" seems more natural to me at this moment.
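The include/macro rules above can be sketched in Python as follows (a hypothetical model, not sjasmplus internals; the event format, the addresses and the labels "inner" and "MacroLabel" are made up for illustration). Labels coming from an include or a macro (nested=True) are skipped when looking for the end of an outer label, and an org inside an include makes the outer size uncomputable. Only org inside an included file is modelled; top-level org is not.

```python
from typing import Dict, List, Optional, Tuple

# (kind, name, nested, address) in source order.
# kind: "label", "marker" ("::") or "org" (only as it appears inside an include).
# nested: True if the event comes from an included file or a macro expansion.
Event = Tuple[str, str, bool, int]


def sizeof_top_level(events: List[Event]) -> Dict[str, Optional[int]]:
    """Sizes of top-level labels; None means "can't compute size" (an error)."""
    sizes: Dict[str, Optional[int]] = {}
    for i, (kind, name, nested, address) in enumerate(events):
        if kind != "label" or nested:
            continue  # labels inside includes/macros are not measured here
        size: Optional[int] = None
        for next_kind, _, next_nested, next_address in events[i + 1:]:
            if next_kind == "org":
                break  # org inside the include invalidates the count
            if next_nested:
                continue  # nested labels do not end an outer label
            size = next_address - address
            break
        sizes[name] = size
    return sizes


events = [
    ("label", "Label", False, 0x8000),
    ("org", "", True, 0x9000),               # include "other_code_with_org.i.asm"
    ("marker", "::", False, 0x9020),
    ("label", "Label2", False, 0x9023),
    ("label", "inner", True, 0x9027),        # label inside the trivial include
    ("marker", "::", False, 0x902F),
    ("label", "Label3", False, 0x9032),
    ("label", "MacroLabel", True, 0x9037),   # global label defined by the macro
    ("marker", "::", False, 0x903C),
]
print(sizeof_top_level(events))  # {'Label': None, 'Label2': 12, 'Label3': 10}
```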

Q: what about conditional assembly?

Label:
    db "abc"  ; 3 bytes
    IF (false) :: ENDIF
    ; ^ that syntax should work, because "::" is also instruction delimiter,
    ; not just sizeof "counting" stopper, i.e. ENDIF will be found and assembled.
    db "d" ; 4th byte
    IF (false)
Label2:   ; this one will not assemble into regular labels
    ENDIF
    db "e" ; 5th byte
    IF (true) :: ENDIF
    ld bc,SIZEOF(Label) ; 3, 4 or 5?

From the implementation point of view, the result "5" is probably the most logical and the easiest to obtain. From my human point of view this seems OK and logical too; it is a bit tricky to read, but that seems to me like natural complexity stemming from the use of conditional assembly.
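The answer "5" falls out naturally if the size is computed only from what is actually assembled. A minimal Python sketch (hypothetical names; addresses start at 0): events inside a false IF block are dropped first, then a label ends at the next remaining label or :: marker.

```python
from typing import Dict, List, Optional, Tuple

# (kind, name, assembled, address): kind is "label" or "marker" ("::");
# assembled is False for lines inside a false IF block.
Event = Tuple[str, str, bool, int]


def sizeof_with_conditionals(events: List[Event]) -> Dict[str, Optional[int]]:
    active = [e for e in events if e[2]]  # skipped IF blocks produce nothing
    sizes: Dict[str, Optional[int]] = {}
    for i, (kind, name, _, address) in enumerate(active):
        if kind == "label":
            # The next assembled label or marker ends this label (None if none).
            sizes[name] = next((e[3] - address for e in active[i + 1:]), None)
    return sizes


events = [
    ("label", "Label", True, 0),    # db "abc"
    ("marker", "::", False, 3),     # IF (false) :: ENDIF
    ("label", "Label2", False, 4),  # inside IF (false), after db "d"
    ("marker", "::", True, 5),      # IF (true) :: ENDIF, after db "e"
]
print(sizeof_with_conditionals(events))  # {'Label': 5}
```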

sdsnatcher commented 5 years ago

IMHO, the :: extension is useful, but it should be optional. It will be used only when the programmer wants to artificially shorten the label size for some specific reason (like debugging).

The programmer can even add some IFDEFs to have the :: added only when needed, like:

lab1:
    db 1, 2, 3, 4, 5
 IFDEF DEBUG
    ::
    db 0 ; WPMEM
 ENDIF
nextlabel:

maziac commented 5 years ago

Would be fine with me, as well.