z00m128 / sjasmplus

Command-line cross-compiler of assembly language for Z80 CPU.
http://z00m128.github.io/sjasmplus/
BSD 3-Clause "New" or "Revised" License
374 stars 52 forks source link

Temporary numeric labels bug #42

Closed Liniya closed 5 years ago

Liniya commented 5 years ago

Hello. There is a bug which breaks the temporary numeric jump labels in some way. See the following example. In this situation, it says that the 'backwards 1' label is not found. It seems to be related to the number of 'asdf' lines because if one of them is removed, it will work as expected.

    DEVICE ZXSPECTRUM128

    ORG #8000

    MODULE M1

LABEL equ 0

    ENDMODULE

    MODULE M2

     IFNUSED M1.LABEL
.tmp = M1.LABEL
;asdfasdfasdfasdf
;asdfasdfasdfasdf
     ENDIF

    ENDMODULE

1
    jr 2f
    jr 1b
2
ped7g commented 5 years ago

Does work like this also in v1.10.3 and v1.10.4, so it's not new regression (JFYI about progress). And I can reproduce it (=confirmed). Thank you for reporting it. :)

ped7g commented 5 years ago

OP: that source is fundamentally broken, and as such it also break assembler.

I'm working on the fix(es), but part of the fix will be refusing to assemble source like this (defining particular label inside IFNUSED block, which did trigger the IFNUSED condition itself), and it makes me curious how you did come with such piece of code, do you have some particular use-case on mind?

I can probably help you to rewrite it in different style.

The local labels are still broken in multiple ways in the current implementation, so one can easily break them in similar way without IFNUSED, but your particular source will be illegal even after fixing local labels.

ped7g commented 5 years ago

thinking about it.. there are multiple more convoluted ways how to break the assembler even with fix I have on mind (forbidding particular label definition)... But that doesn't concern me that much, if one aims to break the assembler, there are many opportunities how to write artificial source which will break it. The focus is on the regular real-world use cases, to make work at least those :)

Liniya commented 5 years ago

Thanks for the quick reply. For now, let me clarify one thing. Are you saying that setting a label to the value of another, existing label inside an if-statement is 'fundamentally broken'?

ped7g commented 5 years ago

No, I was not clear enough and actually even used wrong wording, let's try again and hopefully this time I will manage to be more accurate and easier to understand.

Your source shows two problems of v1.11:

One problem is implementation of local labels, which was bugged, and I'm going to commit fix soon, which will actually make even your source to compile and produce probably what you may accept as "correct" binary - just by making the heuristic resolving local labels more robust and ignorant (to structure changes). Although you may be surprised by final result.

Other problem is inherent and principal, your source is constructing logical paradox. The line .tmp = M1.LABEL will make label M1.LABEL "used", but that happens inside "M1.LABEL is unused" block. That makes this piece of source to evaluate differently in second and third pass of assembling (because the condition is no more true), so if you would put some actual code/data inside that block, it will be not produced into output (and if that label is used outside of your IF, it will be produced neither, so basically you can delete that block, as it will never produce any machine code).

Now in principle (of Assembly language) the assembler is designed to assemble "stable" code. There are 3 passes to catch up on minor differences and forward-reference of labels, but the code should be written in a way to self-stabilize within 3 passes, actually if the amount of machine code produced differs between pass 2 and pass 3, you may already get warnings about labels having different values. I.e. pass2 should produce sort of "final" code structure (same instructions as in pass3), while pass 3 is then just emitting the machine code itself, and using the collected values of labels from pass 2.

The sjasmplus has powerful macro language capable to easily write code which is lot more dynamic (different between each pass), but if you write such code, you are in risk of getting unexpected machine code in pass3.

It's not only about setting some label, or about using IF blocks, there are actually many other ways how I can break v1.11 in the similar way by other directives.

There's the lingering idea between contributors to this project to make sjasmplus "multi-pass" kind of assembler, which would do more passes until the code base does stabilize, before emitting machine code, which will allow for more dynamic constructs and more predictable results in such case.

But even then, with multi-pass architecture, it will be possible to write code which will not stabilize enough (and will fail to assemble when the amount of passes hits some limit, or the assembler may incorrectly think the code is stable, and produce unexpected/incorrect machine code).

... so if you got to your example by trying to write real code, your current architecture is not truly suitable for 3-pass assembler, and you should probably rethink the way how you structure the code, to make it more rigid (especially between pass2 and pass3, the whole point of having extra third pass is to allow for some changes between pass1 and pass2).

Liniya commented 5 years ago

That was quick! Thanks again for your work. It seems like I've stumbled into something rather big by accident.

As well, I thought it might in fact be useful to do a tl;dr on where the IFNUSED example comes from:

However, I've found that IFUSED/IFNUSED did not quite work as stated per se. In the next example, the DISPLAY directive never gets executed despite the fact that the label is clearly used in the code several times. (I apologize if the code formatting doesn't work).

    DEVICE ZXSPECTRUM128

    ORG #8000

    MODULE M1

LABEL equ 0

    ld a,LABEL

    ENDMODULE

    ld b,M1.LABEL

    MODULE M2

    IFUSED M1.LABEL
     DISPLAY "Congraturations, you sucsess!"
    ENDIF

    ENDMODULE

IFUSED is described in the documentation as: "The condition is true if there is an label used somewhere in the code." If we go by this, then M1.LABEL should count as 'used' inside M2 as well because it was used 'somewhere in the code' above, but again, this isn't what happens. - Perhaps this is another bug that needs fixing? So the OP is a simplified illustration of the method I've used to get around this issue. The purpose of that IFNUSED is to make a particular set of labels count as 'used' regardless of the module we're in (or the lack thereof). The temp numeric labels are entirely unrelated to this, and came about by accident over the course of testing various things.

ped7g commented 5 years ago

That looks like another bug (unless there is some extra special rule that inside new module the old "used" outside doesn't count, but I guess it's just simply bug, will take a look later, ).

About your goal... I'm a bit confused, why not DEFINE? That's kinda what they are for, you can even inject them from command line, for example let's say you have this source (in main.asm):

    IFDEF USE_M1 : DISPLAY "M1 will be used" : ELSE : DISPLAY "M1 will be NOT used" : ENDIF
    ORG #8000

    IFDEF USE_M1
        MODULE M1           ; this may be even INCLUDE m1.i.asm
LABEL       DB  'm1'
            ld  hl,LABEL
            ld  de,M2.LABEL
        ENDMODULE
    ENDIF

    MODULE M2
        IFDEF USE_M1
LABEL       DB 'm2-for-m1'         ; only for M1
            DISPLAY "M1.LABEL=",M1.LABEL
        ENDIF
LABEL2  DB 'm2'
    ENDMODULE

Now if you will assemble it with sjasmplus --nologo --msg=war --lst --lstlab main.asm -DUSE_M1 You will see:

> M1 will be used
> M1.LABEL=0x8000

And the labels table in listing file will be:

Value    Label
------ - -----------------------------------------------------------
0x8000   M1.LABEL
0x8008   M2.LABEL
0x8011 X M2.LABEL2

Contrary if you will do sjasmplus --nologo --msg=war --lst --lstlab main.asm (omitting the define "USE_M1"), you will get:

> M1 will be NOT used

and labels table in listing:

Value    Label
------ - -----------------------------------------------------------
0x8000 X M2.LABEL2

Or you can define that value in code by:

    DEFINE USE_M1

Just be aware, that if you will use USE_M1 directly like call USE_M1, it will get replace by the defined value (-DUSE_M1 => call => error Operand expected, -DUSE_M1=xx => call xx) (-DUSE_M1=USE_M1 will create infinite substitution loop, causing asm to abort after 20 substitutions).

So use rather verbose and unique names for these compile-time configurations for IFDEF/IFNDEF, avoiding some accidental clash with regular label/macro/instruction.

Liniya commented 5 years ago

If this behavior of IFUSED in regards to modules is fixed, that would indeed be great as it's another one of those things that don't make logical sense.

I'm not using DEFINE in this instance mainly because the name of the label that is being tested is partly constructed from input arguments in a macro, using the 'underscore label name concatenation'. Like this:

    MACRO test prefix, name
     IFUSED prefix_name
      ;do things if this label exists
     ELSE
      ;do other things
     ENDIF
    ENDM

Here, 'prefix' comes from a number of predefined strings, while 'name' is the user-supplied portion. I believe this feature only works with regular labels and not DEFINE's (it also doesn't work with local labels starting with a dot IIRC).

ped7g commented 5 years ago

ok, then doing IFUSED makes sense, if it's product of macro argument. Then this invisibility of used flag inside module is almost fatal.

You can probably do one workaround... put those IFUSED in the module where the label may be used, and do DEFINE inside those IFUSED ... and then manually "unpack" all possibilities in other module IFDEF xx_yy ... but overall maybe just wait if I will be able to provide fix, or maybe think if you could simplify the whole construct a bit, seems to me like quite a beast to keep in head and debug.

Also as long as you will use that IFNUSED to fake-use the particular label and nothing else (and no machine code, just like in OP), it's actually probably safe workaround how to enforce the usage of label in other module for this moment (local labels are now hardened enough to survive that). Just keep in mind the various IF blocks should be pretty much set in stone at beginning of pass2, to avoid further nasty surprises... if you hit some inconsistency even then, that's highly likely bug, but if you will flip conditions during pass2 into pass3 generating different code, you are asking for trouble. :)

ped7g commented 5 years ago

So IFUSED M1.LABEL inside M2 was looking for label M2.M1.LABEL, which is IMO correct. But there was no support for IFUSED @M1.LABEL, that was syntax error.

Fixed on master branch, the IFUSED/IFNUSED now do understand "global" labels starting with @.

See the test commit above for example.

Liniya commented 5 years ago

Thought about it. First off, this method with the '@' is probably enough for my current purpose. However, I see a kind of discrepancy between IFUSED and (at least) the regular if-statement, because the latter doesn't seem to have a problem 'locating' a label in this situation. For instance if the same example is modified like below, there are no errors about missing labels and such like.

    MODULE M1

LABEL equ 0

    ld a,LABEL

    ENDMODULE

    ld b,M1.LABEL

    MODULE M2

    IF M1.LABEL = 1
     DISPLAY "M1.LABEL = 1"
    ENDIF

    IF M1.LABEL = 0
     DISPLAY "M1.LABEL = 0"
    ENDIF

    ENDMODULE

But perhaps I'm missing something there and that's the way it's supposed to be, I can't quite tell.

By the way, is using '@' like that meant specifically as a workaround for IFUSED, or does it have a wider application to similar effect in the assembler (aside from label definitions, which are documented).

ped7g commented 5 years ago

You are not really missing anything... thinking about it, M1.LABEL should have worked - the IFUSED should try first M2.M1.LABEL and if such label does not exist, it should have still tried M1.LABEL, at least that's the way how current documentation can be understood (although I don't like it, that's kinda too much ambiguous for my taste).

The problem is, that there are multiple pieces of code dealing with label parsing and validation (I think there are about 4-6 label/id-name parsers and 2-3 validators) and they have subtle differences plus subtle bugs.

So it's kinda like a hydra, you fix one thing and the next one just pops out.

That said, I don't even like the docs description, the Example 3.1 in some details is IMO wrong, so this is not only about fixing the implementation for me. I'm thinking about rewording also docs, and make it more systematic and less relaxed/ambiguous, i.e. @ being reset to global space always, and your use of M1.LABEL inside M2 would always land as M2.M1.LABEL, requiring to use @ systematically inside other module, when accessing M1, etc... The current definition is kinda ambiguous, and the current implementation squares that ambiguity.

The @ is documented for labels, and should work across all features of sjasmplus, when label is being processed, but I'm pretty sure it does not, there may be further directives/situations where @ will fail, either due to code to handle it is completely missing, or just some bug in the implementation.

I'm not sure what you mean by "aside from labels", I'm not aware of any further @ usage. I was toying with idea about using it in macros to break out from some substitutions (i.e. at instruction part of line @DJNZ assembled as instruction even if DJNZ macro/define is defined) - but I'm not sure if that would not add again some extra ambiguity and break something, so this is just idea so far, will keep thinking about it.

So, sorry for the inconvenience, and keep reporting particular bugs as you run into them, we will try to fix each of them, but I'm also on the verge of simply rewriting it from scratch, including make-docs-wording-more-specific. Even if it would introduce slightly different behaviour in some corner cases (judging by the lack of issue reports, probably nobody exercises modules to such length :/ ).

But I also want to take some time to think about it before doing such big (backward-not-fully-compatible) change, so at this moment I'm rather checking the ZXNext "device" addition, and just collecting thoughts about that label system.

Liniya commented 5 years ago

probably nobody exercises modules to such length

This is a bit offtopic, but I thought I'd clarify just in case. The nature/number of issue reports here should ideally not be taken (as a starting point for assumptions) as being indicative of the general situation at the moment. This particular fork of sjasmplus is a relatively new addition. Personally I came to see it as the actively-developed one at present, and it promises some nice new features, this is why I see merit in following it. Other than that, not many sjasmplus users in the communities where it is the Speccy/Z80 cross-assembler of choice (e.g. the ex-USSR scene) seem to be really aware this exists.

As well, it should be all but common knowledge among most active users that this 'namespace' system is full of potential dangers/idiosyncrasies. At this point, I make use of it to some extent, because it helps to keep the labels nice and short. OTOH some people discourage its use altogether, which I suppose is also the recommended style for compatibility across the various assemblers. Overall they do have a point. It is a flawed system as it stands, there's no truly reliable way to address any label from anywhere, and there seem to be 'spesshul behaviors' at nearly every scope level. Nevertheless, this is what people have been making use of since the base version 1.07RC7 and earlier. Flawed as they are, a body of code definitely exists that makes use of these features. Let's keep things like that in mind, and thanks again for your work. ~