spotlessmind1975 / ugbasic

An isomorphic BASIC language compiler for retrocomputers
Apache License 2.0
95 stars 17 forks source link

Arrays of string not working #225

Closed Samuel-DEVULDER closed 3 years ago

Samuel-DEVULDER commented 3 years ago

Tested on MO5 (but might be present on other plateforms as well):

DIM TST$(1)
TST$(0)="tst"
PEN 2
PRINT TST$(0)
HALT

produces this empty output: dcmoto03 Notice that the colors appearing in the top of the screen are actually the "tst" bytes being written at address $0000..$0002

Samuel-DEVULDER commented 3 years ago

The generated ASM looks weird:

Here we find some handling computing TST$(0) which is stored to _Ttmp7:

    ../..
    LDD _Ttmp4
    ADDD _Ttmp6
    STD _Ttmp7
; peephole: rule #2 (STD->LDD)
    ADDD #_TST
    STD _Ttmp7

so far so good.

But next comes this:

    LDA _Ttmp7
    STA _Ttmp8

It look like we take the "high byte" of the address and store it in _Ttmp8. This is strange as an address isn't likely to be split into two halves on this machine.

    LDA _Tstr1    ;
    STA _Ttmp10 ; _tmp10 = len("tst")
    LDX #_Tstr1 
    STX _Ttmp9
    LDD _Ttmp9
    ADDD #1
    STD _Ttmp9 ; _tmp9 = address of "tst"

okay with above code, but then:

    LDB _Ttmp8
    JSR DSFREE

Oh damn !!! that upper byte is used as a string descriptor!!!

But the worse is to come:

    LDA _Ttmp10
    JSR DSALLOC
    STB _Ttmp8

Okay: we allocate len("str") a got a descriptor stored to _Ttmp8. And then:

    LDB _Tstr1 ; invalid descriptor (should be _Ttmp8)
    JSR DSDESCRIPTOR ; returns $00000
    LDD 1, X
    STD _Ttmp11 ; _Ttmp11 = NULL
    LDA , X
    STA _Ttmp12

We now treat len("tst") as a string descriptor!!! Shouldn't we have _Ttmp8 here ??

When using an invalid descriptor, the adresse returned by dsdescriptor is not the one allocated. It is just $0000 which is later sent to MEMOVE, resulting in corruption at $0000...$0002.

    LDA #0
    LDB _Ttmp10
    TFR D, U
    LDY _Ttmp9

    LDX _Ttmp11 ; X = NULL
    JSR CPUMEMMOVE
Samuel-DEVULDER commented 3 years ago

I don't think this is specific to the 6809. It seem this generated code comes from _infrastructure.c which is CPU-independent AFAIK. Unfortunately I can't really spot which part produces that strange assembly.

Samuel-DEVULDER commented 3 years ago

Okay I found the root cause for the garbage on screen: https://github.com/spotlessmind1975/ugbasic/blob/4bae698d7fbb5c31aa9fbf34462297f3c1f71578/ugbc/src/targets/common/_infrastructure.c#L3792 Here we should refer to dstring->realName and not string->realName.

Unfortunately it is not enough yet since the screen remains empty.

Samuel-DEVULDER commented 3 years ago

I also find out why we have the stage half-address code. It is possibly caused by this: https://github.com/spotlessmind1975/ugbasic/blob/4bae698d7fbb5c31aa9fbf34462297f3c1f71578/ugbc/src/targets/common/_infrastructure.c#L3770 where we copy a dstring (a pointer to a string?) into a single byte. I think something like cpu_move_8bit_indirect2() might be better suited here.

The same issue seem to be present there too: https://github.com/spotlessmind1975/ugbasic/blob/4bae698d7fbb5c31aa9fbf34462297f3c1f71578/ugbc/src/targets/common/_infrastructure.c#L3826

(unfortunately this still doesnt look enough to fix the issue)

spotlessmind1975 commented 3 years ago

@Samuel-DEVULDER thank you very much for the great analysis!

It allowed me to collect several clues to solve the problem.

[current considerations refer to commit 4bae698]

The problem appears to occur on other platforms as well. The reference platform, for my tests, is Commodore 64. The source code you propose does not give the expected results on that target, as well.

The code responsible for string array assignments is embedded in the parser (ugbc.y: 3784-3802). Once the indexes have been statically retrieved, and the standard type checks passed, the variable_move_array_string (_infrastructure.c: 3755-3795) routine is called. This takes care of retrieving the target array and source string, calculating the offset in the array, adding it to the starting address of the array and then assigning the new string.

I realized, however, that the code contains at least three errors that do not make it work correctly, and are the cause of the reported bug:

  1. wrong retrieving of dynamic string identifier: instead of recovering the identifier of the dynamic string stored in the array, in order to be able to free it, the code wrongly recovers the address of the position where the identifier is found;
  2. missing copy of identifier: once the new string is generated, the code does not copy its value into the array;
  3. The code do not remove the identifier from the temporary variable, and this puts it at risk of a "garbage collection" on the next round (because for temporary variables there is no guarantee of longevity).

In addition, the code responsible for retrieving strings from arrays is embedded in the parser (ugbc.y: 1016-1031). Once the indexes have been statically retrieved, and the standard type checks passed, the variable_move_from_array (_infrastructure.c: 3799-3875) routine is called.

Also in this case error 1) is repeated.

By fixing that, and along with your suggestions, the problem has been fixed.

Samuel-DEVULDER commented 3 years ago

Yeah I confirm this works fine for the mo5 now :)

Note:

[current considerations refer to commit 4bae698]

I think this is not the proper commit.

spotlessmind1975 commented 3 years ago

I think this is not the proper commit.

The commit is right, since it is the source code before the fixing. :)

Samuel-DEVULDER commented 3 years ago

oh well, my bad ;)