stardot / beebasm

A portable 6502 assembler with BBC Micro style syntax
http://www.retrosoftware.co.uk/wiki/index.php/BeebAsm
GNU General Public License v3.0
83 stars 26 forks source link

Support for string values and many of BBC BASIC's string functions #56

Closed mungre closed 2 years ago

mungre commented 3 years ago

The README has the new functions.

There's still no way of using a string as an assembler mnemonic or an addressing mode, which I know some people would like. This could be handled with an ASSEMBLE command which takes two strings (the mnemonic and the parameters) and assembles them. Anyway, that's something that could be built on this.

This is quite a big change so comments are very welcome.

ZornsLemma commented 3 years ago

Excellent! I haven't had a play with it yet but it sounds very promising.

ZornsLemma commented 3 years ago

Having had a very quick play with this, does command line option -D support defining string variables? I couldn't get it to work, but it may be I'm failing to get the shell escaping correct.

ZornsLemma commented 3 years ago

Another drive-by comment: It doesn't seem to like me creating string variables which end in "$". Would it be easy to allow this? I don't think it's particularly important to be like BBC BASIC and insist on a "$" suffix for string variables, but if it's simple it would be nice to allow it. The existence of TIME$ as a string variable feels slightly odd (although I appreciate it's backwards compatibility) when normal variable names can't contain $.

steven@riemann:~/src/beebasm3$ cat s1.beebasm 
foo="Hello "
bar=TIME$
mike$="X"
PRINT foo+bar
steven@riemann:~/src/beebasm3$ ./beebasm -do z.ssd -v -i s1.beebasm
s1.beebasm:3: error: Unrecognised token.

mike$="X"
^
mungre commented 3 years ago

Apart from TIME$ and functions, I think the only place you can currently have a $ after something that resembles an identifier is in things like "LDA$FE". As long as that doesn't get broken it shouldn't be hard. I'll have a look.

I meant to do something with -D but forgot. The obvious syntax is -D symbol="value" but this gets mangled by the argv parsing. On Windows you would need -D symbol=""""value"""". That is eight quotes in total and it's not portable so cross-platform makefiles would be a difficulty.

-D could always treat options as strings but this would break existing builds. If it guesses the type then this will prevent the use of strings that look like numbers. The simplest way is to introduce a new switch for string symbols, maybe -S?

mungre commented 3 years ago

Or some ad-hoc syntax like -D symbol=$value or -D$? Or require a $ on the symbol name -D symbol$=value, though I'm not superkeen on this one.

ZornsLemma commented 3 years ago

I'm not all that keen on -D symbol=$value - it just feels wrong to have stuff on the right hand side of the "=" which isn't part of the value. Then again, this really this isn't that different to the "=?" conditional assignment syntax which doesn't bother me at all. I think a stronger argument against that would be the special treatment of $ in Bourne shells for variable expansion, requiring more platform-specific escaping to work around.

Speaking of Bourne shells, I expect there'd be similar-but-different awkwardness with quotes as with your Windows example. Using a trivial C program to just dump out argv:

$ ./a.out -D foo="bar" # natural syntax doesn't give desired result
0 ./a.out
1 -D
2 foo=bar
$ ./a.out -D 'foo="bar"' # one possible unnatural syntax which does give desired result
0 ./a.out
1 -D
2 foo="bar"

-S seems like the best option, but it does seem a little bit disappointing not to be able to use -D naturally.

ZornsLemma commented 3 years ago

Just for completeness: -D could define a string symbol if and only if the argument can't be interpreted as an integer, but I think that's pretty ugly and confusing. [Edit: Sorry, just noticed you did mention this option above anyway.]

mungre commented 3 years ago

On Windows 'foo="bar"' gives 'foo=bar'. foo='bar' preserves the single quotes but I suspect it won't on Unix, and if the string contains a space you have to double-quote it as well.

I'm not keen on -S either, but I'll do that for now.

ZornsLemma commented 3 years ago

Thanks for adding those!

Another (easy - well, the coding bit, anyway :-) ) suggestion: would it be worth adding BEEBASM_VERSION$="1.10" as a default string symbol? Or maybe BEEBASM_STRING_VERSION="1.10"? And perhaps we could have a BEEBASM_INTEGER_VERSION=110 as well? I really don't know, but I thought I'd mention it. Perhaps something to discuss on stardot later rather than mix in with this.

ZornsLemma commented 3 years ago

PS Glad to see it looks like the "$" support wasn't too hard, although I guess knowing where to make the change is a big part of it and that effort isn't visible in the diff. :-)

mungre commented 3 years ago

There was a discussion about version numbers on stardot recently. All the proposed solutions (including mine) had downsides so I'm disinclined to mix that in here.

The "$" thing was easy when I realised there was already similar code for "%".

I've done a first go of the ASM thing. It's great for obfuscating code

LDA=4
ASM="LDA"
ASM ASM, "LDA"   \ LDA 4

The error reporting is terrible for now. I'm not sure how useful it is? I've put it here for now.

ZornsLemma commented 3 years ago

It's that discussion which made me mention it here, and you're right, probably best not to mix the two.

Is it correct that "LDA" is quoted in the last line of your ASM example? I'd have expected:

ASM ASM, LDA

but maybe that's part of your point about obfuscation. :-)

I am sure I've had one or two rare cases where this would have been helpful, but of course I can't think of them now. What I thought was one and started to write up here turned out to not be quite the same thing...

The best example I can think of right now would also "require" an equivalent of the C macro preprocessor # operator to work:

macro munge_thing zp
    if zp == hot_zp_1 or zp == hot_zp_2
         ; inline munging, we don't want to waste time for this hot address
        lda zp
        ; imagine a lot of code here
        sta zp
    else
        ; Performance isn't critical here and we don't want to waste space on inline munging.
        asm "jsr", "munge_"+#zp+"_subroutine"
    fi
endmacro

munge_thing hot_zp_1 ; generates inline code
munge_thing boring_zp ; generates jsr munge_boring_zp_sub

Edit: I put "require" in quotes because you could just require the caller to pass the argument as a string and use asm for all references to it inside the macro; it's just a bit nicer for the caller not to have to do this.

ZornsLemma commented 3 years ago

Aha, I think I've remembered one of the "good" examples. (YMMV!)

Imagine we've got some data in memory and in an ideal world we'd just access it normally via (zp),Y:

    lda (ptr),y
    clc:adc #42
    iny:ora (ptr),y:dey
    sta (ptr),y

Unfortunately there's a hole in the middle of the data we're addressing via ptr - in my specific case (Ozmoo for Acorn), we've got a sideways RAM bank paged in but we've got no shadow RAM, so our block of data starts at (say) &6000 and stretches up to &C000, but we need to "ignore" the 1K of mode 7 screen at &7C00.

What was "lda (ptr),y" now turns into something like (assuming ptr is page-aligned, just to keep things simpler):

    lda ptr+1:cmp #&7c:bcc below_screen
    clc:adc #4
.below_screen
    sta temp_ptr+1
    lda ptr:sta temp_ptr
    lda (temp_ptr),y

and we need to do something very similar for "sta (ptr),y" and "ora (ptr),y" and "and (ptr),y" etc. We can wrap the above code inside an "lda_zp_y_skipping_screen" macro, but we need to duplicate the screen-skipping logic in ora_zp_y_skipping_screen and and_zp_y_skipping_screen etc. With something like your ASM, we can write:

macro do_zp_y_skipping_screen opcode, zp
    if opcode != "lda"
        pha
    fi
    lda zp+1:cmp #&7c:bcc below_screen
    clc:adc #4 
.below_screen
    sta temp_ptr+1
    lda zp:sta temp_ptr
    if opcode != "lda"
        pla
    fi
    asm opcode, temp_ptr
endmacro

and then do:

    do_zp_y_skipping_screen "lda", ptr
    clc:adc #42
    iny:do_zp_y_skipping_screen "ora", ptr:dey
    do_zp_y_skipping_screen "sta", ptr

Maybe this isn't that convincing, I don't know. As I've written it here you could probably factor the skipping screen logic out into a helper macro to avoid a lot of the duplication even if you didn't have your new ASM instruction, but once the screen skipping logic gets more intricate I think the ASM advantage grows.

If the ASM implementation doesn't significantly complicate beebasm, I think it's probably a nice thing to have, even if its use allows programs written using beebasm to be confusing.

ZornsLemma commented 3 years ago

Sorry for the comment spam, and for not testing this myself (yet)...

Does it work if you do:

foo=&70
asm "lda", "foo+1" ; lda &71

If so, does this mean we could/should support EVAL() as a string function? If not, does it matter if this works or not?

mungre commented 3 years ago

I'm sure it's right because it runs! ASM takes two strings and LDA is a number so it needs to be converted to a string or quoted. ASM evaluates the contents of the second parameter, which can include symbol names. The implementation is in the mungre/asm branch..

If I added BASIC's EVAL function you could pass "zp" as a string parameter and write

macro munge_thing zp
    if zp == hot_zp_1 or zp == hot_zp_2
         ; inline munging, we don't want to waste time for this hot address
        lda EVAL(zp)
        ; imagine a lot of code here
        sta EVAL(zp)
    else
        ; Performance isn't critical here and we don't want to waste space on inline munging.
        asm "jsr", "munge_"+zp+"_subroutine"
    fi
endmacro

And you've just come to the same place! Yes, the "foo+1" does work. EVAL should be straightforward. I'll give it a go.

mungre commented 3 years ago

Also, thanks for the "good" example. I didn't want to shove ASM in just because I could, but it does look like it will be useful.

ZornsLemma commented 3 years ago

Thanks for the explanation on ASM. I've now built the code and you're right that it works, but I find it very confusing. It seems like the meaning of a quoted expression means something different in this context to elsewhere:

.START
VAL=4
ASM "LDA", "VAL" ; LDA 4
ASM "LDA", "VAL+1" ; LDA 5
EQUS "VAL" ; emits 'V', 'A', 'L', not '4'
EQUS "VAL+1" ; emits 'V', 'A', 'L', '+', '1', not '5'
.END
SAVE "X", START, END

I would really have expected to need to write:

ASM "LDA", STR$(VAL)

I wonder if my mental model of this is just wrong, but I hope you can at least see where I'm coming from, even if you don't agree with me.

On a possibly awkward tangent, I note that:

ASM "LDA #", "4"

doesn't work as I might wish. (ASM "LDA", "#4" does work.) Would it be hard to make this work? Edit: I suspect the answer is yes, otherwise ASM could take a single argument and we'd do things like ASM "LDA "+STR$(VAL) or ASM "LD"+register+" #42"

ZornsLemma commented 3 years ago

PS At the risk of being too demanding, how would you feel about adding UPPER() and LOWER() functions for string case conversion? In BBC BASIC you could write a function to do this, but you can't in beebasm. One example use would be in the if opcode == "lda" case in my "good" example, where opcode might in fact be LDA or LdA etc.

ZornsLemma commented 3 years ago

PPS I'm sort of getting my head round the second argument to ASM, having re-read your comment about about it evaluating its second argument. It still feels more natural to me to use it by writing something like ASM "LDA", STR$(VAL) but I can see why it "just works" to write ASM "LDA", "VAL". I think. My head is spinning a little, but I did want to backtrack on my earlier comment a little...

Edit: Maybe ASM's second argument should be a numeric value, and if you want evaluation you write EVAL(whatever)? I am genuinely not sure if that's a good idea even in my own opinion.

mungre commented 3 years ago

ASM passes the contents of the second parameter to something like EVAL. But you couldn't actually use EVAL because it needs to handle all the addressing mode syntax.

I thought it would be hard to use one parameter, but now I've done two I see it wouldn't be. I think the syntax would be much less confusing with a single parameter because it would obviously be a line of assembly language in a string, not something which could be evaluated.

I'll change it to use a single parameter.

mungre commented 3 years ago

Also UPPER and LOWER would be very simple. Will do.

ZornsLemma commented 3 years ago

A single parameter would be brilliant, thank you! I was slowly coming to the addressing mode realisation myself as I mulled over my suggestion to make the second argument numeric.

It's quite exciting to think how powerful this could be. (And we can have a competition on stardot to abuse that power in creative and terrible ways. :-) )

mungre commented 3 years ago

I think we really need some kind of recursion to unleash the full horrifying potential. Named user defined functions with a single expression for the body would be fairly simple. With the addition of "if" expressions they would be Turing-complete. But maybe see how the existing features get abused first.

Anyway, the ASM code was actually simpler with a single parameter! It's a better design all round, thank you.

I think that's everything except better error messages for ASM and EVAL. I'm hoping to co-opt the macro error-reporting, but I haven't looked into it yet.

mungre commented 3 years ago

The error messages already show the file and line number of the line with the ASM or EVAL, plus the actual expression it's trying to evaluate. The macro error stuff just shows the line numbers, so it wouldn't add anything. I'm not sure where to go with this, if anywhere.

I've put all the changes into this branch. It's done for the moment.

ZornsLemma commented 3 years ago

Thanks! I've had a little play with the latest version of the mungre/asm branch, is this a bug or am I doing something wrong?

steven@riemann:~/src/beebasm3$ cat s4.beebasm 
quote=""""
print quote
foo="foo"
print quote+foo+quote
print """"+foo+""""
steven@riemann:~/src/beebasm3$ ./beebasm -do z.ssd -i s4.beebasm
s4.beebasm:5: error: Missing comma.

print """"+foo+""""
        ^
ZornsLemma commented 3 years ago

On a different note, it doesn't look like STR$~() to generate hex representations of integers is supported. Would you be willing to add this, or perhaps a HEX() function to do the same job? If it needs a motivating example, code being written to run on a 6502 second processor might wish to do something like:

    LDX #<gocmd
    LDY #>gocmd
    JSR OSCLI

.restart_here
...

.gocmd
    EQUS "GO "+STR$~(restart_here), 13

in order to set the address it will be re-entered at on soft break.

mungre commented 3 years ago

That was a good catch! PRINT, PUTBASIC, PUTFILE, SAVE, etc. all had ad hoc string parsers so they couldn't use the new syntax.

I've added STR$~() and renamed UPPER and LOWER to UPPER$ and LOWER$ for consistency with all the other string functions.

ZornsLemma commented 3 years ago

Thanks! Good idea to put the $ on UPPER and LOWER too.

I have built the latest mungre/strings and given it a very quick go. I plan to build some of my beebasm-using projects with the new version at some point, I just haven't done it yet.

I know I'm being cheeky here, but while I'm not too surprised this doesn't work, the error seems a bit confusing:

steven@riemann:~/src/beebasm3$ cat s5.beebasm 
asm "lda #42"
print "asm ""lda #42"""
asm   "asm ""lda #42"""
steven@riemann:~/src/beebasm3$ ./beebasm -do z.ssd -i s5.beebasm
s5.beebasm:3: error: Unrecognised token.

asm   "asm ""lda #42"""
                       ^

Am I missing something which would make this a reasonable response? I don't know if it would be easy to improve this, but I thought I'd ask...

mungre commented 3 years ago

That's poor in two ways. A missing instruction error (unlike an operand error) was reported in the parent context rather than in the context of the expression, and I'd been too lazy to add a specific error message. Should now read:

s5.beebasm:3: error: Expected an assembly language instruction.

asm "lda #42"
^
mungre commented 3 years ago

Testing wise, there have been random things plus I've built David Given's bogomandel and my chuckie. They produce identical ssds. I've exercised all the string parsing changes done today and I use this as a sanity check:

foo="Foo"
bar="Bar"
Foo="fooled!"
ASSERT foo + " " + bar = "Foo Bar"
ASSERT MID$(bar, 2, 1) = "a"
ASSERT UPPER$(foo) = "FOO"
ASSERT LOWER$(bar) = "bar"
ASSERT VAL("7.2") = 7.2
ASSERT EVAL(foo) = "fooled!"
ASSERT STR$(7.2) = "7.2"
ASSERT STR$~(27.3) = "1B"
ASSERT LEN("") = 0
ASSERT LEN(foo) = 3
ASSERT LEN("a""b") = 3
ASSERT CHR$(65) = "A"
ASSERT ASC(foo) = 70
ASSERT ASC(MID$("a""b", 2, 1)) = '"'
ASSERT STRING$(3, bar) = bar + bar + bar
ASSERT TIME$("%a,%d %b %Y.%H:%M:%S") = TIME$

SAVE "tests", 0, 0
ZornsLemma commented 3 years ago

Thanks for taking a look at that. It might be worth putting that test/example code in the examples directory in the repo - when I've had a test case to reproduce a bug I've generally dumped it in there for future reference. (At some point it might be nice to turn all these into an automated test suite and perhaps tidy them up a bit.)

I think I've found another bug from trying to build one of my other projects:

steven@riemann:~/src/beebasm3$ cat s6.beebasm 
.start
foo = "hello"
equs foo, 13
equs TIME$, 13
.end
save "X", start, end
steven@riemann:~/src/beebasm3$ ./beebasm -do z.ssd -i s6.beebasm
s6.beebasm:4: error: Bad expression.

equs TIME$, 13
            ^
mungre commented 3 years ago

Cheers, that is a bug. The TIME$ parsing was eating an extra character.

Yes, automating some tests is on my list of things I might get around to one day.

ZornsLemma commented 3 years ago

Thanks for fixing that!

I suspect the big problem with automating the tests would be doing it in a cross-platform way. In a simple-ish project of mine I just wrote a shell script to do some basic tests and diff the output against a master set of output, but for beebasm that's not likely to be acceptable. Edit: Just maybe the github automated test stuff which we recently started using has something that could help. Anyway, just a thought, there's no urgency to work on this of course.

I think I've found another bug, probably a bit obscure but I do have a project using this... The README says that when using the "-o" option to assemble in a more traditional fashion instead of generating a disc image, SAVE doesn't need a filename. This doesn't seem to work any more:

steven@riemann:~/src/beebasm3$ cat save2.beebasm 
org &900

.start
    lda #42
    jmp &ffee
.end

SAVE start, end
steven@riemann:~/src/beebasm3$ ./beebasm -o save2.out -v -i save2.beebasm
save2.beebasm:8: error: Type mismatch.

SAVE start, end
          ^
mungre commented 3 years ago

Thanks, I made the rookie mistake of believing the syntax description in a comment.

The cross-platform thing is what has put me off. chuckie checks that the ssd hasn't changed but it uses a batch file.

I think all of the positive (i.e. non-failing) tests can be expressed as "is this ssd unchanged?" Windows and Unix both have binary compare tools (comp and cmp respectively). A makefile can test its platform by testing environment variables, and then choose the right command. Well, probably, nmake is really limited.

A much simpler option might be to include this in beebasm. Provide a "reference ssd" on the command-line and check that the new ssd is identical. Fail if not. This doesn't feel like it really belongs there, but it would be handy.

Edit: of course, there are a few things (like SAVE to the file system) that couldn't be tested with this, but it covers the vast majority. It is scruffy though.

ZornsLemma commented 3 years ago

:-)

I'm always a bit dubious about my ability to write portable makefiles (Unix variants are bad enough, even without then supporting Windows), but I'm sure what you say is right. It would be a shame not to include the "deliberately failing" tests but something would be better than nothing, I guess.

https://www.freecodecamp.org/news/what-are-github-actions-and-how-can-you-automate-tests-and-slack-notifications/#part-1-automating-tests suggests the github actions stuff (which we started using when I merged Dave Lambley's pull request #54 to proposed-updates) might offer something here. But I must admit this is an area of github I know very little about; Dave did the work to start using this feature and it seems pretty nifty, but that's about all I can say. (Edit: That link is probably a bit irrelevant, beyond showing the concept of github running tests exists - it's a bit Node.js-y...)

I have built all of my own projects I can think of with the latest mungre/strings code and they all seem to build fine. However, I think there is still a lingering bug because I can't build Prince of Persia (https://github.com/kieranhj/pop-beeb). The build there is set up for Windows and it may be some of my hacks to build it on Linux are actually at fault, but could you please give it a go and see if it works for you? I get:

game/grafix.asm:111: error: Fatal error: the second assembler pass has generated different code to the first.

.chtablist_LO EQUB LO(chtable1),LO(chtable2),LO(chtable3),LO(chtable4)
 ^

Call stack:
pop-beeb.asm:910
mungre commented 3 years ago

It is broken, I'll have a look. For future reference, if you put EQUB &18,&04,&01,&20,&04,&6B,&63 in your version.txt you get an identical ssd.

I don't know anything much about github actions either. Fundamentally, it seems like it just coordinates running stuff. That could be a test, but it doesn't know anything about it.

I suppose a common choice for sane cross-platform scripting is python. It's not standard on Windows but I guess it's pretty common for even non-Python developers (like me) to have it installed. It would be easy to write a bit of Python to find and run tests, and it should be portable.

ZornsLemma commented 3 years ago

Python sounds like a pretty good plan - and as long as the standard build either doesn't run the tests or gracefully skips them if Python isn't installed, it wouldn't be the end of the world if someone didn't have it. You are probably right about how github actions work, but it does look as though Python is available (https://github.com/marketplace/actions/setup-python) so if we did have a Python test harness it could probably be integrated there fairly easily.

Thanks for taking a look at PoP, I hope it's not a tricky one...

mungre commented 3 years ago

Thanks for finding that. The hairiest change was the one to support multiple parameters in functions so that's exactly where it was. I should have looked there first.

It was failing to complete an EQUB list (so everything moved in memory) when undefined symbols were used in a function call. The commit has more details.

ZornsLemma commented 3 years ago

Nice one, with the latest code all of my projects build fine! Thanks very much.

Are you going to post on stardot about this - I'd expect a pretty enthusiastic response :-) - and ask for more testers or have you had enough of fixing bugs for one weekend? :-) (If you do post, it might be worth mentioning your branch also includes all the existing proposed-updates changes.)

I haven't looked over the code yet, I will see if I can do so over the next few days just on the off-chance there's anything dodgy that might catch my eye.

ZornsLemma commented 3 years ago

PS Don't forget to credit yourself for this work in README.md! :-)

mungre commented 3 years ago

Thank you. Your suggestions and testing have been really helpful. It's much better than it was two days ago.

There's no rush to have a look at the code.

I will post on stardot. It seems pretty solid now...

I've committed an exe so people can try it easily. I don't like cluttering up the repo with multiple exes so I've put it in mungre/strings-exe, which can be deleted later.

mungre commented 3 years ago

I wrote a simple test runner in python. It works on Windows and Linux. The tests are only the existing examples. There is a README explaining what it does.

ZornsLemma commented 3 years ago

Nice one! I've had a quick look at this and it seems like a pretty decent framework.

Would it be worth opening a new pull request for this just so the discussion doesn't get mixed in with the string value support? It probably doesn't matter if it's just the two of us talking about it. :-)

A few miscellaneous thoughts; I appreciate you've probably thought of some or all of these yourself and just haven't wanted to deal with them :-) :

ZornsLemma commented 3 years ago

PS FWIW the test harness worked fine for me on Ubuntu 20.04.2.

ZornsLemma commented 3 years ago

PPS If it's easy, it would maybe be nice to support both Python 2 and 3. But it makes perfect sense to not worry about this yet, and maybe once the test harness is "done" and no longer being modified a lot it could be tweaked to work on both.

mungre commented 3 years ago

Discussion of testing moved here.

ZornsLemma commented 2 years ago

I've been over the diff between proposed-updates and mungre/strings, not in great depth. I have a few thoughts, which I hope (but don't guarantee!) aren't based on misconceptions:

ZornsLemma commented 2 years ago

Glancing over the open issues, I think this work addresses issue https://github.com/stardot/beebasm/issues/33. I just gave this a quick test and macros certainly seem to support string arguments, am I right?

mungre commented 2 years ago

Cheers, I've fixed STRINGS$ in the README.

Yes, macros support string parameters and the ASM directive handles the comment on issue 33.

Yes, shifting of signed ints is undefined for many values. Funnily enough, I'd just noticed the same problem in the binary number parser. The BASIC V shifts are described here. It has both logical and arithmetic shifts right. This seems like a reasonable thing to copy. I'll do that.

LEFT$ and RIGHT$ were skipped as unnecessary, but I might as well finish the job and add them in.

mungre commented 2 years ago

Further to my last commit, I'm not sure what local-forward-branch-3 is testing.

I've been adding some simple tests of value parsing and noticed these things:

I expected print to write 32-bit integers cleanly but print(&7fffffff) produces 2.14748e+009. Can you see any harm in changing this (for print and STR$)?

I was surprised that &ffffffff isn't negative, but it's always been like that so it's too late to change it. However the % parsing can produce negative numbers, and quietly truncates the value to 32 bits (undefined behaviour notwithstanding). The silent truncation can go but I'm not sure about making the negative behaviour consistent with the hex parsing. What do you think?

Edit: this comment was supposed to be on the testing pr so I've repeated it there.