Figure out which assembler to switch to

scotws / TaliForth2

A Subroutine Threaded Code (STC) ANS-like Forth for the 65c02

Figure out which assembler to switch to #250

Closed scotws closed 4 years ago

scotws commented 4 years ago

Create a list of assemblers for the 65C02 that have conditional assembly, are free / open source and will run on multiple platforms. Choose one. If all else fails (unlikely), consider creating a new one from scratch.

scotws commented 4 years ago

What I have found so far:

Tass64 - https://sourceforge.net/projects/tass64/ (see http://tass64.sourceforge.net/ for documentation). Open source C, conditional assembly.
as65 - http://www.obelisk.me.uk/dev65/as65.html in Java. Includes a linker.

SamCoVT commented 4 years ago

My nomination would be for cc65. https://cc65.github.io/

It has a C compiler, assembler, and linker, but the assembler/linker can be used independently. It has conditional assembly built into the assembler and supports 65C02 opcodes. It can be installed on Linux by just asking the package manager, and there are executables for Windows available.

The downside: You need to create a configuration file for the linker so it knows about the memory map of your system. It took me a little while to get this right for my SBC and it's not exactly trivial, but I was using the C compiler at the time. I think the config could be much simpler if just using the assembler.

Looking over the others you've suggested, Tass64 might be a better choice for an assembly-only project like Tali2. I'll check it out and let you know what I think.

scotws commented 4 years ago

What I would love, but can't seem to find, is an assembler that is "zero page aware", that is, one that helps populate the ZP. I'm getting tired of manually managing it -- it would be nice if I could just define something as ZP, say with a directive like .savezero counter 2, and then the assembler allocates the space and keeps track of it.

I have trouble thinking this should be that hard - if you see the ZP as basically a form of registers, there must be tons of literature about this for compilers for register allocation. It's probably a bit trickier because we have some parts that have to be allocated as a block - in our case, of course, the Data Stack - but this still should be highly doable.
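A minimal sketch of the bookkeeping such a directive would imply, written in Python rather than as an assembler feature. The class name, the method name and the 64-byte Data Stack size are assumptions made for illustration; nothing here is an existing assembler API.

```python
class ZeroPageAllocator:
    """Hypothetical bump allocator for the 65C02 zero page ($00-$FF)."""

    def __init__(self, start=0x00, end=0xFF):
        self.next_free = start   # lowest address not yet handed out
        self.end = end           # last usable zero-page address
        self.symbols = {}        # name -> (address, size)

    def reserve(self, name, size):
        """Allocate `size` contiguous bytes and return the start address."""
        if name in self.symbols:
            raise ValueError(f"{name} is already allocated")
        if size < 1:
            raise ValueError("size must be at least one byte")
        if self.next_free + size - 1 > self.end:
            raise MemoryError(f"zero page exhausted while allocating {name}")
        addr = self.next_free
        self.symbols[name] = (addr, size)
        self.next_free += size
        return addr


zp = ZeroPageAllocator()
counter = zp.reserve("counter", 2)      # like the imagined ".savezero counter 2"
tmp = zp.reserve("tmp", 1)
dstack = zp.reserve("data_stack", 64)   # contiguous block; the size is an assumption
print(f"counter=${counter:02x} tmp=${tmp:02x} data_stack=${dstack:02x}")
```

Because every request is a contiguous block handed out in order, the Data Stack case needs no special treatment here; a real allocator would also have to deal with fixed addresses and freeing.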

SamCoVT commented 4 years ago

That would be another vote for cc65 then - it supports "segments" (it's why you need a config file in the first place) and zeropage is one of the segments. You can say:

   .segment "ZP" ; Switch to zero page
myvariable: .res 2 ; Reserve 2 bytes of uninitialized storage in zero page named myvariable
   .segment "ROM" ; Switch back to ROM

The config file tells the linker that ZP is from $0000-$00ff (and this makes it very easy for people to move Tali's ZP variables around by just changing the linker config file). Essentially the linker config would replace the addresses we put into the platform file now, and only the init and I/O code would need to be in the platform file.

After reading through some of the "assembler showdown" thread on 6502.org, I discovered that cc65 can be used without a linker config file, and it will follow the .org addresses you give it. That would let us port Tali the way it is now, with labels and .org statements controlling the location of things.

We may actually want to use the config file, though, to make it easier to move certain parts of Tali to certain places without having to chase down a bunch of .org statements and also to allow the above behavior of auto-allocating the next ZP location. That's the benefit of having an assembler with a linker - the linker will figure out the actual addresses as it goes to put together the executable.

SamCoVT commented 4 years ago

Both CC65 and Tass support anonymous labels, although I think CC65's version will be much easier to "translate" to. Tass's labels require you to know, at the label itself, which direction the jump to it comes from. Here is a snippet from the Tass manual:

Anonymous symbols

Anonymous symbols don't have a unique name and are always called as a single plus or minus sign. They are also called as forward (+) and backward (−) references.

When referencing them "−" means the first backward, "−−" means the second backwards and so on. It's the same for forward, but with "+". In expressions it may be necessary to put them into brackets.

        ldy #4
-       ldx #0
-       txa
        cmp #3
        bcc +
        adc #44
+       sta $400,x
        inx
        bne -
        dey
        bne --

The bcc + will only branch to a + label and the bne - will only go to a - label.

CC65 is much closer to what Tali is using now. The symbols are a little different, but I think our "transmogrifier" (much fancier sounding than converter) program will be able to automatically convert the syntax:

6.6 Unnamed labels

If you really want to write messy code, there are also unnamed labels. These labels do not have a name (you guessed that already, didn't you?). A colon is used to mark the absence of the name.

Unnamed labels may be accessed by using the colon plus several minus or plus characters as a label designator. Using the '-' characters will create a back reference (use the n'th label backwards), using '+' will create a forward reference (use the n'th label in forward direction). An example will help to understand this:

            :       lda     (ptr1),y        ; #1
                    cmp     (ptr2),y
                    bne     :+              ; -> #2
                    tax
                    beq     :+++            ; -> #4
                    iny
                    bne     :-              ; -> #1
                    inc     ptr1+1
                    inc     ptr2+1
                    bne     :-              ; -> #1

            :       bcs     :+              ; #2 -> #3
                    ldx     #$FF
                    rts

            :       ldx     #$01            ; #3
            :       rts                     ; #4

Both of these assemblers support jumping further away with things like +++, but we don't need that as all of Tali's code jumps no further than one unnamed label away.

Both of these assemblers appear to have some method of scope and making local symbols. Tass uses an underscore, just like Ophis, and cc65 uses an @, although it's configurable. I'd vote for leaving it at default and changing all the code to @ on the front if we go with cc65 (the less special configuration, the better). Again, that should be easy for the transmogrifier program.

Are there any other "special" features that we use in Ophis?

scotws commented 4 years ago

cc65 and Tass look very good - lots of features, both with segments including ZP (http://tass64.sourceforge.net/#sections for Tass), both with free/open licenses, both currently maintained, both seem to work with Windows and Linux.

Cc65 has a C compiler, where Tass does not.

However, Tass has s28 exports, which might be useful to some people. It also has "uninitialized memory" (http://tass64.sourceforge.net/#uninitialized) which might be useful for people who merge Tali with their existing systems (like Steckschwein). Also, the .ptext directive could be seriously nice in a Forth context (http://tass64.sourceforge.net/#data-text).

At the moment I would tend slightly more towards Tass, though it might be more work to port it initially. Let me see if I can test it in the next couple of days.

scotws commented 4 years ago

So I've fooled around with the assemblers for a bit and would tend to prefer 64Tass, though only very slightly, and there are a bunch of gotchas.

First, what is nice are functions such as .cwarn for conditional warnings and .rta for automatically calculating the return addresses for a rts jump. Add to this the already mentioned .ptext directive etc and uninitialized memory.

What took me a while to figure out about 64Tass is that it produces CBM binaries by default, which include the load address at the beginning of the binary. You need to call it with -b, oh, and -a for the correct character encoding. That probably means assembling with a script. Labels don't need a colon, but 64Tass does accept one, and I would want to keep colons to make it easier to convert to other assemblers.
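A small sketch of what "assembling with a script" might look like in Python. It only builds the argument list. The -a and -b flags are the ones named above and -o is 64tass's output option; the source file path is a made-up placeholder.

```python
import shlex


def tass_command(source, output, ascii_source=True, raw_binary=True):
    """Build an argument list for 64tass (this does not run the assembler)."""
    cmd = ["64tass"]
    if ascii_source:
        cmd.append("-a")   # character encoding flag mentioned in the comment above
    if raw_binary:
        cmd.append("-b")   # leave out the CBM-style address at the start of the binary
    cmd += ["-o", output, source]
    return cmd


# The source path is a hypothetical example, not necessarily Tali's real layout.
cmd = tass_command("platform/platform-py65mon.asm", "taliforth-py65mon.bin")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the assembler; a makefile rule with the same flags would do the same job.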

The really big problem with 64Tass is that it will be a lot more work to port. I don't really mind if it is worth the result, and there do seem to be (slightly) more functions -- I got the feeling that cc65 is focused more on the C compiler and linker, which is fair enough, but we just need an assembler. We just need to be very sure we want to pick this one.

Any thoughts?

scotws commented 4 years ago

As a more advanced test of 64tass, I started with (admittedly easy) strings.asm as a conversion. The code is a lot cleaner in the sense that there is less visual noise: envs_rsc: .ptext "RETURN-STACK-CELLS" instead of having to count the number of chars by hand and adding them to the front of the line with a .byte directive. There are a lot of .alias that have to be taken care of, but that part is a simple Python script:

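    import sys

    # Convert Ophis ".alias name value" lines to 64tass "name = value";
    # every other line passes through unchanged.
    for l in sys.stdin:
        cl = l.strip()

        if cl.startswith('.alias'):
            # maxsplit=2 keeps a trailing "; comment" attached to the value
            _, name, value = cl.split(None, 2)
            print(name, '=', value)
        else:
            print(l, end='')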

(I'm sure awk would do this even quicker, but Python tends to be my hammer that lets me see nails all over.) The tricky parts are going to be the local labels, scopes and anonymous labels, which I sort of regret having ever used now.
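To make the .ptext point concrete, here is a short Python sketch of the byte layout a length-prefixed string has: one count byte followed by the characters. The function name is invented for illustration, and the sketch assumes plain ASCII text.

```python
def ptext(s):
    """Length-prefixed ("Pascal") string: one count byte, then the characters."""
    data = s.encode("ascii")
    if len(data) > 255:
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(data)]) + data


encoded = ptext("RETURN-STACK-CELLS")
print(encoded[0], encoded[1:].decode("ascii"))
```

The count byte is what previously had to be worked out by hand and written in a .byte directive in front of the text.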

SamCoVT commented 4 years ago

Just FYI, I have an almost-complete port to cc65 (i.e. it assembles with no errors and generates code). I record and play back macros in my editor to do most of the work.
The unforeseen snafu with cc65 is that it appears you can't get it to fill unused memory locations without creating a config file (I was trying to use the -t none option on the command line to avoid the need for a config file). The generated output is then just sequential, with the addresses corresponding to the .org statements, but no filler zeroes in the data file (meaning it won't just program into a ROM or load into a simulator correctly). I need more time to play with the config files to see if I can get the vectors and "kernel" code to go to the right places. It currently outputs a 22Kish binary file, with the kernel and IRQ/reset vectors just lumped on at the end. All the stuff that starts at $8000 is in the right place, but the user's kernel and the vectors are not in the right place because of the missing filler bytes. There is a "fill" option if you use a config file, but I see no way to do it directly from the command line.
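The missing-filler problem can be illustrated with a short Python sketch that places chunks at their intended addresses inside a padded image. This is not something cc65 provides; the function name, the kernel address and the chunk contents are assumptions chosen for the example.

```python
def build_image(chunks, base, size, fill=0x00):
    """Place (address, data) chunks into a fixed-size image padded with `fill`."""
    image = bytearray([fill]) * size
    for addr, data in chunks:
        offset = addr - base
        if offset < 0 or offset + len(data) > size:
            raise ValueError(f"chunk at ${addr:04x} does not fit in the image")
        image[offset:offset + len(data)] = data
    return bytes(image)


chunks = [
    (0x8000, b"\xea" * 4),               # stand-in for the code that starts at $8000
    (0xF000, b"\x60"),                   # stand-in kernel routine (address is an assumption)
    (0xFFFA, bytes([0x00, 0xF0] * 3)),   # NMI, reset and IRQ vectors, all set to $F000
]
rom = build_image(chunks, base=0x8000, size=0x8000)
print(len(rom), hex(rom[0xFFFC - 0x8000]), hex(rom[0xFFFD - 0x8000]))
```

Without the padding the three chunks would simply be concatenated, which is how the vectors end up "lumped on at the end". The sketch does not check for overlapping chunks.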

There were a bunch of issues with .scope - cc65 has .scope and .endscope (the latter is just a search/replace for Ophis' .scend), but it doesn't let you easily get to labels inside of a scope, and many of the words had their xt_label AFTER the .scope, effectively hiding them. I had to swap the lines so .scope came after the xt_label to make things work. The ending of the scope had the same issue on many words where the z_label was hidden by the scope. To be honest, I actually REMOVED the .scope on any words that did not have local symbols - many didn't have any branches in them at all. Words that have separate "helper" words also had issues with scope, especially when multiple words used the same helper - the helper's labels needed to be visible. Also an issue were words that jump into another word.

ca65 (the assembler) also likes character constants to have an ending single quote, but I found (after fixing them all) that there is an option to accept characters with only the first single quote.

I have no issues porting the whole thing over to Tass as well (and was actually planning to "try before we buy" on our short list of possible assemblers) just to see what all of the unknown issues are. I created a branch called "transmogrify/cc65" in my TaliForth2 project and can make a separate "transmogrify/tass" branch so that we can compare the good and bad things about each choice. I haven't updated the makefile yet (because I was trying to figure out how to do it all on the command line first), but I can do that once I get the config file written for py65. The goal is to get a version where I can just say "make" and have it generate the taliforth-py65mon.bin file ready for use with py65mon.

It sounds like you've made some initial progress, but if you want I can just brute force my way through with my editor and get it done. I'd prefer to not have the complexity of a separate linker config file for a new user to have to deal with, so that may seal the deal in favor of tass. The makefile can hide any unsightly command line options required, and it looks like tass has everything we need. The only downside to tass I see so far is that there is no package for ubuntu (but there is for fedora, oddly enough), however it was super easy to compile (just unzip and run make) and there are windows binaries available.

SamCoVT commented 4 years ago

I can also completely remove the scope, local labels, and anonymous labels if you would like, and change everything over to normal labels - it would make Tali much easier to port between assemblers. Those were the things I had to touch the most when porting from Ophis to cc65.

scotws commented 4 years ago

Oh wow, great work! You have certainly progressed beyond where I am, and since I'm going to spend this weekend working, I won't make much progress for the next couple of days. I'm sure a detailed comparison would be something the 6502.org people would love to hear about. One thing is clear: all of these assemblers are a big step up from Ophis, and I'm rather regretting not having done more research before just picking it back then. There is a moral here somewhere ...

I agree that the linker sounds like added complexity, and that we should make it as easy as possible for other people to port Tali to other assemblers. My suggestion would be to just change everything you think makes sense in that regard so somebody's life is easier in the future. You know, when they port the 6502 to quantum computers and need a new assembler.

Thanks again for all the work!

SamCoVT commented 4 years ago

OK, the two versions for you to try out are available at: https://github.com/SamCoVT/TaliForth2/tree/transmogrify/cc65 and https://github.com/SamCoVT/TaliForth2/tree/transmogrify/64tass

Both versions assemble (just run make taliforth-py65mon.bin -B to force an assembly), but the docs fail to generate properly because the output files with the labels are different than your python scripts are expecting.

CC65 Version

The cc65 version uses cc65's .scope everywhere to hide the local labels. cc65's scope totally hides the labels inside the scope, so the .scope has to come AFTER the xt_xxxx label and some shenanigans were played in a few spots to make everything resolve. CC65 does use "@" in front of a label to make a cheap local label, but I didn't switch everything over. The no-name labels just needed to have * changed to : and bra + changed to bra :+, etc.
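The no-name label change described above is mechanical enough to sketch in Python. This is an illustration, not the editor macros actually used; it assumes the Ophis `*` label sits in column 0 and that a reference is an operand consisting only of + or - characters.

```python
import re

# "*" in column 0 defines an Ophis no-name label.
LABEL_DEF = re.compile(r"^\*(?=\s|$)")
# An instruction whose whole operand is a run of + or -, optionally followed by a comment.
LABEL_REF = re.compile(r"^(:?\s+\w+\s+)([+-]+)(\s*(?:;.*)?)$")


def ophis_to_ca65(line):
    """Rewrite one line of Ophis anonymous-label syntax into ca65 syntax."""
    line = LABEL_DEF.sub(":", line)           # "*" becomes ":"
    return LABEL_REF.sub(r"\1:\2\3", line)    # "bra +" becomes "bra :+"


source = [
    "*       lda $10,x",
    "        beq +",
    "        dex",
    "        bne -   ; loop",
    "*       rts",
    "        inc ptr+1",
]
converted = [ophis_to_ca65(line) for line in source]
print("\n".join(converted))
```

Expressions such as `ptr+1` are left alone because the operand has to be made up entirely of + or - characters to count as a reference.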

The labelmap and listing files have a different format than Ophis, but look reasonable to work with. See docs/py65mon-*.txt. You should also check out the config file (platform/py65mon.cfg) - it's not as bad as I was originally thinking, and has the potential to make Tali much more configurable in terms of what goes where.

64TASS Version

In the 64tass version, I removed all scopes. Tass has a local scope for labels that start with underscores (it only looks between real labels) and needs no directives. The downside is that if real labels intervene, the local labels need to be made into real labels. ed needed some work in that regard. I prepended the name of the word to local labels that needed to be real labels.
Tass also needs the no-name labels to have a matching symbol, e.g. the * for Ophis needs to change to a - in order to match a bra - and to a + in order to match a bra +. Tass' labelmap is in a different format, but I believe it would be workable. The listing format is different, but also workable.
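Choosing the matching symbol takes two passes, because a label's sign depends on who refers to it. The Python sketch below is one possible way to do it and is not the procedure used for the branch. It assumes single-level references only, a `*` label in column 0, and that a `-` on the label's own line refers to that label.

```python
import re

LABEL = re.compile(r"^\*(\s|$)")
REF = re.compile(r"^\*?\s+\w+\s+([+-])\s*(?:;.*)?$")   # single + or - operand


def ophis_to_64tass(lines):
    """Rename each Ophis "*" label to "+" and/or "-" to match its references."""
    label_idx = [i for i, line in enumerate(lines) if LABEL.match(line)]
    needs = {i: set() for i in label_idx}

    # Pass 1: find the label each reference points at and record the direction.
    for i, line in enumerate(lines):
        m = REF.match(line)
        if not m:
            continue
        sign = m.group(1)
        if sign == "-":
            candidates = [j for j in label_idx if j <= i]   # nearest label at or before
            target = candidates[-1] if candidates else None
        else:
            candidates = [j for j in label_idx if j > i]    # nearest label after
            target = candidates[0] if candidates else None
        if target is None:
            raise ValueError(f"line {i + 1}: no label found for '{sign}'")
        needs[target].add(sign)

    # Pass 2: rewrite the label definitions; references stay as they are.
    out = []
    for i, line in enumerate(lines):
        if i not in needs:
            out.append(line)
        elif needs[i] == {"+", "-"}:
            out.append("+")                 # forward target on a line of its own
            out.append("-" + line[1:])      # backward target keeps the instruction
        else:
            sign = next(iter(needs[i])) if needs[i] else "-"   # unreferenced: arbitrary choice
            out.append(sign + line[1:])
    return out


source = [
    "        ldy #4",
    "*       ldx #0",
    "        beq +",
    "        dex",
    "        bne -",
    "*       dey",
    "        bne -",
    "        rts",
]
converted = ophis_to_64tass(source)
print("\n".join(converted))
```

A label that is reached from both directions is emitted as two labels on consecutive lines, which is why the output can be longer than the input.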

Check these out and see what you think. I haven't tried conditional compilation in each of these, but that's next on my list when I have some more free time.

scotws commented 4 years ago

@SamCoVT First, thank you again for all the work!

Looking through the files, I still think that 64tass is just much cleaner and easier to read, and that is even before we apply special stuff like .null and .ptext (which can replace a lot of the .text in the strings.asm file and get rid of the error-prone manual counting of characters).

Actually seeing the configuration file for cc65 makes me wonder if the added flexibility is worth forcing people to figure out one further thing; however, you do make a good point about configuration. If we document the living daylights out of it, could this be a Good Thing because it would be a central place for people to describe their hardware? Like, do this at the beginning of your configuration file?

(As a far future project, I'm wondering if that would allow some sort of GUI-interfaced setup for the hardware, where you could drag and mark memory regions and then the program would create this configuration file based on that. A list on one side would let you click on the modules you want to include, and it would show you how much space is used. Just as an idea.)

I'm still tending towards 64tass for the clean code, though I think your argument about using the configuration file would be a strong one. You're now far ahead of me, is there any great difference in conditional compilation?

Thanks again!

aniou commented 4 years ago

Hi, some time ago I made a quick-and-mostly-automatic conversion of LiaraForth to 64tass (there was some semi-automatic work with labels, as far as I can remember). In the end I was able to produce code byte-for-byte identical to Liara compiled with Tinkerers assembler.

Personally I rather prefer 64tass due to flexibility and simplicity, but it is a matter of personal taste.

Maybe you will find that info useful.

repo: https://github.com/aniou/LiaraForth/

scotws commented 4 years ago

@aniou Nice! Glad to hear somebody is getting use out of Liara, I might actually backport your version to the main one at some point. Also, nice work on the emulator. We need more of those if the 65816 is going to get any love ...

At the moment, I like the idea that @SamCoVT had about creating a configuration file (cc65), though I agree that 64tass seems to be more powerful. Sam has the most experience by now, so I'm going to wait for what his suggestions are on conditional compilation. This should hide the fact that I have absolutely no experience with that :-).

scotws commented 4 years ago

In case somebody is trying to follow the discussion, the syntax for the cc65 configuration file is at https://cc65.github.io/doc/ld65.html#s5

SamCoVT commented 4 years ago

I've been playing with the conditional compilation of cc65 and reading about the conditional compilation of 64tass.

I think that ultimately both will do what we need. While I do like the config file for cc65 to help specify a memory layout, I'm not excited by the fact that you MUST create a config file to get usable output. This is mainly due to the fact that without a config file, cc65 will not "fill" unused bytes, and Tali has some empty space.

I was hoping to play with 64tass' conditional compilation as well this weekend, but I've caught a cold and it's seriously reducing my brain power, so I think that will have to wait until later this week or next weekend.

The main difficulty that needs a good solution is the dictionary. To remove a word requires modifying two words - the word you want to remove and the word that comes before it in the list (and links to the nt of this word). Here's an example of that from my playing with cc65:

nt_cold:
        .byte 4, 0
        .word nt_bye, xt_cold, z_cold
        .byte "cold"

.ifndef NO_ED
nt_ed:                  ; ed6502
        .byte 2, NN
        .word nt_cold, xt_ed, z_ed
        .byte "ed"

.endif

nt_see: .byte 3, NN
.ifndef NO_ED
        .word nt_ed
.else
        .word nt_cold
.endif
        .word xt_see, z_see
        .byte "see"

I believe I'm going to want to reverse the order of the words in headers.asm and use local labels. I think this is where we might get a clear winner/loser as the two assemblers handle local labels differently. An example for cc65 looks like:

nt_editor_l:
        .byte 1, 0
        .word :+, xt_editor_l, z_editor_l
        .byte "l"
:               
.ifndef NO_EL
nt_editor_el:
        .byte 2, 0
        .word :+, xt_editor_el, z_editor_el
        .byte "el"
:               
.endif

The editor wordlist was short, so I reversed the order and tried this out and it seems to work. In the platform file, I can just say NO_EL=1 and now that word is gone when I recompile. There will, of course, be a matching conditional compilation of the word itself over in native_words.asm, but that's easier than the dictionary part. This takes care of the case of two words in a row being removed, as it will always link to the "next" word in the file that is actually compiled.
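The linking rule (each header points at the next word that is actually compiled) can be sketched in Python as a generator of header text. This is an illustration of the rule, not a proposal to replace the assembler's conditionals; the function name and the dictionary keys are invented, and the last word is given a link of 0 as an assumption about how the list ends.

```python
def link_headers(words, disabled=()):
    """Return header text in which each word links to the next enabled word."""
    enabled = [w for w in words if w["name"] not in disabled]
    lines = []
    for i, w in enumerate(enabled):
        label = w["label"]
        # The last enabled word gets a null link.
        nxt = f"nt_{enabled[i + 1]['label']}" if i + 1 < len(enabled) else "0"
        lines += [
            f"nt_{label}:",
            f"        .byte {len(w['name'])}, {w['flags']}",
            f"        .word {nxt}, xt_{label}, z_{label}",
            f'        .byte "{w["name"]}"',
            "",
        ]
    return "\n".join(lines)


words = [
    {"name": "l", "label": "editor_l", "flags": "0"},
    {"name": "el", "label": "editor_el", "flags": "0"},
    {"name": "erase-screen", "label": "editor_erase_screen", "flags": "0"},
]
text = link_headers(words, disabled={"el"})
print(text)
```

With `el` disabled, the header for `l` links straight to `erase-screen`, which is the same outcome the `:+` forward reference gives inside the assembler.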

I have to try this trick with 64tass to see if it works there as well. If it does, I will likely recommend 64tass as the assembler to use.

SamCoVT commented 4 years ago

64tass has the support we need, but it's a little odd. The main issue is that it doesn't have a way to see if a label is defined (see the .ifndef (if not defined) above that cc65 supports) and all labels used in an expression must be assigned. It does have a way to assign/create a variable if it doesn't already exist, so that can be used, but it's a little extra work (and likely to be non-portable to other assemblers)

I did essentially the same thing that worked with cc65, which was reordering the headers so they linked top to bottom and then using conditional compilation to remove certain words. I also tested nested conditional compilation, which appears to work properly. Uncommenting one of the assignments at the top will remove either the entire editor wordlist or just the word line (I skipped over o as I wanted a word in the middle of the list). Note the use of :?= which assigns a value only if it's not assigned. If we don't want these all over the file, we can make a words.cfg or similar file that will have all the possible things you can turn on/off in one place with them all defaulted to be compiled in. I don't think the platform file is the right place to do that, but a separate config file would work well. If all of the "options" are defined there, we wouldn't need to use the :?= operator shown below and could just use the .if and .endif directives.


; INCLUDE_EDITOR_ALL := false
; INCLUDE_EDITOR_LINE := false

editor_dictionary_start:
INCLUDE_EDITOR_ALL :?= true
.if INCLUDE_EDITOR_ALL
nt_editor_o:
        .byte 1, 0
        .word +, xt_editor_o, z_editor_o
        .text "o"
+
INCLUDE_EDITOR_LINE :?= true
.if INCLUDE_EDITOR_LINE
nt_editor_line:
        .byte 4, UF
        .word +, xt_editor_line, z_editor_line
        .text "line"
+
.endif
nt_editor_l:
        .byte 1, 0
        .word nt_editor_el, xt_editor_l, z_editor_l
        .text "l"

nt_editor_el:
        .byte 2, 0
        .word nt_editor_erase_screen, xt_editor_el, z_editor_el
        .text "el"

nt_editor_erase_screen:
        .byte 12, 0
        .word nt_editor_enter_screen, xt_editor_erase_screen, z_editor_erase_screen
        .text "erase-screen"

nt_editor_enter_screen:
        .byte 12, 0
        .word 0000, xt_editor_enter_screen, z_editor_enter_screen
        .text "enter-screen"
.endif ; end of INCLUDE_EDITOR_ALL

; END of EDITOR-WORDLIST
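The "assign only if not already assigned" behaviour of :?= is the same idea as a dictionary default. A short Python sketch, with the option names taken from the listing above and everything else (function name, error handling) assumed for illustration:

```python
DEFAULTS = {
    "INCLUDE_EDITOR_ALL": True,
    "INCLUDE_EDITOR_LINE": True,
}


def resolve_options(overrides):
    """Apply defaults to any option the user did not set explicitly."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    options = dict(overrides)
    for name, default in DEFAULTS.items():
        options.setdefault(name, default)   # like ":?=": assign only if still unset
    return options


opts = resolve_options({"INCLUDE_EDITOR_LINE": False})
# Nested conditionals: "line" is compiled only when both flags are true.
include_line = opts["INCLUDE_EDITOR_ALL"] and opts["INCLUDE_EDITOR_LINE"]
print(opts, include_line)
```

Keeping every default in one table corresponds to the single words.cfg idea: once all options are defined in one place, the code that uses them needs only plain conditionals.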

I still need to figure out how to handle the word at the end of the list (with the NULL pointer for the link to the next word). We might need to insert a dummy word (perhaps one with a zero-length name) so there is always a word at the end of the list. I don't know if our current code can handle a zero-length word, but we can make it work if that's the path we decide to take.

I'm going to vote in favor of 64tass as it is simpler than cc65. As a side note, it also can handle "sections" which provide behavior similar to the linker of cc65. The config for those can be included right in the assembler code.

scotws commented 4 years ago

Hope you feel better after your cold! I've been trying to sit on my hands so as not to influence you, but I agree that 64tass is a better fit. Even if the linker and the configuration file of cc65 turn out to be more powerful, one idea behind Tali is that it is somewhat easy to understand, and that points to 64tass and not cc65. Also, I have the feeling the code could be made cleaner with some of the directives that 64tass provides, like ending strings with zeros or having it count the characters in a string automatically.

I'm going to be crazy busy for the rest of the week, hope things will quiet down next week (in fact, if we get quarantined because of corona, it might get really quiet for a while).

scotws commented 4 years ago

@SamCoVT So I started converting the codebase to 64tass but a) realized that I was mostly just rewriting it as "Ophis in 64tass" instead of rethinking how I was doing it and b) don't know if you've already done some of this. To force myself to do things the 64tass way and to stay out of yours, I decided to start a new project from scratch, which, well ... one thing led to another, and now I have https://github.com/scotws/Cthulhu-Scheme ("it felt too good to stop, officer").

Ignore the project itself, I've been mainly doing everything that avoids actually having to face writing the lexer and parser, because that would be, well, actual work. Instead, see my suggestion for organizing the memory map in the platform file for py65mon. I'm not sure if the usual split into data, code, and bss is that useful for us, so I've basically gone with ROM, I/O and kernel here. There is more stuff to come with the RAM, heap and so forth.

For conditional assembly, I've tried a DEBUG flag in the same file. Amazing how much easier that makes things, we should have switched a long time ago. I've also moved some primitive debug routines to their own file. There will be some equivalent to the headers file here as well. I was originally tempted to use the 64tass structures, but then realized that that would make it hell for other people to port the code to a different assembler. I'm not sure if we want to tie ourselves that closely to 64tass?

Hope you are doing well - I've been working from home for a few weeks now, so far everybody healthy here. The trick is to keep the cats off of the keyboard.

SamCoVT commented 4 years ago

Life is a bit hectic at the moment - all of the courses I teach were thrust online so there's a lot of extra work to do to make that work for everyone. This is the first chance I've had to take a look at this issue again. Based on my work with both cc65 and 64tass, I'd like to rewrite in 64tass with the following changes to the existing structure:

No scoping. Not all of the assemblers do it the same, and some don't support it at all. Tali's not that big and I think we can make labels that are unique by prefixing with the word (or shortened word name). The last time I converted Tali, I removed some of the scopes and changed others into 64tass' version of scoping, but I ran into trouble as 64tass actively hides the labels inside, whereas Ophis still let you access them (and the code was indeed doing that for words that jump into the middle of other words). You had mentioned not using scopes at all at some point, and I think that's the best path forward.

Use symbols to handle conditional compilation to allow the various wordsets to be removed. I think this will be a second effort after getting everything to compile under 64tass.

Having the memory map broken into ROM, I/O, and kernel makes sense to me, and I agree that doing that in the platform file is the best location.

scotws commented 4 years ago

No worries, life is weird all over. I work in shifts, and usually that gives me hours of alone time during the week - now working from home, and that time is gone - spent the last two days renovating the kitchen instead. I'm sure I'll be grateful for the time with the family in the long run, but this is not how you make progress on the exciting technological edge of 8-bit assembly.

Anyway, I think those ideas are all great. I'll get the new branch up ASAP. Oh, and officially calling it, we're going with 64tass. Thanks for all the testing!