nurpax / c64jasm

C64 6502 assembler in TypeScript

support for segments #60

Open neochrome opened 5 years ago

neochrome commented 5 years ago

I was trying to organize my code into multiple files, each with its own responsibility, and ran into some issues with the inclusion order into the main program. If sub1.asm defines some subroutines and loads some data, and then sub2.asm does that too, the final order of things would be something like sub1 code, sub1 data, sub2 code, sub2 data, which may not always be desired. Instead one might want to be able to consolidate and keep code together and then data together. Would something like this be worthwhile to implement?

Maybe it could work a bit like KickAssembler .segment/.segmentdef but simpler?

neochrome commented 5 years ago

Example:

!segmentdef ZP {from: $02, to: $ff}
!segmentdef CODE {from: $0800, to: $0fff}
!segmentdef DATA {from: $1000, to: $1fff}
...
; sub1.asm
!segment CODE
lda #0

!segment DATA
music: !byte sid_data

nurpax commented 5 years ago

Totally in 💪 of this feature. Been on my list for some time.

The zp part is easy to do with just variables right now but all the others make a lot of sense.

I like the object literal syntax for segment defs. :) Perhaps it could use the same keyword for both declaration and use. !segment with no from/to args could just start a new segment at the current pc.

neochrome commented 5 years ago

> Totally in 💪 of this feature. Been on my list for some time.

Cool! :sunglasses:

> I like the object literal syntax for segment defs. :) Perhaps it could use the same keyword for both declaration and use. !segment with no from/to args could just start a new segment at the current pc.

Do you mean something like this?

!segment CODE { from: $0800, to: $0fff }
!segment DATA { from: $1000, to: $1fff }

; somewhere else, maybe in another .asm file
!segment CODE
a_sub_routine:
  lda some_data
  ...
  rts

!segment DATA
some_data: !byte 0,1,2,3

I think it looks intuitive enough, using the same keyword both for declaration and use.

Another possibility could perhaps be to allow specifying a previously defined segment when declaring a scope for a label? Like this:

!segment CODE { from: $0800, to: $0fff }
!segment DATA { from: $1000, to: $1fff }

; somewhere else, maybe in another .asm file
a_sub_routine: CODE {
  lda some_data
  ...
  rts
}

some_data: DATA  {
  !byte 0,1,2,3
}

However I don't know how that would work with filescopes...and also it might not be as clear as using a specific keyword?

nurpax commented 5 years ago

@neochrome I guess in your idea of segments, the segments would always output whatever they contain into the output PRG?

That doesn't seem to be the case in KickAssembler. Only the default segment goes into the output PRG by default. Any other segment needs to be explicitly written to an output file (or merged into the default segment).

It's not a completely bad idea to support more general segments like in KA. This enables cartridge builds, multi-part demo builds, etc. Just need to think about implementation carefully.

BTW agree that a specific keyword is better than somehow adding special label syntax for segments.

neochrome commented 5 years ago

@nurpax I believe you're right about KA's default behavior. My thought was to have a slightly different default, where the output ends up in the same file. I think that is a saner approach, and one that could be expanded upon to allow output configurations like in KA, if wanted, by adding more configuration options.

nurpax commented 4 years ago

@neochrome BTW, I didn't give up on this. I've just been on a bit of a coding break, playing Zelda: Link's awakening and learning Rust lang. Will certainly work on this at some point.

The easiest implementation would be where all segments are declared with a fixed starting address (and maybe size). But KA supports segments where you can say segment B starts after segment A. It's of course possible to support this, but it will automatically mean more compilation passes to work out the starting address for those "start from" segments.
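The trade-off above can be sketched in Python. This is a hypothetical model, not c64jasm's or KickAssembler's implementation: `Segment`, `start_after` and `resolve_starts` are made-up names, and segment sizes are assumed to be known already, which in a real assembler would itself take a pass to compute. Each sweep of the loop stands in for one extra pass.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    name: str
    size: int                          # bytes emitted (assumed known here)
    start: Optional[int] = None        # fixed start address, if declared
    start_after: Optional[str] = None  # name of the segment this one follows


def resolve_starts(segments: List[Segment]) -> Dict[str, int]:
    """Resolve every segment's start address; each sweep models one pass."""
    by_name = {s.name: s for s in segments}
    resolved = {s.name: s.start for s in segments if s.start is not None}
    pending = [s for s in segments if s.start is None]
    while pending:
        still_pending = []
        for seg in pending:
            if seg.start_after is None or seg.start_after not in by_name:
                raise ValueError(f"segment {seg.name!r} has no usable start")
            if seg.start_after in resolved:
                prev = by_name[seg.start_after]
                resolved[seg.name] = resolved[prev.name] + prev.size
            else:
                still_pending.append(seg)
        if len(still_pending) == len(pending):
            # No progress in a whole sweep: the dependencies form a cycle.
            raise ValueError("circular 'start after' dependency")
        pending = still_pending
    return resolved
```

With only fixed start addresses the loop never runs, which is the "easiest implementation" case; every chained "starts after" declaration can add a sweep.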

neochrome commented 4 years ago

No worries at all! I've kept busy toying around with a dsl-like solution in ruby for constructing 6502 machine language just to try out some stuff :)

> The easiest implementation would be where all segments are declared with a fixed starting address (and maybe size).

I think the easier route is the way to go on this - both to get something going, but also because it's probably not that hard to choose segment range up front anyway...

shazz commented 3 years ago

I'm waking this issue up, as it would be a great improvement: complex applications requiring detailed, fine-grained memory organization are hard to design without segments.

nurpax commented 3 years ago

> @neochrome BTW, I didn't give up on this. I've just been on a bit of a coding break, playing Zelda: Link's awakening and learning Rust lang. Will certainly work on this at some point.

Said the author in September 2019. :) But I'm starting to pull this into my cache again, hopefully with better results this time.

nurpax commented 3 years ago

I actually went ahead and implemented segment support. If there's anyone still around in this GitHub issue that cares, I could post my design up here for a quick review..

Currently it looks like below. The syntax is kind of arbitrary, just what felt ok when I started looking at the parser. start/end arguments can take on any expressions, so you could even load the values from a JSON file if you wanted to somehow externally configure memory layout.

But most likely the expressions used for start/end must be values that do not depend on label values, because I think that would probably make the multi-pass forward-reference label address resolver never converge. Or so it feels like, didn't think it through.

!segment code(start=$810, end=$820)
!segment data(start=$830, end=$840)

* = $801
    lda #0

!segment code ; use code segment
    lda #1    ; should be at address $810
    lda #2

!segment data
!byte 0,1,2,3 ; data, should go into address $830

!segment code ; emit to code segment
    lda #3
    lda #4 

This yields the following disassembly:

0801: A9 00        LDA #$00
0803: 00           BRK
0804: 00           BRK
0805: 00           BRK
0806: 00           BRK
0807: 00           BRK
0808: 00           BRK
0809: 00           BRK
080A: 00           BRK
080B: 00           BRK
080C: 00           BRK
080D: 00           BRK
080E: 00           BRK
080F: 00           BRK
0810: A9 01        LDA #$01
0812: A9 02        LDA #$02
0814: A9 03        LDA #$03
0816: A9 04        LDA #$04
0818: 00           BRK
0819: 00           BRK
081A: 00           BRK
081B: 00           BRK
081C: 00           BRK
081D: 00           BRK
081E: 00           BRK
081F: 00           BRK
0820: 00           BRK
0821: 00           BRK
0822: 00           BRK
0823: 00           BRK
0824: 00           BRK
0825: 00           BRK
0826: 00           BRK
0827: 00           BRK
0828: 00           BRK
0829: 00           BRK
082A: 00           BRK
082B: 00           BRK
082C: 00           BRK
082D: 00           BRK
082E: 00           BRK
082F: 00           BRK
0830: 00           BRK
0831: 01 02        ORA ($02,X)
0833: 03
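
The behavior shown in the listing and disassembly above (each segment keeps its own write position, and switching back to a segment appends to it) can be modelled with a short Python sketch. It is only an illustration under stated assumptions, not c64jasm's code: `assemble` is a hypothetical helper, `end` is treated as inclusive, and gaps are zero-filled because that is what the disassembly shows.

```python
def assemble(chunks, segments, default_start):
    """chunks: (segment name or None, bytes) pairs in source order.
    segments: name -> (start, end), with end inclusive (an assumption).
    Returns (load address, memory image with gaps zero-filled)."""
    cursors = {None: default_start}  # None stands for the default segment
    cursors.update({name: start for name, (start, _end) in segments.items()})
    memory = {}
    for seg, data in chunks:
        pc = cursors[seg]
        if seg is not None and pc + len(data) - 1 > segments[seg][1]:
            raise ValueError(f"segment {seg!r} overflows")
        for i, value in enumerate(data):
            memory[pc + i] = value
        cursors[seg] = pc + len(data)  # the next use of this segment appends
    lo, hi = min(memory), max(memory)
    return lo, bytes(memory.get(addr, 0) for addr in range(lo, hi + 1))


LDA_IMM = 0xA9  # opcode for lda #imm
chunks = [
    (None, bytes([LDA_IMM, 0])),                 # * = $801 / lda #0
    ("code", bytes([LDA_IMM, 1, LDA_IMM, 2])),   # first use of code
    ("data", bytes([0, 1, 2, 3])),
    ("code", bytes([LDA_IMM, 3, LDA_IMM, 4])),   # appended after lda #2
]
load, image = assemble(
    chunks, {"code": (0x810, 0x820), "data": (0x830, 0x840)}, 0x801
)
```

Here `image` covers $0801 through $0833, with the four `lda` instructions contiguous from $0810 and the data bytes at $0830.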

neochrome commented 3 years ago

Even if I'm not currently developing for the ol bread box, I'd be happy to take a look and compare notes :)

On Sat, Jan 30, 2021, 02:46 Janne Hellsten notifications@github.com wrote:

> I actually went ahead and implemented segment support. If there's anyone still around in this GitHub issue that cares, I could post my design up here for a quick review..


shazz commented 3 years ago

I like the fact you can split a segment and continue to append code to it :D Even if I have no clue what could be the usage.

In terms of syntax and behavior, I'm pretty pleased with the proposal. To possibly complete it, here are my practical use cases. I'll try to explain them without making too many technical assumptions.

Usecases / Goals

Goal 1: avoid the Tetris game (or should I say Bintris???) when coding a one-file demo while trying to maximize memory usage and respecting C64 memory constraints (banks, alignment, screen mem, I/O, ...)

Goal 2: simplify the design of a multi-parts demo using OMG's Sparkle

Usecase 1 details

Let's start with the most obvious one (Goal 1).

  1. The story is simple, I'm starting to code a little demo effect without taking care too much of where I set the code, the sprites, the music...
  2. Then I add another effect I already coded... and ah it is not that easy, I set some stuff at the same memory location (typically gfx data) or they overlap or they don't all fit in the same d018 setup.
  3. And this is the beginning of the Tetris game, I move parts of the code, set arbitrary * = SPRITE_DATA, change SPRITE_DATA to make it fit... and c64jasm starts to complain that one *= cannot be before another and so on.
  4. So I need to cut/paste the code up and down and start again.

Using segments, I'm quite sure the Tetris game will be easier, at least, no need to physically move part of the listing up and down.
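The "Tetris game" is essentially a search for overlapping address ranges. A minimal Python sketch of that check, assuming inclusive end addresses and a made-up layout (the region names and addresses below are illustrative, not taken from a real demo):

```python
from itertools import combinations


def find_overlaps(regions):
    """regions: name -> (start, end), end inclusive.
    Returns the list of (name, name) pairs whose ranges collide."""
    clashes = []
    for (a, (a0, a1)), (b, (b0, b1)) in combinations(sorted(regions.items()), 2):
        if a0 <= b1 and b0 <= a1:  # standard interval intersection test
            clashes.append((a, b))
    return clashes


layout = {
    "code": (0x0801, 0x0FFF),
    "music": (0x1000, 0x1FFF),
    "sprites": (0x1E00, 0x21FF),  # hypothetical: collides with the music
}
clashes = find_overlaps(layout)
```

Declared segment ranges would let the assembler run a check like this itself, instead of the programmer discovering collisions by shuffling `* =` lines around.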

Usecase 2 details

Now, the second goal which may also have an impact on the PRG generation.

In brief, Sparkle is an IRQ loader which takes care of building a disk image and depacking/loading data (code or other) into memory when requested. Sparkle doesn't require changing or adapting the code; it is a second-pass process.

Using the Sparkle GUI (Windows only :( ) or manually, you only have to define a script which defines your multi-part application. Here is a simple example I built recently, based on 2 demo effects I built with c64jasm:

[Sparkle Loader Script]

Path:   trsi
Header: CSDB Compo
ID: trsi
Name:   SpritesOnly
Start:  1e00
DirArt: dirart.txt
IL0:    05
IL1:    03
IL2:    03
IL3:    03
ZP: 10
Loop:   0

Script: sequencer\sequencer.sls

File:   bigsprite\data\skull5.bin   2800
File:   bigsprite\bin\bigsprite.prg 0834    0035    076c
File:   bigsprite\bin\bigsprite.prg 4000    3801    014f
File:   bigsprite\bin\bigsprite.prg 7000    6801    007f

File:   multiplexer\data\cubes.bin  2000
File:   multiplexer\bin\multiplexer.prg 0951    0152    013d
File:   multiplexer\bin\multiplexer.prg c500    bd01    07d7

Without going into all the details of an sls script, let's have a look at how the 2 demo parts are defined.

Part I: BigSprite

this part consists of:

Part 2: Multiplexer

this part consists of:

Fortunately, Sparkle can manage offsets when the cross-assembler doesn't support assembling segments separately: it will extract the required slice from any file (in my case the prg which aggregates the various segments). This is the meaning of the 2 additional parameters:

File:   multiplexer\bin\multiplexer.prg 0951    0152    013d

=> From multiplexer.prg, extracts data starting at offset 0x152 for 0x13d bytes and loads it at $0951

So using a python script and the labels file generated from c64jasm, I could automatically generate those offset/size parameters for each segment. That works but that would be soooo better if c64jasm could generate one prg (as Sparkle can use the 2 first bytes to get the start address) or any binary file for each segment.
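The offset arithmetic behind those parameters is small enough to sketch. The following is not shazz's actual script: `sparkle_file_line` is a hypothetical helper, the PRG is assumed to load at $0801 (which is consistent with every offset in the script above but is not stated in the thread), and tab-separated fields are an assumption about the sls format.

```python
def sparkle_file_line(prg_path, prg_load_addr, seg_start, seg_len):
    """Build a 'File:' line that slices one segment out of a combined PRG.

    A PRG file begins with a two-byte load address, so the byte for
    address A sits at file offset (A - load address + 2).
    """
    offset = seg_start - prg_load_addr + 2
    if offset < 2 or seg_len <= 0:
        raise ValueError("segment lies outside the PRG")
    return f"File:\t{prg_path}\t{seg_start:04x}\t{offset:04x}\t{seg_len:04x}"


line = sparkle_file_line(r"multiplexer\bin\multiplexer.prg", 0x0801, 0x0951, 0x013D)
```

For the multiplexer example this gives offset 0x0152 for the segment at $0951, matching the hand-written script line.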

I hope my 2 use cases make sense. Feel free to comment if anything is unclear.

Comments

Last point, not (yet) a requirement but something I found interesting while trying CC65's relocatable segments linker: it gave me the possibility to split my code into different files (and avoid the 10km of code in one file), using the .export/.import directives to define global labels. The linker then resolves the labels automatically.

shazz commented 3 years ago

!segment code(start=$810, end=$820)
!segment data(start=$830, end=$840)

As end looks mandatory (I was wondering whether that makes sense or not. Not totally sure yet, but I'm starting to think it is better to fix it, even if I don't always know yet how big my code will be in this segment), will the assembler report how much of each segment is used / left?
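A used/left report of that kind could look like the Python sketch below. It is a guess at useful output, not an existing c64jasm feature: `segment_usage` is a hypothetical name, `end` is assumed to be inclusive, and the byte counts are taken from the code/data example earlier in the thread.

```python
def segment_usage(segments, emitted):
    """segments: name -> (start, end), end inclusive (assumption).
    emitted: name -> number of bytes written into that segment.
    Returns name -> (bytes used, bytes free)."""
    report = {}
    for name, (start, end) in segments.items():
        capacity = end - start + 1
        used = emitted.get(name, 0)
        if used > capacity:
            raise ValueError(f"segment {name!r} overflows by {used - capacity} bytes")
        report[name] = (used, capacity - used)
    return report


usage = segment_usage(
    {"code": (0x810, 0x820), "data": (0x830, 0x840)},
    {"code": 8, "data": 4},
)
for name, (used, free) in usage.items():
    print(f"{name}: {used} bytes used, {free} bytes free")
```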

* = $801
lda #0

Should the entry point be defined as an org, or could it be a parameter of !segment?

nurpax commented 3 years ago

end is currently mandatory but it's pretty easy to work out during assembly too. It might work to make it optional and just check that the segment doesn't grow over some other segments.

The start of the segment is trickier to figure out. For example something like this:

!segment code () ; neither start nor end specified
!segment data () ; data implicitly starts after code

Figuring out the start of data may lead to some sort of multipass explosion. I'm not entirely sure :)

nurpax commented 3 years ago

@shazz how do you feel about the keyword arg syntax?

I realize that it is a bit inconsistent with older features like !binary:

!binary "file1.bin"       ; all of file1.bin
!binary "file2.bin",256   ; first 256 bytes of file
!binary "file2.bin",256,8 ; 256 bytes from offset 8

I've been coding so much Python lately that this

!segment code(start=$810, end=$820)
!segment data(start=$830, end=$840)

felt like the obvious first choice. But @neochrome above used JS object notation. In some sense that's more in the spirit of c64jasm and JS..

Feels like this keyword syntax might in fact be something that could be retrofitted for things like !binary, like:

!binary ("file", length=256, offset=4)
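To make the positional versus keyword difference concrete, here is a Python sketch of the slicing semantics described by the three !binary examples above. `binary_slice` is a hypothetical helper; raising on out-of-range requests (rather than truncating) is a choice made for this sketch, not documented c64jasm behavior.

```python
def binary_slice(data: bytes, length=None, offset=0) -> bytes:
    """Return `length` bytes of `data` starting at `offset`.
    With no length, everything from `offset` to the end is returned."""
    if offset < 0 or offset > len(data):
        raise ValueError("offset outside file")
    end = len(data) if length is None else offset + length
    if length is not None and (length < 0 or end > len(data)):
        raise ValueError("requested range outside file")
    return data[offset:end]


contents = bytes(range(256)) * 2              # stand-in for a 512-byte file
whole = binary_slice(contents)                # like: !binary "file1.bin"
head = binary_slice(contents, length=256)     # like: !binary "file2.bin",256
part = binary_slice(contents, length=256, offset=8)  # like: ...,256,8
```

With keywords, the call site says which number is the length and which is the offset, which is the readability argument made for the retrofit.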

@shazz another q: re multiple output files. So something like this:

!segment code(start=$810, end=$1000)
!segment code_b(output="b.bin", start=$810)

With !segment, output would default to what's specified on the command line with --out file.

I think overlapping segment address ranges would be forbidden. But probably only if the overlap happens within a single output?

shazz commented 3 years ago

About !binary

I really prefer the keyword syntax. I always thought you kept the !binary "file2.bin",256,8 syntax to look like old cross-assemblers. I don't really like it; I never remember what 8 and 256 mean or in which order they go. So if you have the time and motivation for a spring cleaning, that would make c64jasm more consistent (and let's forget the old bad habits).

Segments

About the segment proposal, I like the optional output param to generate the segment prg/bin.

Dict vs Function notation

About dict notation vs function notation... I would say I prefer the function notation when it is an action (like !binary) and the dict notation when it is a configuration. So for the segments.... if you can get rid of the top definition and just do:

!segment code(start=$810, end=$820)
    lda #1    ; should be at address $810
    lda #2

!segment data(start=$830, end=$840)
!byte 0,1,2,3 ; data, should go into address $830

It looks fine to me but the segment split/append becomes more complicated (even if I don't think I will use it). If you prefer the top segment definition, yeah, maybe dict notation looks more natural to me.

In my Python meta-cross-assembler, here is how I define a segment, using a context manager.

with segment(0x0801, "CODE") as s:
   ...

But really, dict or function, both are ok for me. Your call.

nurpax commented 3 years ago

I like the function syntax better too.

Re def vs. use. KickAssembler made a distinction between declaring a segment and using it. I also definitely see use for being able to alternate between segments (like my example code -> data -> code), even though I guess you didn't quite see the point. Neochrome's first comment indicates that he was specifically looking for this type of alternating segment support, and I've had a need for this myself.

I thought the !segment (...) would always be a declaration, and later !segment <name> would be a use. KickAssembler has segmentdef and segment for these but I'd prefer just a single keyword.

Alternatively it could be that the first occurrence of something like

!segment code(start=$810, end=$820)

both defines a new segment, and marks it active. So it'd be equivalent to:

!segment code(start=$810, end=$820)
!segment code

Maybe? OTOH, this will have the problem that if someone does this:

!segment code(start=$810, end=$820)
!segment data(start=$1000)

; ok time to code
main: lda #0

their code would go into the data segment..

shazz commented 3 years ago

I think the first idea is fine: !segment (...) to define, !segment <name> to use. Less confusion.

shazz commented 3 years ago

Question raised under the shower this morning. Segments are also useful to split huge codebase (particularly unmanageable in assembly) into small chunks. Will it be possible to include the segments definition in each segment file ?

shazz commented 3 years ago

If the segments are split into multiple files, what will the CLI look like?

Something like that?

c64jasm --disasm-file hello_world.lst --labels-file hello_world.labels --out hello_world.prg START.asm CHARSET.asm

nurpax commented 3 years ago

Good question. I actually always thought the segments would still be included through some common .asm file. But if multiple files on the command line feels good (I haven't decided), this could be treated as the same as including all of the listed files in a single (implicit) asm file.

This brings up another question: should there be some sort of --outdir option too? If compilation can output multiple .prg and .bin files, it'd be nice if they didn't get saved under the source dir.

nurpax commented 3 years ago

Sorry @neochrome and @shazz, the below update is a bit long. I tried to summarize the current design in case you have some suggestions on how to improve it. It feels pretty good to me so far.

I checked in some work on segments.. https://github.com/nurpax/c64jasm/commit/ea95e02f460ec22560c8b404c5b6412c0cdc1384

It still has some bugs and missing features:

Allowable parameters for the segment start argument

The start argument expression value must be constrained to only accept values that can be resolved in the compilation pass. I think there are cases otherwise that will cause the multipass compiler to never converge. Something like this:

!segment code(start=foo)

  jmp foo
!segment code
  lda #1

foo:
  lda #0

I can't quite wrap my head around what should even happen in this case.

But the start/end expressions would be otherwise just normal expressions. So you could even read their values from say a JSON file. Something like this would be allowed:

  lda #0
foo:
  lda #1

!segment code(start=foo+2)
!segment code
  lda #3

This would generate:

  lda #0
  lda #1
  lda #3

A start=foo would generate an error as it'd cause code to overlap with the default segment.

Default segment and default output

By default everything goes into an implicit "default" segment that gets saved into the output prg specified on the command line with --out. (Very much like in KickAssembler).

Segments without any output declarations will be saved in the same default output too. (Unlike in KickAssembler that just throws away segments that do not specify an output.)

E.g.,

!segment code(start=$1000)
  * = $801
  lda #0
  sta $d020

!segment code
  lda #0  ; this will go to address $1000

The output will contain binary from $801 (start of default segment) up to $1002 (end of code segment).
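The extent of that merged output can be checked with a small Python sketch. It assumes, as the earlier disassembly suggests, that the gap between segments is zero-filled, and it uses the standard PRG layout of a two-byte little-endian load address followed by the data; `build_prg` is a hypothetical helper, not c64jasm's writer.

```python
def build_prg(blocks):
    """blocks: (start address, bytes) pairs that share one output file.
    Returns PRG bytes: 2-byte little-endian load address, then the image
    from the lowest to the highest address, gaps zero-filled."""
    load = min(addr for addr, _data in blocks)
    end = max(addr + len(data) for addr, data in blocks)  # exclusive
    image = bytearray(end - load)
    for addr, data in blocks:
        image[addr - load:addr - load + len(data)] = data
    return load.to_bytes(2, "little") + bytes(image)


default_seg = (0x0801, bytes([0xA9, 0x00, 0x8D, 0x20, 0xD0]))  # lda #0 / sta $d020
code_seg = (0x1000, bytes([0xA9, 0x00]))                       # lda #0
prg = build_prg([default_seg, code_seg])
```

The image spans $0801 up to (but not including) $1002, so most of the file is padding between the two segments.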

Multiple outputs

The plan is to add an argument like out in !segment code(out="foo.prg", start=$1000) that means anything going into that segment will not be saved to the "default" output specified by --out but into foo.prg.

CLI must be extended with --outdir flag so that the above foo.prg can be written to some build dir instead of the current directory.

However, I'm not a big fan of sticking filenames in source files. So I'd at least like a command line override that you could use to say c64jasm --segment code.out="bar.prg" to override. This makes build scripting more flexible.
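As a sketch of how such an override might be parsed: the flag is only a proposal in this thread, so the `NAME.PROP=VALUE` format below simply mirrors the `code.out="bar.prg"` example, and `parse_segment_overrides` is a hypothetical helper.

```python
def parse_segment_overrides(args):
    """args: strings such as 'code.out="bar.prg"' (the proposed format).
    Returns a nested dict: segment name -> {property: value}."""
    overrides = {}
    for arg in args:
        key, eq, value = arg.partition("=")
        name, dot, prop = key.partition(".")
        if not eq or not dot or not name or not prop:
            raise ValueError(f"expected NAME.PROP=VALUE, got {arg!r}")
        # Surrounding double quotes are optional on the command line.
        overrides.setdefault(name, {})[prop] = value.strip('"')
    return overrides


overrides = parse_segment_overrides(['code.out="bar.prg"', "data.out=data.bin"])
```

An override found here would take precedence over any filename written in the source, which keeps build scripts in control of output names.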

Scoping rules

Scoping for segments. Right now they follow the same scoping rules as everything else, including relative scope references and nested scope names. But does this make much sense? E.g.,

!if (foo) {
!segment code(start=$1000) ; in anonymous scope
} else {
!segment code(start=$2000)
}
!segment code ; NOPE, code was declared in an anonymous scope above, so this fails

Of course you could write the above like this:

!let s = $1000
!if (!foo) {
!! s = $2000
}
!segment code(start=s)

Similarly you could now do something like this:

file_x.asm:

!filescope foo
!segment code(start=$1000)

main.asm:

!include "file_x.asm"

!segment foo::code  ; switch to segment defined in file_x.asm

Not sure if this is useful or even desired. Maybe it is?

neochrome commented 3 years ago

Sorry for the late feedback - been quite busy with work etc. First, I like the proposed syntax, very clear how to define the start/end of segments and how to switch which is active. I guess some kind of error could be had if one puts more than what fits in a segment?

I think it would be good enough (at least in a first version) to have to specify separate segment output(s) on the CLI if need be. Not sure how useful it would be to be able to specify overlapping segments (as long as they go to different outputs) - might be allowed by KA, but again not sure of a good use case. Another way of catering for that might be to allow the same segment to be output to multiple different files by specifying multiple outputs on the CLI and have them end up in address order in the file (of course).

With regards to scoping, I think keeping it simple and predictable (for the user) is important. For me, IIRC, the main purpose was to be able to define segments in the main file, and referring to them from other files in order to put stuff in place in an organized way :)

I played around a bit with segment defs in my little Ruby DSL for generating 6502 binary code - I'll have a look at how I tackled some of these cases there and see if I find something more.

nurpax commented 3 years ago

Thanks for the comments!

> Not sure how useful it would be to be able to specify overlapping segments (as long as they go to different outputs) - might be allowed by KA, but again not sure of a good use case.

One use case would be if someone wants to build multiple .prg files (say a multipart demo) with a single command line invocation. Might be handy if you just want to kick off a c64jasm --watch src for the whole demo project.

> With regards to scoping, I think keeping it simple and predictable (for the user) is important.

I think the current implementation is fine now. It follows the same scoping as variables and other symbols.

Apart from some minor error checking and multiple prg output, the feature is pretty much done.

shazz commented 3 years ago

I agree with @neochrome.

In details:

Comments:

But overall it looks perfect to me. As far as I can tell right now, I don't need much more than @neochrome (define/split/organize), plus segment outputs for Sparkle :)

@neochrome, funny, I did the same but not in Ruby, in python :)

shazz commented 3 years ago

Btw, building the branch generates some warnings:

added 367 packages, and audited 434 packages in 6s

14 vulnerabilities (12 low, 2 high)

Is it.. important ?

nurpax commented 3 years ago

Thanks again!

> if segment outputs are specified, it won't prevent to build the full PRG right ? At least to debug :)

Can you expand on this? I don't understand.

> segment overlapping check will be good to have, but if it happens what will be the result ?

Default behavior would be to treat this as an error if segments going to the same output file overlap. Overlapping segments with a different destination would be fine (otherwise you couldn't really build multiple prg outputs).
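That rule (overlap is an error only within one output file) can be sketched as follows. This is an illustration of the stated policy rather than the implementation: `check_overlaps` and the `out.prg` default name are hypothetical, and end addresses are assumed to be inclusive.

```python
from collections import defaultdict


def check_overlaps(segments, default_output="out.prg"):
    """segments: (name, start, end, output) tuples; end is inclusive and
    output is None for segments that go to the default output.
    Raises ValueError if two segments in the same output overlap."""
    by_output = defaultdict(list)
    for name, start, end, output in segments:
        by_output[output or default_output].append((start, end, name))
    for output, items in by_output.items():
        items.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (_s0, e0, n0), (s1, _e1, n1) in zip(items, items[1:]):
            if s1 <= e0:
                raise ValueError(f"{n0} and {n1} overlap in {output}")


# Same address range, different outputs: allowed.
check_overlaps([
    ("code", 0x0810, 0x1000, None),
    ("code_b", 0x0810, 0x0900, "b.bin"),
])
```

Two segments sharing an output and an address range would raise, while the multi-PRG case above passes.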

nurpax commented 3 years ago

> Is it.. important ?

I've been conditioned to ignore these due to GitHub's dependabot spam that I've been getting for the past 1-2 years now. Should clean those up at some point.

shazz commented 3 years ago

for outputs I meant, the assembling process will: