Support cx16-specific kernal calls?

bcampbell commented 1 year ago

The cx16 target supports the standard C64 kernal calls, but there are also a whole bunch of useful cx16-specific ones. Is there any appetite for supporting shims for those calls to be directly available from C too?

It's a bit more complicated than the C64 calls - there are a bunch of more complicated functions and my (slightly sketchy) reading of things is that the cx16 kernal uses a different register convention, so a little shim is needed to move the params to the right places...

As an example, here's my shim for memory_decompress. I think I based it more or less on the pattern established in the mos-platform/commodore)/cbm_k_* functions. But my definition and usage of the __rcN virtual registers is pretty dodgy! Suggestions on how to do it better would be most welcome ;-)

; uint8_t* cx16_k_memory_decompress(uint8_t* src, uint8_t* dest);
;
; Kernel signature: void memory_decompress(word input: r0, inout word output: r1);
; Purpose: Decompress an LZSA2 block
; Call address: $FEED

__memory_decompress = $FEED

__rc2 = $04
__rc3 = $05
__rc4 = $06
__rc5 = $07

.global cx16_k_memory_decompress
cx16_k_memory_decompress:
;   lda mos8(__rc5) ; desthi
;   pha
;   lda mos8(__rc4) ; destlo
;   pha
;   lda mos8(__rc3) ; srchi
;   pha
;   lda mos8(__rc2) ; srclo
    lda __rc5 ; desthi
    pha
    lda __rc4 ; destlo
    pha
    lda __rc3 ; srchi
    pha
    lda __rc2   ; srclo

    sta $02 ; srclo
    pla
    sta $03 ; srchi
    pla
    sta $04 ; destlo
    pla
    sta $05 ; desthi

    jmp __memory_decompress

I've got a few other cx16 functions I'd like to implement, so if anyone's interested I'd be happy to build up at least some partial coverage... (most likely mouse, joystick. keyboard fns)

mysterymath commented 1 year ago

IIRC, @XarkLabs was pretty far along on this. I think there were some suspected codegen issues... to @XarkLabs: is there anything I can do help get this landed?

XarkLabs commented 1 year ago

Hello. I have a good start on this, with wrappers for all kernel routines here https://github.com/XarkLabs/llvm-mos-sdk/tree/cx16_kernal_support/mos-platform/cx16 . However, working on writing more "examples" to test things I did encounter some weirdness. I am thinking having llvm-mos virtual registers overlapping the CX16 ones is perhaps causing problems (but not 100% clear, I thought I was saving/restoring). E.g., simple things like cx16_k_graph_draw_line(319, 239, 0, 239); work, but if I put call in a loop with (e.g.) 319-x, 239-y it draws crazy stuff. Work has been busy so I have stalled a bit (but not given up). It may be I am not saving things properly for calling convention (I could see unexpected recursive calls being an issue - not clear what kernel routines will use these internally). For a test I wanted to move llvm mos virtual registers to unused ZP and see if this changes things.

bcampbell commented 1 year ago

Short version: I suspect the cx16_k_graph_draw_line issue you saw might be the kernal routines blatting over rc20-rc31. The cx16 calling conventions treat most of them as scratch, while the the llvm-mos conventions mark them callee-saved.

Assuming the default linker setup where RC0 is at $02, and ignoring the real CPU registers (a,x,y etc):

cx16 kernal calling conventions:

r0-r5 (RC0-RC11): arguments (callee-saved or inout) r6-r10 (RC12-RC21): callee-saved r11-r15 (RC22-RC31): scratch

llvm-mos conventions:

RS0 (RC0-RC1): callee-saved RS1-RS9 (RC2-RC19): are caller-saved (ie scratch) RS10-RS15 (RC20-RC31): callee-saved

So it looks to me like the kernal could happily stomp upon rc22-rc31 while llvm-mos expects them to be preserved, and I could definitely see that screwing up your loops...

A bit frustrating that we wouldn't know which of rc22-rc31 any given kernal call might destroy (even examining the code, future revisions might behave differently)... so maybe it's just a matter of saving them all? ugh.

It does seem like this is only an issue for the extended cx16 kernal functions, not the base c64-compatible ones, which is something.

bcampbell commented 1 year ago

The other thing I noticed was that the linker scripts seem to define some ZP memory from after the registers up until __basic_zp_end:

https://github.com/XarkLabs/llvm-mos-sdk/blob/cx16_kernal_support/mos-platform/commodore/commodore.ld#L13

__basic_zp_end is defined as 0x90: https://github.com/XarkLabs/llvm-mos-sdk/blob/cx16_kernal_support/mos-platform/cx16/link.ld#L9

But the cx16 memory map says that 0x80 onward are used by Kernal/DOS/BASIC/whatever: https://github.com/commanderx16/x16-docs/blob/master/X16%20Reference%20-%2007%20-%20Memory%20Map.md#zero-page

Which means that llvm-mos could potentially be zapping 0x80-0x90 which the cx16 roms expect to have to themselves?

XarkLabs commented 1 year ago

Yes, all good points (and those are the llvm conventions I was assuming). I did look into ROM code a bit and didn't see any touching RC2-RC19 in unexpected places (mostly it looked only like these are input arguments - but some may nest). The ZP addresses are interesting though and probably do need to be adjusted (not sure that fully explains what I saw though, unless line draw also uses some of those). I may be okay using BASIC locations, but DOS and kernal are important. If I am still puzzled, I may post my misbehaving line draw test to Discord. Thanks.

bcampbell commented 1 year ago

I did look into ROM code a bit and didn't see any touching RC2-RC19 in unexpected places (mostly it looked only like these are input arguments - but some may nest).

It's RC22-RC31 which I'm concerned about. The cx16 kernal calling convention consider them to be trashable, but the llvm-mos convention expects them to be preserved.

And if I peek through the source for kernal draw_line, it does look like it's using r11, r12 and r13 (i.e. RC22-RC28):

https://github.com/commanderx16/x16-rom/blob/master/graphics/graph/graph.s#L202

So my take on it is that the kernal is trashing zeropage addresses that llvm-mos code expects to rely on...

mysterymath commented 1 year ago

Since the imaginary registers don't have to be contiguous, one option is to have the argument and scratch registers overlap, but give the callee-saved imaginary registers their own designated part of the zero page. This would avoid ever needing save/restore logic around kernal calls, at the expense of <=12 bytes of ZP.

XarkLabs commented 1 year ago

Yes, that makes a lot of sense. It seems that moving things around a bit in ZP makes sense then. I also wasn't considering non-contiguous, but I'll look into that.

bcampbell commented 1 year ago

I'm still using my own ad-hoc cx16 kernal shims for now (pasting the few calls I need from Xarklabs tree). And I've consistently been getting odd memory-corruption heisenbugs.

Over the weekend I added a couple of nasty little functions to stash/unstash all the possibly-clobbered imaginary registers. And it did the trick! The memory-corruption bugs have completely disappeared, and the code has been rock solid ever since.

So I think that adds to the evidence that the problems are caused by the kernal clobbering imaginary registers which llvm-mos expects to be preserved.

Here's my hacky stash/unstash code, with an example shim:

regbk = $0400   ; some handy free space in the cx16 memory map

; There's an llvm-mos / cx16 calling convention mismatch.
; llvm-mos expects 16bit imaginary registers rs0 and rs10-rs15 to be preserved
; but cx16 kernal ABI says r11-r15 can be trashed (which means the kernal code
; could use these registers at will).
; Proper solution is for llvm-mos to make sure rs10-rs15 are in locations
; the cx16 kernal won't touch. But for now, we'll just stash 'em when we make
; kernal calls! 

; save all the llvm-mos callee-saved imaginary registers
stashreg:
    pha
    ; rs0
    lda __rc0
    sta regbk+0
    lda __rc1
    sta regbk+1
    ; rs10
    lda __rc20
    sta regbk+2
    lda __rc21
    sta regbk+3
    ; rs11
    lda __rc22
    sta regbk+4
    lda __rc23
    sta regbk+5
    ; rs12
    lda __rc24
    sta regbk+6
    lda __rc25
    sta regbk+7
    ; rs13
    lda __rc26
    sta regbk+8
    lda __rc27
    sta regbk+9
    ; rs14
    lda __rc28
    sta regbk+10
    lda __rc29
    sta regbk+11
    ; rs15
    lda __rc30
    sta regbk+12
    lda __rc31
    sta regbk+13

    pla
    rts

; restore all the llvm-mos callee-saved imaginary registers
unstashreg:
    pha
    ; rs0
    lda regbk+0
    sta __rc0
    lda regbk+1
    sta __rc1
    ; rs10
    lda regbk+2
    sta __rc20
    lda regbk+3
    sta __rc21
    ; rs11
    lda regbk+4
    sta __rc22
    lda regbk+5
    sta __rc23
    ; rs12
    lda regbk+6
    sta __rc24
    lda regbk+7
    sta __rc25
    ; rs13
    lda regbk+8
    sta __rc26
    lda regbk+9
    sta __rc27
    ; rs14
    lda regbk+10
    sta __rc28
    lda regbk+11
    sta __rc29
    ; rs15
    lda regbk+12
    sta __rc30
    lda regbk+13
    sta __rc31

    pla
    rts

;
; typedef struct { int x, y; } mouse_pos_t;
; unsigned char cx16_k_mouse_get(mouse_pos_t *mouse_pos_ptr);   // returns mouse button byte
;                                             rc2/3
;
; https://github.com/X16Community/x16-docs/blob/master/X16%20Reference%20-%2004%20-%20KERNAL.md#function-name-mouse_get
;
.global cx16_k_mouse_get
cx16_k_mouse_get:
    jsr stashreg
    ldx #__rc4      ; x = temp pos
    jsr __MOUSE_GET
    tax         ; save buttons
    ldy #4-1        ; copy 4 byte pos to xy_ptr
copypos:
    lda __rc4,y
    sta (__rc2),y
    dey
    bpl copypos
    txa         ; return buttons
    jsr unstashreg
    rts

I don't think this is a proper solution. It'd be an unacceptable overhead on cx16 kernal calls to stash/unstash so much every time. I think the right solution is to move the conflicting llvm-mos imaginary registers to somewhere the kernal won't clobber. The non-contiguous register block idea does seem like the most efficient way - we don't care if the kernal stomps all over registers llvm-mos isn't expecting to be preserved, so those ones could remain where they are.

An alternative would be to tailor each kernal shim routine to preserve only registers which that kernal call affects. I had thought that'd mean auditing the kernal source... but I just noticed the docs seem to document which registers are affected for each call: https://github.com/X16Community/x16-docs/blob/master/X16%20Reference%20-%2004%20-%20KERNAL.md#kernal-api-functions I don't know how accurate that is. Is it likely to change much between future kernal releases?

asiekierka commented 1 year ago

I've made a PR with the proposed changes from this issue (reduce __basic_zp_end to 0x80 and and re-arrange imaginary registers to avoid conflict with CX16 KERNAL); however, this by itself won't close the issue, as adding the KERNAL calls themselves is still necessary :-)

bcampbell commented 1 year ago

The re-arranged registers PR does the trick for me. I can use my own shims, remove my nasty little (un)stashregs hacks and it a seems to run great, with no corruption bugs reappearing so far. So, thanks @asiekierka, much appreciated!

When I merge in the shims from @XarkLabs repo and rebuild llvm-mos-sdk, I can ditch my own shims entirely and it all works great. I'm only using a small handful of kernal calls, but I haven't had any problems so far. @XarkLabs: are there any more changes you want to make to your shims? Or would you be happy to submit a PR more or less as they are?

XarkLabs commented 1 year ago

Hello, I have like ~20 more kernal call wrappers lined up for cx16, but have been holding off submitting due to some issues (that may be related to register overlap/etc.). I have just merged everything and started llvm-mos rebuild and was planning to test locally a bit, then push to my GH repo branch. I would suggest waiting just a little bit for me to see where things stand after everything merged. 🙂 BTW, I just received my "official" X16 dev board too, so I can test on real HW (eventually, when ready).

mysterymath commented 1 year ago

@XarkLabs , would it be possible to break out of the PR smaller bits that could go in as is? It'd be nice to get at least partial functionality into the next release or two; it's also easier to review smaller changes.

XarkLabs commented 1 year ago

Maybe...but the issue is when regs/method changes all the stubs need to be tweaked a bit (which is why I have most of them "modified" in my local tree). As mentioned, I need to get my stuff building with updated compiler then I can see what makes sense.

XarkLabs commented 1 year ago

Well...I merged everything in into my branch and it all builds, however, I think it is a mess. I believe I need to go over and verify/redo all the wrappers per the new X16 vs llvm register conventions. Feel free to abandon/bypass my tree as desired (and I can sync up later).

XarkLabs commented 1 year ago

Okay, I am digging into this now. The wrappers do need going over, but not all of them. The ones that need tweaking are much improved with this new register mapping. 👍 I am working on checking and updating them all now (and hopefully finish this weekend).

asiekierka commented 1 year ago

Is this effectively done now, @XarkLabs ?

XarkLabs commented 1 year ago

I believe so, but I hope to at least write a few more little example/test to make sure all the calls have at least a sanity check.

XarkLabs commented 1 year ago

Actually, now that I have said that, mysterymath mentioned that using inline asm snippets is really the best way to call these firmware functions (in most cases). Makes sense as would likely prevent/reduce "argument shuffling" before calls (and likely save a bit of code space as well as cycles).

mysterymath commented 1 year ago

It sounds like the main thrust of this is completed; in such cases, it's usually better to break out more specific issues for the remaining work.

Accordingly, I've created #134 for the use of inline assembly, and I'll close this one as completed. @XarkLabs , if you like, feel free to break out another issue providing the specifics of the examples/tests you intend to write.

llvm-mos / llvm-mos-sdk

Support cx16-specific kernal calls? #113

cx16 kernal calling conventions:

llvm-mos conventions: