Closed bcampbell closed 1 year ago
IIRC, @XarkLabs was pretty far along on this. I think there were some suspected codegen issues... to @XarkLabs: is there anything I can do help get this landed?
Hello. I have a good start on this, with wrappers for all kernel routines here https://github.com/XarkLabs/llvm-mos-sdk/tree/cx16_kernal_support/mos-platform/cx16 . However, working on writing more "examples" to test things I did encounter some weirdness. I am thinking having llvm-mos virtual registers overlapping the CX16 ones is perhaps causing problems (but not 100% clear, I thought I was saving/restoring). E.g., simple things like cx16_k_graph_draw_line(319, 239, 0, 239);
work, but if I put call in a loop with (e.g.) 319-x, 239-y it draws crazy stuff. Work has been busy so I have stalled a bit (but not given up). It may be I am not saving things properly for calling convention (I could see unexpected recursive calls being an issue - not clear what kernel routines will use these internally). For a test I wanted to move llvm mos virtual registers to unused ZP and see if this changes things.
Short version: I suspect the cx16_k_graph_draw_line
issue you saw might be the kernal routines blatting over rc20-rc31. The cx16 calling conventions treat most of them as scratch, while the the llvm-mos conventions mark them callee-saved.
Assuming the default linker setup where RC0 is at $02, and ignoring the real CPU registers (a,x,y etc):
r0-r5 (RC0-RC11): arguments (callee-saved or inout) r6-r10 (RC12-RC21): callee-saved r11-r15 (RC22-RC31): scratch
RS0 (RC0-RC1): callee-saved RS1-RS9 (RC2-RC19): are caller-saved (ie scratch) RS10-RS15 (RC20-RC31): callee-saved
So it looks to me like the kernal could happily stomp upon rc22-rc31 while llvm-mos expects them to be preserved, and I could definitely see that screwing up your loops...
A bit frustrating that we wouldn't know which of rc22-rc31 any given kernal call might destroy (even examining the code, future revisions might behave differently)... so maybe it's just a matter of saving them all? ugh.
It does seem like this is only an issue for the extended cx16 kernal functions, not the base c64-compatible ones, which is something.
The other thing I noticed was that the linker scripts seem to define some ZP memory from after the registers up until __basic_zp_end
:
__basic_zp_end
is defined as 0x90:
https://github.com/XarkLabs/llvm-mos-sdk/blob/cx16_kernal_support/mos-platform/cx16/link.ld#L9
But the cx16 memory map says that 0x80 onward are used by Kernal/DOS/BASIC/whatever: https://github.com/commanderx16/x16-docs/blob/master/X16%20Reference%20-%2007%20-%20Memory%20Map.md#zero-page
Which means that llvm-mos could potentially be zapping 0x80-0x90 which the cx16 roms expect to have to themselves?
Yes, all good points (and those are the llvm conventions I was assuming). I did look into ROM code a bit and didn't see any touching RC2-RC19 in unexpected places (mostly it looked only like these are input arguments - but some may nest). The ZP addresses are interesting though and probably do need to be adjusted (not sure that fully explains what I saw though, unless line draw also uses some of those). I may be okay using BASIC locations, but DOS and kernal are important. If I am still puzzled, I may post my misbehaving line draw test to Discord. Thanks.
I did look into ROM code a bit and didn't see any touching RC2-RC19 in unexpected places (mostly it looked only like these are input arguments - but some may nest).
It's RC22-RC31 which I'm concerned about. The cx16 kernal calling convention consider them to be trashable, but the llvm-mos convention expects them to be preserved.
And if I peek through the source for kernal draw_line, it does look like it's using r11, r12 and r13 (i.e. RC22-RC28):
https://github.com/commanderx16/x16-rom/blob/master/graphics/graph/graph.s#L202
So my take on it is that the kernal is trashing zeropage addresses that llvm-mos code expects to rely on...
Since the imaginary registers don't have to be contiguous, one option is to have the argument and scratch registers overlap, but give the callee-saved imaginary registers their own designated part of the zero page. This would avoid ever needing save/restore logic around kernal calls, at the expense of <=12 bytes of ZP.
Yes, that makes a lot of sense. It seems that moving things around a bit in ZP makes sense then. I also wasn't considering non-contiguous, but I'll look into that.
I'm still using my own ad-hoc cx16 kernal shims for now (pasting the few calls I need from Xarklabs tree). And I've consistently been getting odd memory-corruption heisenbugs.
Over the weekend I added a couple of nasty little functions to stash/unstash all the possibly-clobbered imaginary registers. And it did the trick! The memory-corruption bugs have completely disappeared, and the code has been rock solid ever since.
So I think that adds to the evidence that the problems are caused by the kernal clobbering imaginary registers which llvm-mos expects to be preserved.
Here's my hacky stash/unstash code, with an example shim:
regbk = $0400 ; some handy free space in the cx16 memory map
; There's an llvm-mos / cx16 calling convention mismatch.
; llvm-mos expects 16bit imaginary registers rs0 and rs10-rs15 to be preserved
; but cx16 kernal ABI says r11-r15 can be trashed (which means the kernal code
; could use these registers at will).
; Proper solution is for llvm-mos to make sure rs10-rs15 are in locations
; the cx16 kernal won't touch. But for now, we'll just stash 'em when we make
; kernal calls!
; save all the llvm-mos callee-saved imaginary registers
stashreg:
pha
; rs0
lda __rc0
sta regbk+0
lda __rc1
sta regbk+1
; rs10
lda __rc20
sta regbk+2
lda __rc21
sta regbk+3
; rs11
lda __rc22
sta regbk+4
lda __rc23
sta regbk+5
; rs12
lda __rc24
sta regbk+6
lda __rc25
sta regbk+7
; rs13
lda __rc26
sta regbk+8
lda __rc27
sta regbk+9
; rs14
lda __rc28
sta regbk+10
lda __rc29
sta regbk+11
; rs15
lda __rc30
sta regbk+12
lda __rc31
sta regbk+13
pla
rts
; restore all the llvm-mos callee-saved imaginary registers
unstashreg:
pha
; rs0
lda regbk+0
sta __rc0
lda regbk+1
sta __rc1
; rs10
lda regbk+2
sta __rc20
lda regbk+3
sta __rc21
; rs11
lda regbk+4
sta __rc22
lda regbk+5
sta __rc23
; rs12
lda regbk+6
sta __rc24
lda regbk+7
sta __rc25
; rs13
lda regbk+8
sta __rc26
lda regbk+9
sta __rc27
; rs14
lda regbk+10
sta __rc28
lda regbk+11
sta __rc29
; rs15
lda regbk+12
sta __rc30
lda regbk+13
sta __rc31
pla
rts
;
; typedef struct { int x, y; } mouse_pos_t;
; unsigned char cx16_k_mouse_get(mouse_pos_t *mouse_pos_ptr); // returns mouse button byte
; rc2/3
;
; https://github.com/X16Community/x16-docs/blob/master/X16%20Reference%20-%2004%20-%20KERNAL.md#function-name-mouse_get
;
.global cx16_k_mouse_get
cx16_k_mouse_get:
jsr stashreg
ldx #__rc4 ; x = temp pos
jsr __MOUSE_GET
tax ; save buttons
ldy #4-1 ; copy 4 byte pos to xy_ptr
copypos:
lda __rc4,y
sta (__rc2),y
dey
bpl copypos
txa ; return buttons
jsr unstashreg
rts
I don't think this is a proper solution. It'd be an unacceptable overhead on cx16 kernal calls to stash/unstash so much every time. I think the right solution is to move the conflicting llvm-mos imaginary registers to somewhere the kernal won't clobber. The non-contiguous register block idea does seem like the most efficient way - we don't care if the kernal stomps all over registers llvm-mos isn't expecting to be preserved, so those ones could remain where they are.
An alternative would be to tailor each kernal shim routine to preserve only registers which that kernal call affects. I had thought that'd mean auditing the kernal source... but I just noticed the docs seem to document which registers are affected for each call: https://github.com/X16Community/x16-docs/blob/master/X16%20Reference%20-%2004%20-%20KERNAL.md#kernal-api-functions I don't know how accurate that is. Is it likely to change much between future kernal releases?
I've made a PR with the proposed changes from this issue (reduce __basic_zp_end to 0x80 and and re-arrange imaginary registers to avoid conflict with CX16 KERNAL); however, this by itself won't close the issue, as adding the KERNAL calls themselves is still necessary :-)
The re-arranged registers PR does the trick for me. I can use my own shims, remove my nasty little (un)stashregs hacks and it a seems to run great, with no corruption bugs reappearing so far. So, thanks @asiekierka, much appreciated!
When I merge in the shims from @XarkLabs repo and rebuild llvm-mos-sdk, I can ditch my own shims entirely and it all works great. I'm only using a small handful of kernal calls, but I haven't had any problems so far. @XarkLabs: are there any more changes you want to make to your shims? Or would you be happy to submit a PR more or less as they are?
Hello, I have like ~20 more kernal call wrappers lined up for cx16, but have been holding off submitting due to some issues (that may be related to register overlap/etc.). I have just merged everything and started llvm-mos rebuild and was planning to test locally a bit, then push to my GH repo branch. I would suggest waiting just a little bit for me to see where things stand after everything merged. 🙂 BTW, I just received my "official" X16 dev board too, so I can test on real HW (eventually, when ready).
@XarkLabs , would it be possible to break out of the PR smaller bits that could go in as is? It'd be nice to get at least partial functionality into the next release or two; it's also easier to review smaller changes.
Maybe...but the issue is when regs/method changes all the stubs need to be tweaked a bit (which is why I have most of them "modified" in my local tree). As mentioned, I need to get my stuff building with updated compiler then I can see what makes sense.
Well...I merged everything in into my branch and it all builds, however, I think it is a mess. I believe I need to go over and verify/redo all the wrappers per the new X16 vs llvm register conventions. Feel free to abandon/bypass my tree as desired (and I can sync up later).
Okay, I am digging into this now. The wrappers do need going over, but not all of them. The ones that need tweaking are much improved with this new register mapping. 👍 I am working on checking and updating them all now (and hopefully finish this weekend).
Is this effectively done now, @XarkLabs ?
I believe so, but I hope to at least write a few more little example/test to make sure all the calls have at least a sanity check.
Actually, now that I have said that, mysterymath mentioned that using inline asm snippets is really the best way to call these firmware functions (in most cases). Makes sense as would likely prevent/reduce "argument shuffling" before calls (and likely save a bit of code space as well as cycles).
It sounds like the main thrust of this is completed; in such cases, it's usually better to break out more specific issues for the remaining work.
Accordingly, I've created #134 for the use of inline assembly, and I'll close this one as completed. @XarkLabs , if you like, feel free to break out another issue providing the specifics of the examples/tests you intend to write.
The cx16 target supports the standard C64 kernal calls, but there are also a whole bunch of useful cx16-specific ones. Is there any appetite for supporting shims for those calls to be directly available from C too?
It's a bit more complicated than the C64 calls - there are a bunch of more complicated functions and my (slightly sketchy) reading of things is that the cx16 kernal uses a different register convention, so a little shim is needed to move the params to the right places...
As an example, here's my shim for
memory_decompress
. I think I based it more or less on the pattern established in themos-platform/commodore)/cbm_k_*
functions. But my definition and usage of the__rcN
virtual registers is pretty dodgy! Suggestions on how to do it better would be most welcome ;-)I've got a few other cx16 functions I'd like to implement, so if anyone's interested I'd be happy to build up at least some partial coverage... (most likely mouse, joystick. keyboard fns)