Some confusion about the Calling Convention

i'm planning on making a graphics library since i recently made a VGA Card for my 65816 SBC. most functions will be in assembly for increased speed and reduced code size, so i'm trying to better understand the calling convention to know how to write the functions.

the User Guide says:

[...] Parameters are passed in the A accumulator, index register X and pseudo registers _Dp[0-7] [...] Parameters are bound to register left to right on a first fit basis [...] Parameters that cannot be fit into registers are passed on the stack

which is pretty clear so far, but also raises the question why isn't the Y Index Register used for parameter passing as well? i've noticed that the compiler rarely seems to use the Y Register in general.

anyways, my setPixel function takes 3 parameters: 16-bit "X" and "Y" Coordinates, and an 8-bit "Color", in that order. so i assumed the parameters would be assigned like this:

X Coordinate -> Accumulator
Y Coordinate -> X Index Register
Color -> _Dp[0-1]

but when making a dummy function in C, calling it, and then looking at the generated .lst file, it's slightly different:

X Coordinate -> Accumulator
Y Coordinate -> X Index Register (and _Dp[0-1] for some reason)
Color -> Stack (as a 16-bit word)

the exact code it generated is here:

setPixel(100, 125, 250);
          lda     ##250
          pha
          ldx     ##125
          stx     dp:.tiny _Dp
          lda     ##100
          jsl     long:setPixel

this is confusing because the Y Coordinate is stored in both X and _Dp. and why would the 3rd parameter be stored to the stack when _Dp[2-3] and _Dp[4-5] are still available? (and i assume also _Dp[6-7], even though the user guide doesn't list that pseudo register, but higher up it says "_Dp[0-7]" are used as pseudo registers)

overall i'd just like some clarification for how the parameter passing works exactly, cause this is just breaking my brain

on a side note, why is LDA (1,S),Y causing an invalid operand field error? it's a valid addressing mode for the 65816 that's formatted exactly like that.

The compiler only passes one parameter in a register. XC can be used, and it is useful to have at least one register free (Y in this case) as a scratch register when setting up the stack frame.

During early development I allowed two register parameters in X and D, but I ran into some nasty corner cases and ended up going for a single parameter.

In the example you show, X is not a parameter; it is only used to load the constant that goes into _Dp, which is the parameter.

That the third parameter is passed on the stack can be argued to be a bug; logically it should probably have used _Dp+4 for it. On the other hand, I think it is intentional. The stack is decent on the 65816, and a 16-bit value in most data models is an integer. I prefer to get pointers into the _Dp registers: the compiler will always try to put two pointer parameters there, but it will leave the scratchable _Dp+4 free for the called function (it will not put an integer value there).

If you wonder why it does not use _Dp+2 and _Dp+6, that is intentional. The register model I use regards _Dp+0 to _Dp+3 as a 32-bit register; it will always put a 16-bit value in the lower part, and the same goes for a 24-bit value. I want to ensure that conversions (casts) between different sized _Dp registers do not need to move the value around (which would require a CPU register). It makes the register model simpler, and I prefer it that way.
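One way to picture this layout is the host-side C sketch below. It is only a model under stated assumptions: `dp_slot` and the load/store helpers are hypothetical names invented for illustration, not part of the compiler or its runtime. It treats one 32-bit pseudo register as four little-endian bytes (the 65816 is little-endian) and shows that a 16-bit value kept in the lower part can be read at the same address as the 32-bit value, so narrowing does not require moving anything.

```c
#include <stdint.h>

/* Hypothetical model of one 32-bit pseudo register, e.g. _Dp+0 to _Dp+3.
   Byte 0 is the least significant byte, as on the little-endian 65816. */
typedef struct { uint8_t b[4]; } dp_slot;

/* Store a 32-bit value across all four bytes of the slot. */
static void store32(dp_slot *s, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        s->b[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 16-bit value in the lower part; the upper bytes are untouched. */
static void store16(dp_slot *s, uint16_t v)
{
    s->b[0] = (uint8_t)(v & 0xFFu);
    s->b[1] = (uint8_t)(v >> 8);
}

/* Read the lower 16 bits, starting at the same address as the 32-bit value. */
static uint16_t load16(const dp_slot *s)
{
    return (uint16_t)(s->b[0] | (s->b[1] << 8));
}

/* Read the whole slot as a 32-bit value. */
static uint32_t load32(const dp_slot *s)
{
    return (uint32_t)s->b[0]
         | ((uint32_t)s->b[1] << 8)
         | ((uint32_t)s->b[2] << 16)
         | ((uint32_t)s->b[3] << 24);
}
```

In this model, narrowing a 32-bit value to 16 bits is just `load16` on the same slot, and widening only needs the two upper bytes filled in; the low bytes stay where they are.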

I plan to introduce an alternative simpler register model which will probably keep one CPU register and push the rest on the stack, but I have not gotten around to it.

I hope this clarifies some of the confusion you got from it, otherwise ask more.

The LDA (1,S),Y should work, can you share a snippet of using that, so I can see exactly how you do it? Feel free to open another ticket for it. I use that addressing mode in the compiler and the assembler has a test example with it.

alright, so: first parameter goes into A (or A+X in case it's 32-bit) if it fits, second into _Dp, and everything beyond that onto the Stack. did i get that right?

I plan to introduce an alternative simpler register model which will probably keep one CPU register and push the rest on the stack, but I have not gotten around to it.

that would be nice actually, since then you'd avoid the constant loading from deeper in the stack to put on the _Dp before a function call. plus you'd probably be able to make good use of the PEI instruction to directly put things onto the stack without using any registers.

The LDA (1,S),Y should work, can you share a snippet of using that, so I can see exactly how you do it?

it's written exactly like that in my code, i'll post the whole file so maybe you can try to recreate the issue. the command i use is simple: AS65816 --code-model large --data-model large --list-file Output\setPixel.lst -o Temp\setPixel.o setPixel.s

; ---------------------------------------------------------------------------
; setPixel.s
; ---------------------------------------------------------------------------
;
; Draws a Pixel at the specified coordinates with "color"
;

    .rtmodel version,"1"
    .rtmodel codeModel,"large"
    .rtmodel dataModel,"large"
    .rtmodel core,"65816"
    .rtmodel huge,"0"

    .extern _Dp

; ---------------------------------------------------------------------------
; void setPixel(uint16_t PX, uint16_t PY, uint8_t color);

; PX Coordinate:    A
; PY Coordinate:    _Dp[0-1]
; Color:            Stack+4

; _Dp+0     - PY (low)
; _Dp+1     - PY (high)
; _Dp+2     - temp byte

PY  .equ _Dp
tmp .equ _Dp+2

    .section farcode, text
    .public setPixel
setPixel:
    TAY                     ; Save PX
    AND ##0x0007
    TAX                     ; Save the bottom 3 bits of PX

    TYA                     ; Get the original PX back
    LSR A
    LSR A
    LSR A
    PHA                     ; Put upper 13 bits of PX onto the Stack

    LDA dp:.tiny(PY)
    ASL A
    ASL A
    ASL A
    ASL A
    CLC
    ADC 1,S                 ; (PX >> 3) + (PY * 16)
    STA 1,S

    LDA dp:.tiny(PY)
    ASL A
    ASL A
    ASL A
    ASL A
    ASL A
    ASL A
    CLC
    ADC 1,S                 ; (PX >> 3) + (PY * 16) + (PY * 64)
    TAY                     ; Save the Calculated Address into Y

    PLA                     ; Pull the Incomplete Address from the Stack
    PHB                     ; Save the Data Bank
    PEA #0x8000             ; Push the bottom 16-bits of the VRAM Base Address
    PEA #0xFFFF             ; Push the upper 16-bits of the VRAM Base Address
    PLB
    PLB                     ; Pull Data Bank twice

    LDA ##0x0000            ; Clear A
    SEP #0b00100000         ; 8-bit Accumulator

    LDA (1,S),Y             ; Get a Byte from the calculated Address
    STA dp:.tiny(tmp)       ; And store it for now

    LDA .word0 _bitmask,X   ; Get a bitmask from the Table corresponding to the selected bit
    BEQ 1$                  ; If Color != 0
    ORA dp:.tiny(tmp)       ; Apply the Mask to the read Byte
    STA (1,S),Y             ; And put it back into VRAM
    BRA 2$

1$:                         ; If Color == 0
    EOR #0xFF               ; Invert the Mask
    AND dp:.tiny(tmp)       ; Apply the Inverted Mask to the Byte
    STA (1,S),Y             ; And put it back into VRAM

2$: REP #0b00100000         ; 16-bit Accumulator
    PLA                     ; Remove the Base Address from the Stack
    PLB                     ; Restore the Data Bank
    RTL                     ; And Return

    .section cfar, rodata
    .pubweak _bitmask
_bitmask:
    .byte   0b10000000, 0b01000000, 0b00100000, 0b00010000, 0b00001000, 0b00000100, 0b00000010, 0b00000001
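For reference, the arithmetic the routine is meant to perform, as described by its comments, can be written as a small C model. This is a sketch for checking the address and mask calculation on a host machine, not a replacement for the assembly. Several things here are assumptions: the names `pixel_offset`, `pixel_mask`, `set_pixel_model` and `framebuffer` are made up, the 80-byte row stride is inferred from the two shifts (PY * 16 + PY * 64), and the 480-line height is a placeholder used only to size the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: PY*16 + PY*64 = PY*80 bytes per row, which at
   1 bit per pixel is 640 pixels. HEIGHT is a placeholder value. */
#define ROW_BYTES 80u
#define WIDTH     (ROW_BYTES * 8u)
#define HEIGHT    480u

/* Stand-in for VRAM; the real routine writes to the card's memory. */
static uint8_t framebuffer[ROW_BYTES * HEIGHT];

/* Byte offset of the pixel: (PX >> 3) + (PY * 16) + (PY * 64). */
static uint16_t pixel_offset(uint16_t px, uint16_t py)
{
    return (uint16_t)((px >> 3) + (py << 4) + (py << 6));
}

/* Bit inside that byte, matching the _bitmask table (bit 7 = leftmost). */
static uint8_t pixel_mask(uint16_t px)
{
    return (uint8_t)(0x80u >> (px & 7u));
}

/* Set the bit when color is non-zero, clear it otherwise. */
static void set_pixel_model(uint16_t px, uint16_t py, uint8_t color)
{
    assert(px < WIDTH && py < HEIGHT);
    uint16_t off = pixel_offset(px, py);
    uint8_t mask = pixel_mask(px);
    if (color != 0)
        framebuffer[off] |= mask;
    else
        framebuffer[off] &= (uint8_t)~mask;
}
```

With the values from the earlier call, `set_pixel_model(100, 125, 250)` touches byte offset 10012 with mask 0x08, which gives a known value to compare the assembly against.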

alright, so: first parameter goes into A (or A+X in case it's 32-bit) if it fits, second into _Dp, and everything beyond that onto the Stack. did i get that right?

That is correct for 16-bit integer types, in all data models except the small one. Pointer types and 32-bit types will make use of _Dp+4 to _Dp+7.
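The rule as confirmed here, restricted to functions whose parameters are all 16-bit integers and to data models other than the small one, can be summarised as a tiny C sketch. The enum and the function `place_16bit_param` are hypothetical names used only to restate the rule; they are not a compiler API, and the sketch deliberately says nothing about pointers or 32-bit types.

```c
/* Hypothetical labels for where a parameter ends up. */
enum param_location {
    LOC_A,       /* the accumulator */
    LOC_DP0,     /* pseudo register _Dp[0-1] */
    LOC_STACK    /* pushed on the stack as a 16-bit word */
};

/* Location of the parameter at the given 0-based position, assuming
   every parameter is a 16-bit integer and the data model is not small. */
static enum param_location place_16bit_param(unsigned index)
{
    if (index == 0)
        return LOC_A;
    if (index == 1)
        return LOC_DP0;
    return LOC_STACK;
}
```

For setPixel(100, 125, 250) this gives A for 100, _Dp[0-1] for 125, and the stack for 250, which matches the generated listing shown earlier.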

I put the simplified calling convention up for the next major release. I have to see if I can get it in there, since the current calling convention is too complicated on the 65816 for assembly use.

it's written exactly like that in my code, i'll post the whole file so maybe you can try to recreate the issue. the command i use is simple: AS65816 --code-model large --data-model large --list-file Output\setPixel.lst -o Temp\setPixel.o setPixel.s

This is a bug. Internally the s is not parsed as a register but as a literal, and the assembler incorrectly fails to match it case-insensitively. Make the s lower case as a workaround for now.

Version 5.1 provides a simplified calling convention which can be applied to functions. Use the __simple_call keyword or the __attribute__((simple_call)) attribute.

The bug with S register is also fixed in 5.1.

I will close this as fixed.

hth313 / Calypsi-tool-chains

Some confusion about the Calling Convention #13