z88dk / z88dk

The development kit for over a hundred z80 family machines - c compiler, assembler, linker, libraries.
https://www.z88dk.org

(classic) Complete gameboy support #1287

Closed suborb closed 3 years ago

suborb commented 5 years ago

The following need to be completed for a full target:

The following should be done:

suborb commented 5 years ago

Re banking. It looks like sdcc supports #pragma bank NN, which sets the code segment to _CODE_NN and leaves the bss, data and rodata sections as usual.

Within z88dk we use BANK_NN as the section name for the SMS target and I've copied that over to the Gameboy target. These are targeted with the appropriate pragma to change the section names. This suggests that sccz80 should support #pragma bank as a shortcut for setting code and rodata sections.

I think placing all z88dk library code within the always-paged-in bank makes life easier, so we'll only end up with user code being manually placed within banks.

Conventionally, GBDK used the __banked annotation to support banked calls. This resulted in the following code being emitted:

call banked_call
defw [function address]
defw [bank]

Here banked_call switches pages and offsets the stack. This stack offset can be handled by zsdcc and sccz80 with the annotation __z88dk_params_offset(X).

Function address can be populated using a regular z80asm patch expression. The bank could be populated by an appmake stage as follows:

  1. Lookup the address of banked_call in the .map file: XXYY
  2. Search the binary for sequences of: CDYYXX [AA BB] [00 00] where BBAA will be the address of a function.
  3. Reverse search for BBAA in the .map file, find the section name, parse and deduce the bank.

This does, however, feel fragile. I'd be worried about functions having clashing addresses in different banks (which will happen for functions at the start of a bank unless we fudge the first usable address within a bank).

Thus, I think we will need z80asm support for this: effectively a way of parsing the section name of a symbol and allowing that to be used within source code. Thoughts, @pauloscustodio?
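For illustration, the appmake-style search in steps 1 to 3 could be sketched as below. This is only a minimal model and not appmake code: the function names and the (name, address, bank) tuple shape for parsed .map entries are hypothetical. The second helper makes the fragility concrete, since an address shared by two banks yields more than one candidate.

```python
def find_banked_calls(binary: bytes, banked_call_addr: int):
    """Return (offset, function_address) for every 'call banked_call'
    that is followed by an address word and a 00 00 bank placeholder."""
    # CALL nn is opcode 0xCD followed by the little-endian address (CD YY XX).
    pattern = bytes([0xCD, banked_call_addr & 0xFF, banked_call_addr >> 8])
    hits = []
    pos = binary.find(pattern)
    while pos != -1:
        tail = binary[pos + 3:pos + 7]          # [AA BB] [00 00]
        if len(tail) == 4 and tail[2:] == b"\x00\x00":
            hits.append((pos, tail[0] | (tail[1] << 8)))
        pos = binary.find(pattern, pos + 1)
    return hits


def banks_for_address(symbols, address):
    """symbols: iterable of (name, address, bank) tuples, a hypothetical
    shape for parsed .map entries. More than one bank means ambiguity."""
    return sorted({bank for _name, addr, bank in symbols if addr == address})
```

If banks_for_address returns more than one bank for a function address, the bank word cannot be patched reliably from the binary alone, which is the clash described above.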

feilipu commented 5 years ago

Re Banking.

This matches the way that it is done on YAZ180 with the __call_far function. Though I use call, defw, defb (signed relative), for the bank call.

Using a signed bank definition allows easy relative bank calls (i.e. call a bank above, or a bank below the current bank), and zero refers to the current bank. But this is not baked in concrete. Happy to align to whatever becomes the standard way to do this.
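As a rough sketch of the arithmetic only (the actual __call_far implementation is not shown in this thread), a signed relative bank byte could be resolved like this. The function name and the bank_count parameter are assumptions made for the example.

```python
def resolve_bank(current_bank: int, rel_byte: int, bank_count: int) -> int:
    """Interpret rel_byte (0..255) as a signed 8-bit offset from current_bank."""
    if not 0 <= rel_byte <= 0xFF:
        raise ValueError("relative bank must fit in one byte")
    # Two's complement: 0x80..0xFF are -128..-1, 0 means the current bank.
    rel = rel_byte - 0x100 if rel_byte & 0x80 else rel_byte
    target = current_bank + rel
    if not 0 <= target < bank_count:
        raise ValueError("relative bank call leaves the available banks")
    return target
```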

Would it be sensible to make the addressing somewhat linear with a defq address definition, where the lower 16 bits are addressing and the upper 16 are either bank identifier or linear address space, depending on the platform implementation?

The patching mechanism for banks looks quite like the REL format, with the bitmap attached to indicate items (call, jp, etc) to be patched.

pauloscustodio commented 5 years ago

Re Banking.

Two options below. Please comment or suggest alternatives.

1) Extend z80asm to handle 24- or 32-bit addresses, where the lower 16-bits are the address seen by the CPU, and the upper 8- or 16-bits are the bank id (platform dependent, e.g. value to be written to the bank register). CALL, JP, ... need to accept the 32-bit address and ignore the upper 16-bits. For the Spectrum 128K (which I know better than the GameBoy), one could write

  section xxx1      ; name is not relevant
  org $00C000       ; select page 0 at address $C000

  public func1
  func1: ...

  section xxx2
  org $01C000        ; select page 1 at address $C000

  extern func1
  ...
  call banked_call   ; platform-dependent function that switches banks and calls the function
  defp func1         ; patch in a 24-bit address (Spectrum 128k); use defq for a 32-bit address

This solution is easy to implement in z80asm; all the infrastructure is in place, we just need to handle the 32-bit addresses gracefully.
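To make the address layout concrete, here is a small model of how such a wide address splits into bank id and CPU address, and of the bytes a defp or defq would emit. It assumes little-endian byte order, as for defw. The helper names are hypothetical and this is not z80asm code.

```python
def split_banked(addr: int):
    """Split a wide address into (bank_id, cpu_address)."""
    return addr >> 16, addr & 0xFFFF


def defp_bytes(addr: int) -> bytes:
    """24-bit pointer: 16-bit CPU address followed by an 8-bit bank id."""
    return addr.to_bytes(3, "little")


def defq_bytes(addr: int) -> bytes:
    """32-bit value: 16-bit CPU address followed by a 16-bit bank id."""
    return addr.to_bytes(4, "little")


def call_operand(addr: int) -> bytes:
    """CALL/JP accept the wide address but keep only the low 16 bits."""
    return (addr & 0xFFFF).to_bytes(2, "little")
```

With the example above, func1 at $00C000 and code at $01C000 share the CPU address $C000 and differ only in the bank id.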

2) Let the linker automatically resolve banked calls.

Same as above, use the upper 16-bit of a 32-bit address as the bank id. Create new opcodes for banked calls that reserve space in the object code for the call to the banked_call function (platform dependent function, part of library) and the 24- or 32-bit address (call24, call32). At link time, include either the call to banked_call and a defp/defq with the called address, or, if the target is in the same bank, a regular call followed by 3 or 4 nop.

  section xxx1      ; name is not relevant
  org $00C000       ; select page 0 at address $C000

  public func1
  func1: ...

  section xxx2
  org $01C000        ; select page 1 at address $C000

  extern func1

  func2: ...
  ...
  call24 func1       ; assembled & linked as: call banked_call : defp func1
  call24 func2       ; assembled & linked as: call func2 : nop : nop : nop

This solution is a bit more complex, but gives additional flexibility in arranging the code in banks.
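The link-time decision in option 2 could be modelled roughly as follows. This is a sketch of the idea, not linker code: link_call24 and its parameters are hypothetical names, and little-endian operands are assumed.

```python
CALL = 0xCD  # Z80 'call nn' opcode
NOP = 0x00


def link_call24(target: int, caller_bank: int, banked_call_addr: int) -> bytes:
    """Emit the six bytes reserved for 'call24 target'."""
    target_bank, cpu_addr = target >> 16, target & 0xFFFF
    if target_bank == caller_bank:
        # Same bank: regular call followed by three nops.
        return bytes([CALL]) + cpu_addr.to_bytes(2, "little") + bytes([NOP] * 3)
    # Different bank: call the trampoline, then the 24-bit target (defp).
    return (bytes([CALL]) + banked_call_addr.to_bytes(2, "little")
            + target.to_bytes(3, "little"))
```

Both forms occupy six bytes, so the linker can choose either one without moving the code that follows.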

suborb commented 5 years ago

I quite like option 1 since it offers the compiler (or assembler author) more control over how to invoke a function. I suspect there may well be many ways of actually invoking the trampoline (for example, via a rst is going to be an obvious option), so the pseudo-opcode doesn't feel quite right.

The automatic conversion in this case:

call24 func2       ; assembled & linked as: call func2 : nop : nop : nop

would be incorrect if func2 was a C function with parameters (the parameters wouldn't be at the expected stack offset).

pauloscustodio commented 5 years ago

Option 1 is implemented in the z80asm_32bit_addresses branch. Please let me know of any issues.

suborb commented 5 years ago

Thank you Paulo, I'll give it a try this evening I hope.

suborb commented 5 years ago

The z80asm changes work for my requirements - I've successfully had a banked call execute and return a value.

I don't think losing the range checking is too much bother so please feel free to merge.

basxto commented 4 years ago

| It looks like sdcc supports #pragma bank NN which sets the code segment to _CODE_NN

That's correct. -bo also generates _CODE_N, and -ba generates _DATA_N (SRAM banks).

Inside the linker, _CODE_N becomes 0xN4000 and _DATA_N becomes 0xNA000; these get mapped to the real ROM addresses when the IHX gets created.
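As an illustration of the _CODE_N mapping, the sketch below converts a 0xN4000-style linker address to a ROM file offset. It assumes the usual Game Boy layout in which bank N occupies file offset N * 0x4000 and is seen by the CPU at 0x4000-0x7FFF. The exact conversion done by the SDCC tools is not shown in this thread, and rom_offset is a hypothetical name.

```python
BANK_SIZE = 0x4000      # size of one switchable ROM bank
WINDOW_START = 0x4000   # CPU address where the switchable bank appears


def rom_offset(linker_addr: int) -> int:
    """Map a linker address of the form 0xN4000 + offset to a ROM file offset."""
    bank, cpu_addr = linker_addr >> 16, linker_addr & 0xFFFF
    if not WINDOW_START <= cpu_addr < WINDOW_START + BANK_SIZE:
        raise ValueError("address is outside the switchable ROM window")
    return bank * BANK_SIZE + (cpu_addr - WINDOW_START)
```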

Adding bank and address after the call became --legacy-banking in SDCC trunk.

EDIT:

| Fixup return values to match SDCC abi

What does that mean? gbdk-n followed the e, de, hlde return convention that SDCC uses on gbz80.

suborb commented 4 years ago

| Fixup return values to match SDCC abi

| What does that mean? gbdk-n followed the e, de, hlde return convention that SDCC uses on gbz80.

The z88dk libraries and the z80 targets use l, hl, dehl, so for the libraries to work with zsdcc (gbz80), 8/16-bit functions have the return value in both de and hl.

basxto commented 4 years ago

Does that also mean that you have __z88dk_fastcall on gbz80 now?

suborb commented 4 years ago

We haven't changed anything in sdcc regarding gbz80. However, sccz80 supports fastcall and callee functions for gbz80.

basxto commented 4 years ago

Well, if __z88dk_fastcall was to be implemented in sdcc, it should be compatible enough to call functions compiled with z88dk.

I was nearly done with implementing __z88dk_fastcall for gbz80 so that parameters are put into the registers used for return values. But I'll try to implement a different kind of fastcall then: l for 8-bit parameters and return values is quite inconvenient when paired with bigger functions that have variables on the stack.

basxto commented 3 years ago

Well, back to this. Last week I finally understood that z88dk has its own compiler, assembler etc. besides its patched sdcc. Loading values into four registers for an 8-bit return value is indeed quite undesirable. I suspect __smallc is solely for SCCZ80 compatibility?

I could try to implement __smallc with return registers hl, hlde, the way SCCZ80 generates them. (Even though __smallc gets accepted, it does not currently push 8b as 16b.) And __z88dk_fastcall using return registers as parameters:

suborb commented 3 years ago

I'm all over the place at the moment, working on far too many things at the same time, so I've not had a chance to fix the other issue - apologies.

Yes, __smallc is purely for sccz80 compatibility (likewise sccz80 has __z88dk_sdccdecl for sdcc compatibility).

There are two cases to consider for mixing and matching: libraries and user code. Although it's theoretically possible to mix and match compilers for user code in classic, it's not particularly well tested and there are caveats (there's a wiki page somewhere but I can't find it at the moment).

Getting library interop working is important though. To get sdcc to work together with the libraries I had to make the following modifications:

As an explanatory note: for library routines, sdcc enters via the label _strlen and sccz80 via strlen, which does allow the library to handle the register requirements. Fixing everything up is extremely tedious, obviously, so the fewer times we need to do that the better.

My feeling is that the priority order is:

  1. Fixing up __smallc with return registers in hl,dehl - that will allow the library routines that return a long value to work correctly with sdcc without needing any workarounds and allows the long functions to work.

  2. The input registers for __z88dk_fastcall. For library work (which is in asm) the input registers could be worked around with an ex de, hl equivalent, which isn't particularly onerous/expensive.

  3. The promotion from char to int on library functions is suboptimal but is an easy workaround, so comes last.

basxto commented 3 years ago

| I'm all over the place at the moment, working on far too many things at the same time, so I've not had a chance to fix the other issue - apologies.

No problem, it's not urgent, I was just playing around a bit with it and noticed that stuff.

| The libraries never take a char parameter - everything got promoted up to an int

Is that a general rule for sdcc or for gbz80 specifically?

3 is part of 1: the sdcc user guide explicitly says that __smallc is left to right and that 1-byte arguments are passed as 2 bytes, with the value in the lower byte. It does not say anything about the return value, though. Do they never return <2B?

Do I have to care about the upper byte of chars or can I just push trash into them?

The equivalent to ex de, hl is

ld a, d
ld d, h
ld h, a
ld a, e
ld e, l
ld l, a

which wastes 6 bytes and 24 cycles. Or, if you go for size and do

push de
push hl
pop de
pop hl

it would be 4 bytes and 56 cycles wasted (two pushes at 16 cycles each plus two pops at 12 cycles each). Fetching a 16b argument from the stack is just 5b and 32c (+1b and 16c for the push):

ldhl sp, #2
ld a, (hl+)
ld h, (hl)
ld l, a
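The byte and cycle counts for the three sequences above can be tallied with a small table. The per-instruction costs are the commonly documented Game Boy CPU timings and are stated here as an assumption, not taken from this thread. The key names are informal labels rather than assembler syntax.

```python
# (bytes, cycles) per instruction class, assumed from common LR35902 opcode tables
COST = {
    "ld r,r": (1, 4),
    "push rr": (1, 16),
    "pop rr": (1, 12),
    "ldhl sp,e": (2, 12),
    "ld a,(hl+)": (1, 8),
    "ld r,(hl)": (1, 8),
}


def total(sequence):
    """Return (total_bytes, total_cycles) for a list of instruction classes."""
    size = sum(COST[ins][0] for ins in sequence)
    cycles = sum(COST[ins][1] for ins in sequence)
    return size, cycles


SWAP_VIA_A = ["ld r,r"] * 6                                    # six register moves through a
SWAP_VIA_STACK = ["push rr", "push rr", "pop rr", "pop rr"]    # push de/hl, pop de/hl
FETCH_ARG = ["ldhl sp,e", "ld a,(hl+)", "ld r,(hl)", "ld r,r"]  # read 16b argument from stack
```

Calling total() on each list gives the size and time of that sequence, which makes the trade-off between swapping registers and fetching from the stack easy to compare.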

My main interest is indeed to have an e, de, hlde __z88dk_fastcall, but I also want z88dk to work somehow.

And this is probably a bug https://github.com/z88dk/z88dk/blob/bd1442a514df74e946d87227d90ac8d0d5616dfa/libsrc/target/gb/gbdk/mode.asm#L26-L31

suborb commented 3 years ago

| The libraries never take a char parameter - everything got promoted up to an int

| Is that a general rule for sdcc or for gbz80 specifically?

It's not desirable but came out of necessity for the interop - the z80 1b pushing was only fixed a couple of years ago.

| 3 is part of 1, sdcc user guide explicitly says that __smallc is left to right and that 1 byte arguments are passed as 2 bytes, with the value in the lower byte. Though it does not say anything about the return value. Do they never return <2B?

I suspect that everything in the libraries is rounded up to be 2b return value (of which only 1b is of significance).

| The equivalent to ex de, hl is...

Yes, it's not pretty. For single-parameter entry it's just 3 bytes, since we just need to do ld hl, de. I was just mentioning it since the solution can be staged and can deliver value without having to do everything all at once.

| And this is probably a bug

Oh yes, thank you.

basxto commented 3 years ago

It looks like __smallc already successfully pushes 1B values as 2B; I just did not recognize them. It does

dec sp
ld  a, #0x04
push    af
inc sp

which is a weird way of doing

ld  e, #0x04
push    de

So it really only needs different return registers.
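To check that the two sequences above leave the same thing on the stack, here is a tiny stack model. It assumes the usual push behaviour (decrement SP and store the high byte, then decrement SP and store the low byte). The class and function names are hypothetical.

```python
class Stack:
    """Minimal downward-growing stack; unwritten bytes read back as `fill`."""

    def __init__(self, sp=0xFFFE, fill=0xEE):
        self.sp = sp
        self.fill = fill
        self.mem = {}

    def push(self, high, low):
        self.sp -= 1
        self.mem[self.sp] = high
        self.sp -= 1
        self.mem[self.sp] = low

    def read(self, addr):
        return self.mem.get(addr, self.fill)


def push_char_via_af(value, flags=0x00):
    s = Stack()
    s.sp -= 1              # dec sp
    s.push(value, flags)   # ld a, value ; push af (a is the high byte)
    s.sp += 1              # inc sp (drops the flags byte)
    return s


def push_char_via_de(value, d=0xEE):
    s = Stack()
    s.push(d, value)       # ld e, value ; push de (d is whatever it held)
    return s
```

In both cases SP ends up two bytes lower with the value in the lower byte of the slot. The upper byte is left as whatever was already in memory or in d, so the callee cannot rely on it.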

Implementing fastcall would be trivial, the return would probably only need to care about __smallc.

It's not designed for changing the registers of calling conventions dynamically, but I can maybe treat them as completely new calling conventions (smallc_return, smallc_fastcall).

suborb commented 3 years ago

I've done as much as I'm ever going to do. The remainder isn't important so closing.