ELF code emitter for Z80 architecture (naiive impl.)

pawosm-arm commented 4 years ago

Following the ability of the other 8-bit platform supported by LLVM (AVR) to generate ELF object files, I have prepared this crude ELF code emitter for Z80. It was not extensively tested, mostly due to the lack of a compatible runtime library (e.g. for a CP/M system).

Some caveats:

- The Z80 backend emits a lot of compiler library calls as specified in
  llvm/lib/Target/Z80/Z80ISelLowering.cpp making it hard for practical use

jacobly0 commented 4 years ago

This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.

Also, I can repro the invalid mnemonics so I'll fix that soon.

pawosm-arm commented 4 years ago

> This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.
>
> Also, I can repro the invalid mnemonics so I'll fix that soon.

This -march=generic would help generate better code, since for now idxhalf's and lea's are implemented by generating workaround machine code (which isn't the best solution). Yet what I would like to see the most is less use of compiler library routines. E.g., in the example main.c code I wrote:

const int n = ((unsigned char)(argv[0][0]));

...where argv is a pointer to a pointer to signed char elements. This explicit signed char to unsigned char cast may look awkward, but if I omit it, the Z80 backend generates a call to a compiler function __bshrs (which I don't have implemented) with the cast value passed in the A register and 7 passed in the B register. Unfortunately, converting a signed character to an integer happens pretty often in C code, so this needs to be addressed somehow.
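To illustrate the difference described above, here is a small host-side C sketch. The helper names are hypothetical and not part of this PR. It assumes the backend produces the high byte of the widened value by an arithmetic right shift of the low byte by 7, which is what the __bshrs call with 7 in B suggests.

```c
#include <stdint.h>

/* Hypothetical helper: widen a byte to 16 bits by zero extension,
   as the explicit (unsigned char) cast requests. The high byte is 0,
   so no shift is needed. */
uint16_t widen_unsigned(uint8_t byte)
{
    return (uint16_t)byte;
}

/* Hypothetical helper: widen a byte by sign extension. The high byte
   is the sign bit replicated eight times, i.e. what an arithmetic
   shift right by 7 of the low byte yields (0x00 or 0xFF). */
uint16_t widen_signed(uint8_t byte)
{
    uint8_t high = (byte & 0x80) ? 0xFF : 0x00;
    return (uint16_t)((high << 8) | byte);
}
```

For the byte 0x80 the first helper gives 0x0080 and the second gives 0xFF80; only the second needs the sign-bit replication that ended up as a library call.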

jacobly0 commented 4 years ago

Alright, I finally got caught up with unfixed bugs and pushed, so the sext opt I wrote a few weeks ago should solve your __bshrs issue. Note that, naturally, it is not reasonable to replace every instance of libcalls, but I can still do it if an optimization warrants it. Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly, for example here's an implementation of __bshrs.

__bshrs:
    inc b
    dec b
    ret z
    push bc
.loop:
    sra a
    djnz .loop
    pop bc
    ret
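For readers less used to Z80 assembly, the routine above can be modelled in C roughly as follows. This is only a sketch for checking results on a host machine; the function name is made up, and the register assignment (value in A, count in B, result in A) is taken from the earlier comment.

```c
#include <stdint.h>

/* Host-side model of __bshrs: arithmetic shift right of an 8-bit value.
   Assumption: value arrives in A, shift count in B, result leaves in A. */
uint8_t bshrs_model(uint8_t a, uint8_t count)
{
    /* `inc b / dec b / ret z`: a zero count returns the value unchanged.
       Without that check, djnz would wrap B and loop 256 times. */
    while (count != 0) {
        /* `sra a`: shift right by one, keeping bit 7 (the sign) in place. */
        a = (uint8_t)((a >> 1) | (a & 0x80));
        count--;                       /* `djnz .loop` */
    }
    return a;
}
```

With a count of 7, the model returns 0xFF for any input with bit 7 set and 0x00 otherwise.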

pawosm-arm commented 4 years ago

Hi @jacobly0

> Also, if you need help implementing a libcall just ask,

I'm thinking about some more general approach that makes use of the userspace C libraries offered by LLVM compiler suite. There are two sub-projects like that in the monorepo, compiler-rt and libc, I wonder which one is more suitable. For practical reasons I was rather looking at the libc project.

I had some hopes regarding this libc project: it's new, not overgrown (yet) and intended for multi-targeting, so it may be easier to extend. Namely, two targets could be added: cpm/z80 for CP/M userspace programs and none/z80 (or even one more, none/ez80) for bare-metal projects. It's not that hard on CP/M, where the whole userspace program needs to fit into the 64kB TPA, eliminating all the problems with FAR pointers (a problem that haunts most 8- and 16-bit platforms), leaving only the libm part with 'magical coding' requirements. Some of the problems I see are:

- It's laborious.
- I haven't figured out yet how to modify the libc project's CMake files so that libc is built with the clang compiler just built during the same make invocation (the Z80-cross-compiling clang itself), instead of the compiler used for building the rest of the LLVM project.
- Similarly, I haven't figured out yet how to modify the CMake files to ensure static library generation even if the rest of the current LLVM build is set up to build shared libraries.

Any hints are welcome.

jacobly0 commented 4 years ago

Well they all contain disjoint routines, so it's not really a choice between one or the other.

The purpose of compiler-rt is to provide compiler-specific builtins that compute operations that may not be available as an instruction on every processor, but are representable in the source language. Given the simplistic nature of the z80 instruction set, it will need quite a lot of these. Even though it has C implementations of various intrinsics, the majority of these are not going to be useful since they assume a minimal basic set of instructions which the z80 simply does not have. In the end there is just no escaping writing basic operations in assembly such as char * char (note that compiler-rt is full of specialized target-specific assembly routines in the first place anyway).

The purpose of libc on the other hand is to provide a standard set of C routines that can be used by programs to interface with the OS. These are written almost entirely in C except that the code that interfaces directly with the OS sometimes requires asm. Here the challenge is going to be rewriting said interface to work with a different OS in the first place.

And then libm is a bunch of computation-heavy floating-point based C routines, where some of the functions can be replaced by instructions on modern cpus. However, most implementations of libm don't provide basic float operations which are instead provided by hardware or compiler-rt depending on the cpu. Since the z80 would be entirely soft-float by necessity, it's certainly possible to use C implementations for everything, but still, they will depend heavily on compiler-rt assembly routines for basic operations.

pawosm-arm commented 4 years ago

Rebased to follow changes on your branch.

pawosm-arm commented 4 years ago

Updated to follow recent API changes.

pawosm-arm commented 4 years ago

> Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly

Yeah, I'm finding more of those symbols, namely:

         U __frameset
         U __frameset0
         U __indcallhl
         U __sand
         U __setflag
         U __lshl
         U __lshru
         U __lsub
         U __sshl
         U __sshrs
         U __sshru
         U __smulu
         U __sxor
         U __lmulu
         U __ladd
         U __lcmpu
         U __ldivs
         U __sdivs
         U __sdivsu
         U __srems

The floats are currently outside of my scope, despite encountering them too: __fadd, __fdiv, __fmul, __fsub, __fcmp, __fneg

pawosm-arm commented 4 years ago

@jacobly0 from this file https://github.com/c4ooo/TI84-CE-Wrapper-for-Monochrome-TI-BASIC-Programs./blob/master/ti84pce.inc I can figure out where those builtin function names originate from. Can I assume that you have some access to the sources for those functions? I need a bit of your help as I'm stuck with my hobby project without them... now it's __smulu, but who knows what it's going to be tomorrow...

adriweb commented 4 years ago

You are probably interested in the various files here: https://github.com/CE-Programming/toolchain/tree/master/src/std

jacobly0 commented 4 years ago

The source for all of the zds routines is in the previous zilog toolchain release at src/rtl/common, but they are written for the ez80 so won't be very useful, certainly not the multiplication routines. You can see copies of the comments that explain what each routine does here. Here's a z80 __smulu taken from elsewhere:

__smulu:
    push    af
    push    bc
    push    de
    ld  e,c
    ld  d,b
    call    .mul
    pop de
    pop bc
    pop af
    ret
.mul:
    xor a
    cp  h
    jr  z,.swap
    ex  de,hl
.swap:
    ld  c,l
    ld  l,a
    add a,h
    call    nz,.byte
    ld  a,c
.byte:
    ld  b,8
.next:
    add hl,hl
    add a,a
    jr  nc,.skip
    add hl,de
.skip:
    djnz    .next
    ret
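As far as I can read it, the routine above is a shift-and-add multiply that walks the multiplier one byte at a time, most significant bit first, and keeps the low 16 bits of the product. A host-side C sketch of that reading follows. The function names are invented for illustration, and the operand registers (HL and BC in, HL out) are an assumption based on the listing.

```c
#include <stdint.h>

/* One pass of the `.byte` / `.next` loop: consume eight multiplier bits,
   most significant first, accumulating into a 16-bit value. */
static uint16_t mul_byte(uint16_t acc, uint8_t bits, uint16_t multiplicand)
{
    for (int i = 0; i < 8; i++) {
        acc = (uint16_t)(acc << 1);                 /* add hl,hl */
        if (bits & 0x80)                            /* add a,a / jr nc,.skip */
            acc = (uint16_t)(acc + multiplicand);   /* add hl,de */
        bits = (uint8_t)(bits << 1);
    }
    return acc;
}

/* Host-side model of __smulu: low 16 bits of x * y. */
uint16_t smulu_model(uint16_t x, uint16_t y)
{
    /* `cp h / jr z,.swap / ex de,hl`: if the multiplier has a non-zero
       high byte, try the other operand as the multiplier instead. */
    if ((x >> 8) != 0) {
        uint16_t tmp = x;
        x = y;
        y = tmp;
    }

    uint16_t acc = 0;
    /* `call nz,.byte`: the high-byte pass is skipped when it is zero. */
    if ((x >> 8) != 0)
        acc = mul_byte(acc, (uint8_t)(x >> 8), y);
    return mul_byte(acc, (uint8_t)(x & 0xFF), y);
}
```

The swap only saves time when one operand fits in a byte; the result is the same either way because the product is truncated to 16 bits.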

pawosm-arm commented 4 years ago

Wow, that's a great response, thanks a million :)

pawosm-arm commented 4 years ago

I observed something odd today while trying to use stdint.h's uint32_t type: sizeof(uint32_t) is... 2 (while it's 4 with sdcc, as everywhere else). Fortunately, sizeof(unsigned long) is 4.

jacobly0 commented 4 years ago

Then you are using a stdint.h that isn't valid for the z80. The one from glibc definitely won't work. The one packaged with clang would work if you don't have another invalid stdint.h in the include search path. I use a much more simplified version here.

pawosm-arm commented 4 years ago

You were right: clang's stdint.h was pulling in the host's stdint.h via #include_next unless the -U__STDC_HOSTED__ flag is passed. This solves the problem with the sizes of the big integer types, thx!
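A cheap way to catch this kind of header mix-up early is a compile-time check in the project itself. The sketch below is only a suggestion, not something from this PR, and it assumes a C11-capable compiler for _Static_assert.

```c
#include <limits.h>
#include <stdint.h>

/* Compilation fails here if a stdint.h meant for another target was
   picked up, e.g. one that maps uint32_t onto a 16-bit type. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32,
               "uint32_t is not 32 bits wide: wrong stdint.h in the include path?");
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16,
               "uint16_t is not 16 bits wide: wrong stdint.h in the include path?");
```

On a host compiler these checks pass trivially; they only become interesting when cross-compiling with a target whose int is narrower than 32 bits.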

pawosm-arm commented 4 years ago

Hi @jacobly0, can I ask for your implementation of __lcmpu?

jacobly0 commented 4 years ago

That's a surprisingly tough one, let me give it a shot...

; speed optimized
__lcmpu:
    or a
    sbc hl,bc
    add hl,bc
    push de
    push bc
    push iy
    pop bc
    ex de,hl
    jr z,.maybeEqual
    sbc hl,bc
    ex de,hl
    pop bc
    jr nz,.notEqual
    jp pe,.overflow
    inc d
.notEqual:
    pop de
    ret
.overflow:
    ld d,$80
    dec d
    pop de
    ret
.maybeEqual:
    sbc hl,bc
    ex de,hl
    pop bc
    pop de
    ret

; size optimized
__lcmpu:
    or a
    sbc hl,bc
    add hl,bc
    push de
    push bc
    push iy
    pop bc
    ex de,hl
    jr z,.maybeEqual
    sbc hl,bc
    push af
    pop hl
    res 6,l
    push hl
    pop af
    db $21
.maybeEqual:
    sbc hl,bc
    ex de,hl
    pop bc
    pop de
    ret

(All assuming you want to avoid index half reg instructions at least)
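To test a port of these routines, it can help to have a host-side model of the flags they should produce. The C sketch below covers only the zero and carry flags, under the assumption (read from the listings) that the left operand is in DE:HL and the right operand in IY:BC; the struct and function names are invented here.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmp_flags {
    bool zero;   /* Z: the operands are equal */
    bool carry;  /* C: the left operand is below the right one (unsigned) */
};

/* Host-side model of __lcmpu built from two 16-bit subtractions,
   the second one consuming the borrow of the first. */
struct cmp_flags lcmpu_model(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* `or a / sbc hl,bc`: low words, no incoming borrow. */
    bool low_equal = (a_lo == b_lo);
    uint32_t borrow = (a_lo < b_lo) ? 1u : 0u;

    /* Second `sbc hl,bc`: high words minus the borrow from the low words. */
    uint16_t high_diff = (uint16_t)((uint32_t)a_hi - b_hi - borrow);

    struct cmp_flags flags;
    flags.carry = (uint32_t)a_hi < (uint32_t)b_hi + borrow;
    /* Z must only be reported when both halves match. The asm clears a
       stray Z from the high-word subtraction (`inc d` or `dec d`). */
    flags.zero = low_equal && (high_diff == 0);
    return flags;
}
```

The pair 0x00010000 and 0x0000FFFF exercises the awkward case: the high-word subtraction yields zero even though the operands differ.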

pawosm-arm commented 4 years ago

It works, thanks!

pawosm-arm commented 4 years ago

Hi @jacobly0, can I look at the __setflag implementation too?

jacobly0 commented 4 years ago

__setflag:
    ret po
    push af
    dec sp
    pop af
    xor $80
    push af
    inc sp
    pop af
    ret
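My reading of the routine above: when the overflow flag is set it flips the sign flag, so that afterwards S alone answers "is the left operand less than the right one" for a signed comparison. The dec sp / inc sp pairs appear to slide the stack pointer by one byte so that the saved F lands in A, where xor can reach it. The C sketch below models only the effect on the flags, using an 8-bit subtraction for brevity; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_S  0x80u  /* sign flag, bit 7 of the Z80 F register */
#define FLAG_PV 0x04u  /* parity/overflow flag, bit 2 of F */

/* Model of __setflag on F alone: return unchanged when overflow is
   clear (`ret po`), otherwise flip the sign flag (`xor $80`). */
uint8_t setflag_model(uint8_t f)
{
    return (f & FLAG_PV) ? (uint8_t)(f ^ FLAG_S) : f;
}

/* Hypothetical helper: S and P/V as an 8-bit signed subtraction a - b
   would leave them. Other flag bits are left at zero. */
uint8_t sub_flags(int8_t a, int8_t b)
{
    int wide = (int)a - (int)b;       /* exact result */
    uint8_t result = (uint8_t)wide;   /* what fits in the register */
    uint8_t f = 0;
    if (result & 0x80)
        f |= FLAG_S;
    if (wide < -128 || wide > 127)    /* signed overflow */
        f |= FLAG_PV;
    return f;
}

/* After the correction, the sign flag alone means "a < b" (signed). */
bool signed_less(int8_t a, int8_t b)
{
    return (setflag_model(sub_flags(a, b)) & FLAG_S) != 0;
}
```

This is the usual "S xor V" rule for signed less-than, with the xor folded into the flag register so that callers only have to test S.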

pawosm-arm commented 4 years ago

Wow, seems like __lcmpzero (called before __setflag) needs to leave P/V flag set properly. Can I have its code too?

jacobly0 commented 4 years ago

; speed optimized
__lcmpzero:
    push bc
    ld c,a
    ld a,l
    or h
    or e
    or d
    jr z,.zero
    ld a,d
    or 1
.zero:
    cp 0
    ld a,c
    pop bc
    ret

; size optimized (and basically what zds does)
__lcmpzero:
    push bc
    ld bc,0
    push bc
    ex (sp),iy
    call __lcmpu
    pop iy
    pop bc
    ret
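A host-side model of what the speed-optimized variant seems to report may be useful when porting. The sketch assumes the value sits in DE:HL with D as the most significant byte; the names are made up. Since the routine ends with a comparison against 0, which cannot overflow, P/V should come out clear, so a following __setflag would return at its first instruction.

```c
#include <stdbool.h>
#include <stdint.h>

struct zero_cmp_flags {
    bool zero;  /* Z: all four bytes are zero */
    bool sign;  /* S: bit 31 of a non-zero value */
};

/* Host-side model of the speed-optimized __lcmpzero. */
struct zero_cmp_flags lcmpzero_model(uint32_t value)
{
    uint8_t top = (uint8_t)(value >> 24);   /* register D (assumed) */
    struct zero_cmp_flags flags;

    /* `ld a,l / or h / or e / or d`: zero only if every byte is zero. */
    flags.zero = (value == 0);
    /* Non-zero values compare (D | 1) against 0: Z is forced clear and
       S follows the top bit of D. */
    flags.sign = !flags.zero && (top & 0x80) != 0;
    return flags;
}
```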

pawosm-arm commented 4 years ago

Works lovely, thanks!

pawosm-arm commented 4 years ago

False alarm, I've found a bug in the rest of the code.

pawosm-arm commented 4 years ago

AArgh, it generated call to __sshl in a simple array access. Also I need __sdivs to complete this function, @jacobly0 can I see your impl. of both? (I meant sshl not smulu which is already there, my mistake).

jacobly0 commented 4 years ago

All of the shifts follow the same pattern, so I'll do them all at once:

__bshl:
    inc b
    dec b
    ret z
    push bc
.loop:
    add a,a
    djnz .loop
    pop bc
    ret

__bshru:
    inc b
    dec b
    ret z
    push bc
.loop:
    srl a
    djnz .loop
    pop bc
    ret

__bshrs:
    inc b
    dec b
    ret z
    push bc
.loop:
    sra a
    djnz .loop
    pop bc
    ret

__sshl:
    inc c
    dec c
    ret z
    push bc
    ld b,c
.loop:
    add hl,hl
    djnz .loop
    pop bc
    ret

__sshru:
    inc c
    dec c
    ret z
    push bc
    ld b,c
.loop:
    srl h
    rr l
    djnz .loop
    pop bc
    ret

__sshrs:
    inc c
    dec c
    ret z
    push bc
    ld b,c
.loop:
    sra h
    rr l
    djnz .loop
    pop bc
    ret

And some division-related madness based on routines from the same project as the multiplication:

__sdvrmu:
    push af
    push bc
    ld d,b
    ld e,c
    ld b,16
    ld a,h
    ld c,l
    ld hl,0
.loop:
    scf
    rl c
    rla
    adc hl,hl
    sbc hl,de
    jr nc,.skip
    add hl,de
    dec c
.skip:
    djnz .loop
    ld d,a
    ld e,c
    pop bc
    pop af
    ret

__sremu:
    push de
    call __sdvrmu
    pop de
    ret

__srems:
    push bc
    bit 7,h
    push af
    jp z,.skip1
    xor a
    sub l
    ld l,a
    sbc a,a
    sub h
    ld h,a
.skip1:
    bit 7,b
    jr z,.skip2
    xor a
    sub c
    ld c,a
    sbc a,a
    sub b
    ld b,a
.skip2:
    call __sremu
    pop af
    pop bc
    ret z
    push af
    xor a
    sub l
    ld l,a
    sbc a,a
    sub h
    ld h,a
    pop af
    ret

__sdivu:
    push de
    call __sdvrmu
    ex de,hl
    pop de
    ret

__sdivs:
    push bc
    push af
    ld a,h
    xor b
    push af
    xor b
    jp p,.skip1
    xor a
    sub l
    ld l,a
    sbc a,a
    sub h
    ld h,a
.skip1:
    bit 7,b
    jr z,.skip2
    xor a
    sub c
    ld c,a
    sbc a,a
    sub b
    ld b,a
.skip2:
    call __sdivu
    pop af
    pop bc
    ld a,b
    pop bc
    ret p
    push af
    xor a
    sub l
    ld l,a
    sbc a,a
    sub h
    ld h,a
    pop af
    ret
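For checking a port of the division routines, here is a host-side C sketch of how I read them: __sdvrmu as a 16-step restoring division producing quotient and remainder, and the signed wrappers as "divide the magnitudes, then fix the sign". All names are invented. The final conversions to int16_t rely on the usual two's-complement wrap-around, which is implementation-defined in C but holds for gcc and clang.

```c
#include <stdint.h>

struct divrem16 {
    uint16_t quot;
    uint16_t rem;
};

/* Restoring division, one quotient bit per step, like __sdvrmu.
   The asm sets the quotient bit first and clears it on borrow;
   this model does the test first, which gives the same result. */
struct divrem16 sdvrmu_model(uint16_t dividend, uint16_t divisor)
{
    uint32_t rem = 0;   /* wider than HL so the model cannot overflow */
    uint16_t quot = 0;
    for (int i = 15; i >= 0; i--) {
        rem = (rem << 1) | ((uint32_t)(dividend >> i) & 1u);
        quot = (uint16_t)(quot << 1);
        if (rem >= divisor) {
            rem -= divisor;
            quot |= 1u;
        }
    }
    return (struct divrem16){ quot, (uint16_t)rem };
}

/* Two's-complement negate in unsigned arithmetic, like the
   `xor a / sub l / ld l,a / sbc a,a / sub h / ld h,a` sequence. */
static uint16_t magnitude(int16_t v)
{
    return (v < 0) ? (uint16_t)(0u - (uint16_t)v) : (uint16_t)v;
}

/* __sdivs: the quotient is negative when the operand signs differ. */
int16_t sdivs_model(int16_t a, int16_t b)
{
    uint16_t q = sdvrmu_model(magnitude(a), magnitude(b)).quot;
    if ((a < 0) != (b < 0))
        q = (uint16_t)(0u - q);
    return (int16_t)q;
}

/* __srems: the remainder takes the sign of the dividend. */
int16_t srems_model(int16_t a, int16_t b)
{
    uint16_t r = sdvrmu_model(magnitude(a), magnitude(b)).rem;
    if (a < 0)
        r = (uint16_t)(0u - r);
    return (int16_t)r;
}
```

These sign rules match C's own / and % operators (truncation toward zero), so the models can be compared against the host compiler's results for any non-zero divisor.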

pawosm-arm commented 4 years ago

It works, thanks again :)

jacobly0 commented 2 years ago

Can't merge changes to the asm output without a way to select between asm flavors.

pawosm-arm commented 2 years ago

Hey @jacobly0 it was 2 years ago, so I guess this PR needs massive rework, at least to solve merge conflicts and to make it compatible with any of the LLVM API changes that occurred during that time.

AFAIR my solution does not output any assembly; it creates ELF binaries directly (namely, ELF LSB relocatable, *unknown arch 0xdc* version 1 (SYSV), not stripped). To have such binaries output, the compiler is started with these flags: --target=z80-none-elf -march=z80 -fintegrated-as -fcommon -fno-builtin -U__STDC_HOSTED__. So I don't know which changes to the asm output you mean that would need a way to select between asm flavors...

jacobly0 commented 2 years ago

I'm referring to this: file-local labels that are unmarked or that start with a dot are not supported by the assembler/linker I'm using.

Interestingly, my assembler/linker does support outputting ELF files now and those files are correctly dumped by llvm object inspection tools compiled from the z80 branch.

pawosm-arm commented 2 years ago

Sadly, I don't remember now why I had to make that change, so I'd have to track it back before finding a way to make it correct :(

jacobly0 / llvm-project

ELF code emitter for Z80 architecture (naiive impl.) #10