Open pawosm-arm opened 4 years ago
This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.
Also, I can repro the invalid mnemonics so I'll fix that soon.
This -march=generic would be helpful for generating better code, as for now idxhalf's and lea's are implemented by generating workaround machine code (which isn't the best solution). Yet what I would like to see the most is less use of compiler library routines. E.g., in the example main.c code I wrote:
const int n = ((unsigned char)(argv[0][0]));
...where argv is a pointer to a pointer to signed char elements. This explicit signed char to unsigned char cast may look awkward; unfortunately, if I omit it, the Z80 backend generates a call to a compiler function __bshrs (which I don't have implemented) with the cast value passed in the A register and 7 passed in the B register. Converting a signed character to an integer happens pretty often in C code, so this needs to be addressed somehow.
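To make the link between the cast and the libcall concrete, here is a minimal C sketch of what widening a char involves on an 8-bit CPU. It is an illustration only, not what the backend emits: the helper names are hypothetical, and the idea that the shift count 7 comes from filling the high byte with the sign bit is my reading of the call described above.

```c
#include <stdint.h>

/* Hypothetical helper: the result of arithmetically shifting a byte right
   by 7, written without relying on implementation-defined >> of negatives. */
static uint8_t sign_fill(uint8_t low)
{
    return (low & 0x80u) ? 0xFFu : 0x00u;
}

/* Widen a signed 8-bit value to 16 bits: keep the low byte and fill the
   high byte with copies of the sign bit. */
static uint16_t widen_signed(uint8_t low)
{
    return (uint16_t)(((uint16_t)sign_fill(low) << 8) | low);
}

/* Widening an unsigned 8-bit value needs no shift: the high byte is 0. */
static uint16_t widen_unsigned(uint8_t low)
{
    return (uint16_t)low;
}
```

With the explicit unsigned char cast only the second path is needed, which would explain why the call to __bshrs disappears.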
Alright, I finally got caught up with unfixed bugs and pushed, so the sext opt I wrote a few weeks ago should solve your __bshrs issue. Note that, naturally, it is not reasonable to replace every instance of libcalls, but I can still do it if an optimization warrants it. Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly, for example here's an implementation of __bshrs.
__bshrs:
inc b
dec b
ret z
push bc
.loop:
sra a
djnz .loop
pop bc
ret
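For readers who want to check the routine against expected values on a host machine, the following is a small C model of it. This is a sketch under my reading of the listing (value in A, shift count in B, result in A); the function name bshrs_model is hypothetical.

```c
#include <stdint.h>

/* C model of __bshrs: a is the value, b the shift count.
   Each loop pass mirrors one "sra a": shift right, keeping bit 7. */
static uint8_t bshrs_model(uint8_t a, uint8_t b)
{
    if (b == 0)                /* inc b / dec b / ret z */
        return a;
    do {
        a = (uint8_t)((a >> 1) | (a & 0x80u));   /* sra a */
    } while (--b != 0);        /* djnz .loop */
    return a;
}
```

The early return matters because djnz decrements before testing, so entering the loop with B equal to 0 would run it 256 times.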
Hi @jacobly0
Also, if you need help implementing a libcall just ask,
I'm thinking about a more general approach that makes use of the userspace C libraries offered by the LLVM compiler suite. There are two such sub-projects in the monorepo, compiler-rt and libc, and I wonder which one is more suitable. For practical reasons I was rather looking at the libc project.
I had some hopes for this libc project: it's new, not overgrown (yet) and intended for multi-targeting, so it may be easier to extend. Namely, two targets could be added: cpm/z80 for CP/M userspace programs and none/z80 (or even one more, none/ez80) for bare-metal projects. It's not that hard on CP/M, where the whole userspace program needs to fit into the 64 kB TPA, eliminating all the problems with FAR pointers (a problem that haunts most 8- and 16-bit platforms) and leaving only the libm part with 'magical coding' requirements. Some of the problems I see are:
getting the libc project CMake files to use (for building libc) the clang compiler just built during the same make invocation, instead of the compiler used for building the rest of the LLVM project (with Z80-cross-compiling clang itself)
Any hints are welcome.
Well they all contain disjoint routines, so it's not really a choice between one or the other.
The purpose of compiler-rt is to provide compiler-specific builtins that compute operations that may not be available as an instruction on every processor, but are representable in the source language. Given the simplistic nature of the z80 instruction set, it will need quite a lot of these. Even though it has C implementations of various intrinsics, the majority of these are not going to be useful since they assume a minimal basic set of instructions which the z80 simply does not have. In the end there is just no escaping writing basic operations in assembly such as char * char (note that compiler-rt is full of specialized target-specific assembly routines in the first place anyway).
The purpose of libc on the other hand is to provide a standard set of C routines that can be used by programs to interface with the OS. These are written almost entirely in C except that the code that interfaces directly with the OS sometimes requires asm. Here the challenge is going to be rewriting said interface to work with a different OS in the first place.
And then libm is a bunch of computation-heavy floating-point based C routines, where some of the functions can be replaced by instructions on modern cpus. However, most implementations of libm don't provide basic float operations which are instead provided by hardware or compiler-rt depending on the cpu. Since the z80 would be entirely soft-float by necessity, it's certainly possible to use C implementations for everything, but still, they will depend heavily on compiler-rt assembly routines for basic operations.
Rebased to follow changes on your branch.
Updated to follow recent API changes.
Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly
Yeah, I'm finding more of those symbols, namely:
U __frameset
U __frameset0
U __indcallhl
U __sand
U __setflag
U __lshl
U __lshru
U __lsub
U __sshl
U __sshrs
U __sshru
U __smulu
U __sxor
U __lmulu
U __ladd
U __lcmpu
U __ldivs
U __sdivs
U __sdivsu
U __srems
The floats are currently outside of my scope, despite encountering them too: __fadd, __fdiv, __fmul, __fsub, __fcmp, __fneg
@jacobly0 from this file https://github.com/c4ooo/TI84-CE-Wrapper-for-Monochrome-TI-BASIC-Programs./blob/master/ti84pce.inc I can figure out where those builtin function names originate from. Can I assume that you have some access to the sources for those functions? I need a bit of your help as I'm stuck with my hobby project without them... now it's __smulu, but who knows what it's going to be tomorrow...
You are probably interested in the various files here: https://github.com/CE-Programming/toolchain/tree/master/src/std
The source for all of the zds routines is in the previous zilog toolchain release at src/rtl/common, but they are written for the ez80 so won't be very useful, certainly not the multiplication routines. You can see copies of the comments that explain what each routine does here. Here's a z80 __smulu taken from elsewhere:
__smulu:
push af
push bc
push de
ld e,c
ld d,b
call .mul
pop de
pop bc
pop af
ret
.mul:
xor a
cp h
jr z,.swap
ex de,hl
.swap:
ld c,l
ld l,a
add a,h
call nz,.byte
ld a,c
.byte:
ld b,8
.next:
add hl,hl
add a,a
jr nc,.skip
add hl,de
.skip:
djnz .next
ret
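The routine above is a shift-and-add multiply that processes the multiplier one byte at a time, most significant bit first. The C model below is a sketch of that structure as I read the listing (HL times BC, 16-bit result in HL); the function names are hypothetical and the model starts its accumulator at 0, whereas the assembly starts with H in the top byte and lets it shift out.

```c
#include <stdint.h>

/* One pass of the ".byte" loop: hl = hl * 256 + de * a, modulo 2^16. */
static uint16_t mul_byte(uint16_t hl, uint16_t de, uint8_t a)
{
    for (int i = 0; i < 8; i++) {
        hl = (uint16_t)(hl << 1);          /* add hl,hl */
        if (a & 0x80u)                     /* add a,a ; jr nc,.skip */
            hl = (uint16_t)(hl + de);      /* add hl,de */
        a = (uint8_t)(a << 1);
    }
    return hl;
}

static uint16_t smulu_model(uint16_t hl, uint16_t bc)
{
    uint16_t de = bc;
    if ((hl >> 8) != 0) {                  /* cp h ; jr z,.swap ; ex de,hl */
        uint16_t t = hl;
        hl = de;
        de = t;
    }
    uint8_t hi = (uint8_t)(hl >> 8);
    uint8_t lo = (uint8_t)hl;
    uint16_t acc = 0;
    if (hi != 0)                           /* call nz,.byte */
        acc = mul_byte(acc, de, hi);
    return mul_byte(acc, de, lo);          /* ld a,c ; fall into .byte */
}
```

The swap is purely a speed trick: if one operand fits in a byte, putting it in the multiplier position lets the first eight-step pass be skipped.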
Wow, that's a great response, thanks a million :)
I observed something odd today while trying to use stdint.h's uint32_t type: sizeof(uint32_t) is... 2 (while it's 4 with sdcc, as everywhere else...). Fortunately, sizeof(unsigned long) is 4.
You were right, clang's stdint.h was include_next-ing the host's stdint.h unless the -U__STDC_HOSTED__ flag is passed. This solves the big int sizes problem, thx!
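A mix-up like this can be caught at compile time rather than by printing sizeof. The snippet below is a hypothetical guard one could drop into a project; it is not part of this PR, and high_word is just an illustrative function that silently misbehaves if uint32_t is too narrow.

```c
#include <limits.h>
#include <stdint.h>

/* Fail the build if the fixed-width types are not as wide as their names
   promise, e.g. because the wrong stdint.h was picked up. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits wide");
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t must be 16 bits wide");

/* Illustrative helper that only works if uint32_t really holds 32 bits. */
static uint32_t high_word(uint32_t v)
{
    return v >> 16;
}
```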
Hi @jacobly0, can I ask for your implementation of __lcmpu?
That's a surprisingly tough one, let me give it a shot...
; speed optimized
__lcmpu:
or a
sbc hl,bc
add hl,bc
push de
push bc
push iy
pop bc
ex de,hl
jr z,.maybeEqual
sbc hl,bc
ex de,hl
pop bc
jr nz,.notEqual
jp pe,.overflow
inc d
.notEqual:
pop de
ret
.overflow:
ld d,$80
dec d
pop de
ret
.maybeEqual:
sbc hl,bc
ex de,hl
pop bc
pop de
ret
; size optimized
__lcmpu:
or a
sbc hl,bc
add hl,bc
push de
push bc
push iy
pop bc
ex de,hl
jr z,.maybeEqual
sbc hl,bc
push af
pop hl
res 6,l
push hl
pop af
db $21
.maybeEqual:
sbc hl,bc
ex de,hl
pop bc
pop de
ret
(All assuming you want to avoid index half reg instructions at least)
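Both versions boil down to a 32-bit subtraction done as two 16-bit subtract-with-borrow steps, with the zero flag reporting equality of the whole value. Here is a C sketch of that idea. The operand layout (left in DE:HL, right in IY:BC) is my reading of the listings, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmp_flags {
    bool carry;  /* set when left < right (unsigned) */
    bool zero;   /* set when left == right */
};

/* 16-bit subtract with borrow, reporting the borrow out: models "sbc hl,bc". */
static uint16_t sbc16(uint16_t x, uint16_t y, bool borrow_in, bool *borrow_out)
{
    uint32_t wide = (uint32_t)x - y - (borrow_in ? 1u : 0u);
    *borrow_out = (wide >> 16) != 0;   /* wrapped below zero */
    return (uint16_t)wide;
}

static struct cmp_flags lcmpu_model(uint32_t left, uint32_t right)
{
    bool borrow;
    /* Low words first ("or a" clears the incoming borrow). */
    uint16_t lo = sbc16((uint16_t)left, (uint16_t)right, false, &borrow);
    /* High words next, consuming the borrow from the low words. */
    uint16_t hi = sbc16((uint16_t)(left >> 16), (uint16_t)(right >> 16),
                        borrow, &borrow);
    struct cmp_flags f = { borrow, lo == 0 && hi == 0 };
    return f;
}
```

The awkward part in assembly, and the reason for the extra branches above, is that the second sbc sets Z from the high words alone, so the routine has to clear Z again when the low words differed.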
It works, thanks!
Hi @jacobly0, can I look at the __setflag implementation too?
__setflag:
ret po
push af
dec sp
pop af
xor $80
push af
inc sp
pop af
ret
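As I read the listing, __setflag returns immediately when P/V is clear and otherwise flips bit 7 of the flag byte (the sign flag) while leaving A intact. That is the classic "sign XOR overflow" correction for signed comparisons. The following C sketch shows the same rule for 16-bit operands; signed_less is a hypothetical name and the code illustrates the rule, not the routine's stack trick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed "less than" derived from a subtraction's flags: the sign of the
   result is only trustworthy when no signed overflow occurred, so the
   corrected answer is sign XOR overflow. */
static bool signed_less(int16_t x, int16_t y)
{
    uint16_t ux = (uint16_t)x;
    uint16_t uy = (uint16_t)y;
    uint16_t diff = (uint16_t)(ux - uy);
    bool sign = (diff & 0x8000u) != 0;
    /* Overflow: the operands have different signs and the result's sign
       differs from the left operand's sign. */
    bool overflow = (((ux ^ uy) & (ux ^ diff)) & 0x8000u) != 0;
    if (overflow)          /* P/V set, so "ret po" does not return early */
        sign = !sign;      /* "xor $80" flips S in the flag byte */
    return sign;
}
```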
Wow, seems like __lcmpzero (called before __setflag) needs to leave P/V flag set properly. Can I have its code too?
; speed optimized
__lcmpzero:
push bc
ld c,a
ld a,l
or h
or e
or d
jr z,.zero
ld a,d
or 1
.zero:
cp 0
ld a,c
pop bc
ret
; size optimized (and basically what zds does)
__lcmpzero:
push bc
ld bc,0
push bc
ex (sp),iy
call __lcmpu
pop iy
pop bc
ret
Works lovely, thanks!
False alarm, I've found a bug in the rest of the code.
AArgh, it generated a call to __sshl in a simple array access. Also I need __sdivs to complete this function, @jacobly0 can I see your impl. of both?
(I meant sshl not smulu which is already there, my mistake).
All of the shifts follow the same pattern, so I'll do them all at once:
__bshl:
inc b
dec b
ret z
push bc
.loop:
add a,a
djnz .loop
pop bc
ret
__bshru:
inc b
dec b
ret z
push bc
.loop:
srl a
djnz .loop
pop bc
ret
__bshrs:
inc b
dec b
ret z
push bc
.loop:
sra a
djnz .loop
pop bc
ret
__sshl:
inc c
dec c
ret z
push bc
ld b,c
.loop:
add hl,hl
djnz .loop
pop bc
ret
__sshru:
inc c
dec c
ret z
push bc
ld b,c
.loop:
srl h
rr l
djnz .loop
pop bc
ret
__sshrs:
inc c
dec c
ret z
push bc
ld b,c
.loop:
sra h
rr l
djnz .loop
pop bc
ret
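The 16-bit variants differ from the 8-bit ones only in taking the count in C and shifting HL as a pair. A C model of the three routines, assuming that register usage and using hypothetical function names, can serve as a reference when testing them:

```c
#include <stdint.h>

/* Logical/arithmetic shift left: one "add hl,hl" per step. */
static uint16_t sshl_model(uint16_t hl, uint8_t c)
{
    while (c-- != 0)
        hl = (uint16_t)(hl << 1);
    return hl;
}

/* Logical shift right: "srl h" then "rr l" carries bit 8 into bit 7. */
static uint16_t sshru_model(uint16_t hl, uint8_t c)
{
    while (c-- != 0)
        hl = (uint16_t)(hl >> 1);
    return hl;
}

/* Arithmetic shift right: "sra h" keeps bit 15, "rr l" takes the carry. */
static uint16_t sshrs_model(uint16_t hl, uint8_t c)
{
    while (c-- != 0)
        hl = (uint16_t)((hl >> 1) | (hl & 0x8000u));
    return hl;
}
```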
And some division-related madness based on routines from the same project as the multiplication:
__sdvrmu:
push af
push bc
ld d,b
ld e,c
ld b,16
ld a,h
ld c,l
ld hl,0
.loop:
scf
rl c
rla
adc hl,hl
sbc hl,de
jr nc,.skip
add hl,de
dec c
.skip:
djnz .loop
ld d,a
ld e,c
pop bc
pop af
ret
__sremu:
push de
call __sdvrmu
pop de
ret
__srems:
push bc
bit 7,h
push af
jp z,.skip1
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
.skip1:
bit 7,b
jr z,.skip2
xor a
sub c
ld c,a
sbc a,a
sub b
ld b,a
.skip2:
call __sremu
pop af
pop bc
ret z
push af
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
pop af
ret
__sdivu:
push de
call __sdvrmu
ex de,hl
pop de
ret
__sdivs:
push bc
push af
ld a,h
xor b
push af
xor b
jp p,.skip1
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
.skip1:
bit 7,b
jr z,.skip2
xor a
sub c
ld c,a
sbc a,a
sub b
ld b,a
.skip2:
call __sdivu
pop af
pop bc
ld a,b
pop bc
ret p
push af
xor a
sub l
ld l,a
sbc a,a
sub h
ld h,a
pop af
ret
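The division routines follow the usual pattern: one unsigned shift-and-subtract core, plus signed wrappers that divide magnitudes and then fix the signs (quotient negative when the operand signs differ, remainder following the dividend). The C sketch below models that pattern. It is not a cycle-for-cycle transcription: the names are hypothetical, the remainder is kept in a wider variable so no carry can be lost, and the assert on a zero divisor is my assumption about what callers guarantee.

```c
#include <assert.h>
#include <stdint.h>

struct divrem { uint16_t quot, rem; };

/* Restoring (shift-and-subtract) division, one quotient bit per pass. */
static struct divrem sdvrmu_model(uint16_t dividend, uint16_t divisor)
{
    assert(divisor != 0);                 /* assumption: never called with 0 */
    uint32_t rem = 0;
    uint16_t quot = 0;
    for (int i = 15; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);   /* bring down next bit */
        quot = (uint16_t)(quot << 1);
        if (rem >= divisor) {             /* trial subtraction succeeds */
            rem -= divisor;
            quot |= 1u;
        }
    }
    struct divrem r = { quot, (uint16_t)rem };
    return r;
}

/* Negate a 16-bit two's-complement value held in an unsigned variable. */
static uint16_t neg16(uint16_t v) { return (uint16_t)(0u - v); }

static uint16_t sdivs_model(uint16_t x, uint16_t y)
{
    int negative = ((x ^ y) & 0x8000u) != 0;          /* quotient sign */
    uint16_t q = sdvrmu_model((x & 0x8000u) ? neg16(x) : x,
                              (y & 0x8000u) ? neg16(y) : y).quot;
    return negative ? neg16(q) : q;
}

static uint16_t srems_model(uint16_t x, uint16_t y)
{
    int negative = (x & 0x8000u) != 0;                /* follows the dividend */
    uint16_t r = sdvrmu_model(negative ? neg16(x) : x,
                              (y & 0x8000u) ? neg16(y) : y).rem;
    return negative ? neg16(r) : r;
}
```

These sign rules match C's truncating division, so on a host the models can be cross-checked against the built-in / and % operators for in-range values.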
It works, thanks again :)
Can't merge changes to the asm output without a way to select between asm flavors.
Hey @jacobly0 it was 2 years ago, so I guess this PR needs massive rework, at least to solve merge conflicts and to make it compatible with any of the LLVM API changes that occurred during that time.
AFAIR my solution does not output any assembly, it creates ELF binaries directly (namely, ELF LSB relocatable, *unknown arch 0xdc* version 1 (SYSV), not stripped; to have such binaries output, the compiler is started with these flags: --target=z80-none-elf -march=z80 -fintegrated-as -fcommon -fno-builtin -U__STDC_HOSTED__), so I don't know which changes to the asm output aren't able to select between asm flavors...
I'm referring to this, file-local labels that are unmarked or that start with dot are not supported by the assembler/linker I'm using.
Interestingly, my assembler/linker does support outputting ELF files now and those files are correctly dumped by llvm object inspection tools compiled from the z80 branch.
Sadly, I don't remember now why I had to make that change, so I'd have to track it back before finding a way to make it correct :(
Following the ability of the other 8-bit platform supported by LLVM (AVR) to generate ELF object files, I have prepared this crude ELF code emitter for Z80. It was not extensively tested, mostly due to the lack of a compatible runtime library (e.g. for a CP/M system).