llvm-mos / llvm-mos-sdk

SDK for developing with the llvm-mos compiler
https://www.llvm-mos.org

pce: llvm-objdump odd disassembly for zp stores #349

Closed bcampbell closed 3 months ago

bcampbell commented 3 months ago

Using llvm-objdump to disassemble .elf files compiled for the PC Engine seems to give some odd results. In the example below, on the line for e00d, the sta operand looks wrong: it shows $2000 when it should be $00?

$ which llvm-objdump
/usr/local/llvm-mos/bin/llvm-objdump
$ cat foo.c
int main()
{
    return 0;
}
$ mos-pce-clang -o foo_pce foo.c 
$ llvm-objdump --disassemble foo_pce.elf

foo_pce.elf:    file format elf32-mos

Disassembly of section .text:

0000e000 <_start>:
    e000: a9 07         lda #$7
    e002: 8d 02 14      sta $1402                   ; 0x1402 <__heap_default_limit+0x402>
    e005: 20 67 e0      jsr $e067 <__pce_vdc_init>
    e008: 20 51 e0      jsr $e051 <__pce_psg_init>

0000e00b <__do_init_stack>:
    e00b: a9 00         lda #$0
    e00d: 85 00         sta $2000                   ; 0x0 <__zp_data_size>
    e00f: a9 40         lda #$40
    e011: 85 01         sta $2001                   ; 0x1 <__zero_zp_bss_size>
    e013: 20 1b e0      jsr $e01b <main>

0000e016 <__after_main>:
    e016: 4c 4c e0      jmp $e04c <exit>

...etc...

The same stub program compiled for the c64 seems to disassemble as I'd expect:

$ mos-c64-clang -o foo_c64 foo.c 
$ llvm-objdump --disassemble foo_c64.elf

foo_c64.elf:    file format elf32-mos

Disassembly of section .text:

0000080d <_start>:
     80d: a2 2f         ldx #$2f
     80f: 86 00         stx $0                      ; 0x0 <__zp_data_size>
     811: a2 3e         ldx #$3e
     813: 86 01         stx $1                      ; 0x1 <__zp_data_size+0x1>

00000815 <__do_init_stack>:
     815: a9 00         lda #$0
     817: 85 02         sta $2                      ; 0x2 <__rc0>
     819: a9 d0         lda #$d0
     81b: 85 03         sta $3                      ; 0x3 <__rc1>

0000081d <shift>:
     81d: a9 0e         lda #$e
     81f: 20 d2 ff      jsr $ffd2 <__heap_start+0xf76c>

...etc...

asiekierka commented 3 months ago

That is correct, and the result of a compromise.

As established above, we cannot distinguish between sta $FF and sta $00FF. This leads to two questions: "How to handle immediate values $00-$FF?" and "How to handle immediate values $2000-$20FF?".

I chose the last option, and so the above disassembly works as intended; that is, a zero-page write to $01 is sta $2001, not sta $1. There's also a test (llvm/test/MC/MOS/zeropage-huc6280.s) which demonstrates the behaviour at assembly level.
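
To make the mapping concrete, here is a minimal C sketch of how the displayed operand relates to the encoded byte. It assumes the HuC6280 zero page sits at $2000-$20FF in the logical address space, which is what the sta $2000 / sta $2001 output above reflects. The names HUC6280_ZP_BASE and zp_display_address are hypothetical and only for illustration; this is not the llvm-mos implementation.

```c
#include <stdint.h>

/* Assumption: base of the HuC6280 zero page in the logical address space,
 * matching the "sta $2000" / "sta $2001" lines in the disassembly above. */
#define HUC6280_ZP_BASE 0x2000u

/* Hypothetical helper: the address a disassembler would print for a
 * one-byte zero-page operand (e.g. the 0x01 in the encoding "85 01"). */
static uint16_t zp_display_address(uint8_t operand)
{
    return (uint16_t)(HUC6280_ZP_BASE | operand);
}
```

Under these assumptions, the encoded bytes 85 00 and 85 01 correspond to the printed operands $2000 and $2001, while the symbol lookup in the trailing comment still uses the raw values 0x0 and 0x1.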

In addition, since to LLVM a number is just a number, hardcoding a distinction between two-character and four-character immediates would fall apart as soon as arithmetic gets involved.
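
The point about arithmetic can be sketched in C: once a literal has been parsed, its digit count is gone, so an expression result carries no hint about which form was written. This is an illustration only; parse_hex_operand is a hypothetical helper and not how the LLVM assembler parses operands.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: evaluate the hex digits of an operand literal,
 * e.g. "FF" for $FF or "00FF" for $00FF. The width of the text is not
 * part of the returned value. */
static uint16_t parse_hex_operand(const char *digits)
{
    return (uint16_t)strtoul(digits, NULL, 16);
}
```

Both "FF" and "00FF" evaluate to the same number, and an expression such as $00FE+1 yields a value that never had a textual width to begin with.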

I suppose another option would be to adapt WDC-style modifiers to the HuC6280, so that, for example, lda <$00 would always resolve to the zero-page opcode lda $00. This is supported by the assembler (I think), but it is not what the disassembler currently emits.

bcampbell commented 3 months ago

Ahh, got it - thanks very much for the detailed explanation. I did wonder if it might have been something along those lines. The more I find out about the HuC6280, the more it diverges from my mental model of it being just a fast 6502 with extra bells and whistles bolted on :-)