simias / psx-sdk-rs

Rust SDK targeting the original Playstation
85 stars 8 forks source link

LLVM can't target the Playstation CPU (MIPS-I) #1

Open simias opened 9 years ago

simias commented 9 years ago

I've been struggling for the past few days with extremely erratic behaviour from my playstation code. Simple code would work correctly but when things get more complicated (in particular function calls and BIOS calls) the program would appear to behave semi-randomly (bus errors, freezes, functions get executed several times in a row when they shouldn't etc...)

After some intense hair pulling I figured out the cause of the problem: the MIPS-I architecture has load delay slots. It means that when the CPU loads a value from memory (through a lw instruction for instance) the next instruction will run before the value is put in the target register. That means that you have to wait one cycle (using a nop or some independant instruction) before you can use the value.

The problem is that LLVM doesn't support MIPS-I at the moment and MIPS-II doesn't have this constraint. So for instance when the compiler generates the function epilogue that pops the value of ra (the return address) from the stack and jumps to it we end up with something like:

lw      ra,20(sp)
jr      ra

Unfortunately this code doesn't work on the Playstation's R2000 CPU: the lw doesn't have the time to finish before the jr is called (the jr is in the load delay slot), so the jump takes whatever value was in ra before the lw instruction. And that explains the very erratic behaviour.

Unfortunately as far as I can tell LLVM doesn't support MIPS-I at all, if I attempt to put "cpu": "mips1" in target.json I get this error:

LLVM ERROR: Code generation for MIPS-I is not implemented

Unfortunately there's no workaround. The solution would be to add MIPS-I support to LLVM but that's of course a pretty ambitious project and my plate is full at the moment.

As long as this issue is not fixed there's no point working on this SDK, nothing but the most trivial code will run as intended on the console.

lf94 commented 6 years ago

If it's mostly the same, why not copy whatever MIPS-II does and change that behavior? Would be much easier than writing it all from scratch.

simias commented 6 years ago

I'm not at all familiar with LLVM so I didn't pursue this further. I assume it can't be too complicated but I really have no idea.

asiekierka commented 4 years ago

There is a patch available at https://github.com/impiaaa/llvm-project/commit/f2ecffebaa3200703cbf2c60a1d642a5c5254b01 but according to the developer it could use additional testing.

simias commented 4 years ago

That's great news although I must say that I wouldn't even know where to start to generate a rustc using a custom LLVM so I'm probably going to wait until it (hopefully) ends up in a stable release.

A few weeks ago I saw that there was a GCC backend for rustc in the works: https://github.com/sapir/gcc-rust

That might be an other way to get MIPSI support in Rust.

XaviDCR92 commented 4 years ago

I wanted to play around with the Zig programming language (see https://github.com/XaviDCR92/psx-zig), whose compiler is also based on LLVM, so merged the patch into the release/10.x branch of https://github.com/XaviDCR92/llvm-project . It patch seems to work correctly, so I encourage you to resume your work on psx-sdk-rs. Well-known issues with Zig's packed structs are preventing further development, but I think should be now fine with Rust.

ayrtonm commented 3 years ago

I recently got rustc working with LLVM 11 patched to support load delay slots if there's still interest in this. My fork has instructions in the README along with @impiaaa's LLVM patch (with a tiny fix) and a patch to build target.json and psx.ld into rustc. I checked that both rustc and llc by itself handle load delay slots correctly, though the tests weren't very extensive and I've only run programs on ePSXe.

I think it would've been convenient to use the JSON target, but it seems building the target into rustc is required since libcore won't compile unless the Atomic* API is disabled (by setting max_atomic_width = Some(0) in the target spec). This is because MIPS I doesn't have a sync instruction, unlike MIPS II. Either way rustc has to be compiled using the patched LLVM so it wouldn't have saved much time in building the compiler anyway.

Also I threw in a stripped-down copy of @simias's Makefile for the linker and merged libbios into a submodule of libpsx to make the SDK more user-friendly.

XaviDCR92 commented 3 years ago

@ayrtonm , the delay slot patch seems to work well for me too (I also had to apply the tiny fix to avoid a compile-time error).

However, I am having a curious issue that's making my Zig/C application crash by jumping into an invalid address. The root cause of this issue is $ra being assigned back to a value in $sp, 52, but $sp holds an invalid value and thus undefined behavior occurs.

After some investigation, it looks like bad generated code from the compiler side (I can't tell so far whether it's caused by either the LLVM backend or the Zig frontend). Consider the following lines of C code from PSXSDK:

for(a = 0; a < l; a+=2)
    GPU_DATA_PORT = image[a]|(image[a+1]<<16);

Among other low-level instructions, the following branch instruction is generated:

80012eb8:       14200017        bnez    at,80012f18 <LoadImage+0x1ac>

And this is what address 80012f18 contains (added some lines for context):

80012f00:       03c0e825        move    sp,s8
80012f04:       8fbe0000        lw      s8,0(sp)
80012f08:       8fbf0004        lw      ra,4(sp)
80012f0c:       00000000        nop
80012f10:       03e00008        jr      ra
80012f14:       27bd0008        addiu   sp,sp,8
        while(!(GPU_CONTROL_PORT & (1<<0x1c)));
80012f18:       0000000d        break

80012f1c <GsSetDrawEnv>:
{
80012f1c:       27bdfff8        addiu   sp,sp,-8

Therefore, 80012f18 is, much likely, not the address the program should be jumping to. Instead of leaving $sp to its original value and jumping to $ra, break is executed and GsSetDrawEnv is accidentally executed, causing undefined behaviour.

The compiler patch has added 18 nop instructions in LoadImage. I am not sure whether it's related or not, but 80012f18h - (18d * 4) = 80012ed0h, which casually belongs to the start of the loop.

        for(a = 0; a < l; a+=2)
80012ed0:       24e80002        addiu   t0,a3,2

Would it be possible that the LLVM backend is miscalculating destination addresses in branch instructions such as bnez since the patch is (apparently silently) adding several nop between instruction? I have not investigated other functions yet, but I must admit the application has already had erratic behavior several times now and this might explain it.

simias commented 3 years ago

I haven't touched this code in forever but I'm glad to see that there's progress!

@XaviDCR92 Have you tried to see the difference the various optimization levels make? Regardless of the miscalculated branch addresses it would be odd to end up with a bunch of useless NOPs in optimized code. What I'm getting at here is that a bunch of leftover NOPs with optimizations enabled seems like it would be a good sign that something messes up in the compiler.

XaviDCR92 commented 3 years ago

@simias Zig provides a --release-small flag roughly equivalent to gcc's -Os, but my application would not work at all with these optimizations enabled. I still have to find a reason for this.

OTOH, these apparently useless nop instructions must be there because of the load delay slot problem on MIPS I that you reported on your first message. There is no workaround other than replacing them with useful instructions that do not depend on their previous instruction, but that's not a serious issue.

Surprisingly though, I have found a possible reason why the code above was inadvertently jumping to a break instruction - there is a buffer overflow caused by this loop when called from this function, as pal is only 2 elements long and LoadImage is attempting to access pal[0 .. 15].

gcc would not issue any diagnostic message on this, but zig instead adds runtime bound checks when building in debug mode, thus executing the break instruction once an out-of-bound access occurs. As a quick workaround, I expanded pal from 2 to 16 elements and now GsLoadFont executes successfully.

More worryingly, I have also found other places where zig generates wrong code with break instructions here and there e.g.: GsSetDrawEnv:

80012f1c <GsSetDrawEnv>:
{
80012f1c:       27bdfff8        addiu   sp,sp,-8
80012f20:       afbf0004        sw      ra,4(sp)
80012f24:       afbe0000        sw      s8,0(sp)
80012f28:       03a0f025        move    s8,sp
        draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80012f2c:       0000000d        break

80012f30 <gpu_data_ctrl>:
{
80012f30:       27bdfff8        addiu   sp,sp,-8
80012f34:       afbf0004        sw      ra,4(sp)

Compared to the implementation by mipsel-unknown-elf-gcc:

80034bc0 <GsSetDrawEnv>:
{
80034bc0:       27bdffe8        addiu   sp,sp,-24
80034bc4:       afbf0014        sw      ra,20(sp)
        draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80034bc8:       90820001        lbu     v0,1(a0)
80034bcc:       00000000        nop
80034bd0:       1440003a        bnez    v0,80034cbc <GsSetDrawEnv+0xfc>
80034bd4:       00802825        move    a1,a0
80034bd8:       3c02e100        lui     v0,0xe100
                (drawenv->dither>=1)<<9;
80034bdc:       90a30000        lbu     v1,0(a1)
        gpu_data_ctrl(0xe3, (drawenv->y<<10)|drawenv->x);
80034be0:       84a40004        lh      a0,4(a1)
                (drawenv->dither>=1)<<9;
80034be4:       0003182b        sltu    v1,zero,v1
80034be8:       00031a40        sll     v1,v1,0x9
        draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80034bec:       00431025        or      v0,v0,v1
80034bf0:       3c038008        lui     v1,0x8008
80034bf4:       ac625d78        sw      v0,23928(v1)
        GPU_CONTROL_PORT = 0x01000000;
80034bf8:       3c060100        lui     a2,0x100
80034bfc:       3c031f80        lui     v1,0x1f80
80034c00:       ac661814        sw      a2,6164(v1)
        gpu_data_ctrl(0xe3, (drawenv->y<<10)|drawenv->x);
80034c04:       84a80002        lh      t0,2(a1)
...

In the meantime, I'm improving my fork of PCSX-r so pcsxr notifies whenever it executes a break instruction when debugging with mipsel-unknown-elf-gdb.

ayrtonm commented 3 years ago

After playing around with rust on the playstation for a while, I haven't been able to reproduce @XaviDCR92's issue with out of place break and my demos are behaving as expected on emulators.

I haven't used zig extensively, but I think the issue may be runtime checks or UB on shift overflow. For example, drawenv->draw_on_display is a u8 and I'm not sure if it gets implicitly cast to bool after the >= or if it just becomes a 0 or 1 as a u8. But either way, it then gets shifted more than 8 bits which might be a problem depending on how strict zig's type system is.

I also tried compiling some C with zig (0.6.0) myself and found that unsigned int z = 0xe1 << 24 caused me to hit an illegal instruction while unsigned int z = (unsigned int)0xe1 << 24 didn't.

int f(unsigned char x, unsigned char y) {
  2014e0:       55                      push   %rbp
  2014e1:       48 89 e5                mov    %rsp,%rbp
    unsigned int z = (0xe1 << 24);
  2014e4:       0f 0b                   ud2
int f(unsigned char x, unsigned char y) {
  201560:       55                      push   %rbp
  201561:       48 89 e5                mov    %rsp,%rbp
    unsigned int z = ((unsigned int)0xe1 << 24);
    return z;
  201564:       b8 00 00 00 e1          mov    $0xe1000000,%eax
  201569:       5d                      pop    %rbp
  20156a:       c3                      retq
  20156b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

It seems to me that zig uses the smallest type possible to represent integer literals, but I couldn't find any specific documentation on that. Granted that was all on x86 not MIPS, but I think the issue might be in that general area.

sajattack commented 2 years ago

Not sure how relevant it is to you (does PSX have an FPU?), but floating point comparisons also have a delay slot https://github.com/rust-lang/rust/issues/91442#issuecomment-985186799

ayrtonm commented 2 years ago

nah the PSX doesn't have a floating point unit. It supports fixed point but that's all custom coprocessor opcodes from inline assembly so llvm's not an issue there.

netthier commented 2 years ago

Seems initial support for load delay slots is now part of the latest LLVM release if I understand that commit correctly. Is this issue still relevant?

ayrtonm commented 2 years ago

It's not in upstream rust yet, but I was going to cherry-pick that commit onto the rustc branch and open a PR to add the psx target.

impiaaa commented 2 years ago

Also, filling load delay slots in code generation is only one part of full MIPS-I support. Remaining tasks include:

  1. load delay slots in .set reorder
  2. sync instruction
  3. 64-bit coprocessor transfers
  4. Trap-on-condition instructions
ayrtonm commented 2 years ago

Yeah those would be definitely be good to upstream for MIPS-I, but the psx target's still fairly usable without them. I think the only really critical issue is the sync instruction since DMA is borderline unusable without it.

ayrtonm commented 2 years ago

This target's now usable with just a target JSON on nightly, so building the rust compiler isn't necessary anymore. I updated the crate's instructions in case anyone wants to try it out.

ayrtonm commented 2 years ago

I opened a PR adding a built-in target to rustc so hopefully the target JSON will soon be unnecessary.

simias commented 2 years ago

That's amazing work!

Now I have to find a way to get my test setup working again...