riscvarchive / riscv-qemu

QEMU with RISC-V (RV64G, RV32G) Emulation Support
385 stars 154 forks source link

qemu-system-riscv32 store doesn't modify the memory during a `sw ra, 12(sp)` #137

Open vsiles opened 6 years ago

vsiles commented 6 years ago

Hi ! I'm trying to test a simple bare metal C code with qemu-system-riscv32 [1], and I think the jump instructions are off by 4 bytes. You can simply test it by running the "ROM" code of the sifive_e machine in gdb:

$ qemu-system-riscv32 -nographic -machine sifive_e -kernel foo.elf -s -S
$ # in another terminal
$ riscv64-unknown-elf-gdb
(gdb) target remote :1234
(gdb) layout asm   /* should display 0x1000 lui t0, 0x20400 | 0x1004  jr t0 */
(gdb) p /x $pc
$1 = 0x1000          /* lui t0, 0x20400 */
(gdb) stepi
(gdb) p /x $pc       /* jr t0 */
$2 = 0x1004
(gdb) p /x $t0
$3 = 0x2040000
(gdb) stepi
(gdb) p /x $pc
$4 = 0x20400004 /* should be 0x20400000 */

This as a nasty side effect with call: when we get back from a function call, the instruction restoring ra is skipped, so if you have several chained function calls, the first ret will return and the second ret will trigger an illegal access to address 0 (most of the time).

For the record, my actual test code is a bare metal application printing "hello world" on the uart:

[1] actual revision:

$ git log -1
commit 4d37030aa03e07ae927174c8fce0227648552e51
Author: Michael Clark <mjc@sifive.com>
Date:   Tue Apr 24 10:12:27 2018 +1200
vsiles commented 6 years ago

It seems there is two separate issues in fact:

You'll fine attached my current project (set to compile the sifive_e version) core.tar.gz

vsiles commented 6 years ago

The issue seems related to sw on the risc32 machines: on function entry, ra is stored on the stack using sw ra, 12(sp) but the memory doesn't seem correctly modified, so the future lw will only recover a 0.

I only had the issue on 32 bit RISCV.

michaeljclark commented 6 years ago

Thanks for the report, test case and GDB testing.

There are different code paths in target/riscv/translate.c for when a debugger is attached. I wonder if the issue is only present when GDB is attached.

vsiles commented 6 years ago

Now that I understand a bit better riscv, I'll try to reproduce the issue without gdb to confirm your idea. I'll let you know

vsiles commented 6 years ago

I wrote a minimal code that basically just dump info from Machine CSR registers. I works in 32 bit mode (sifive_u) but not in 32 bit mode (sifive_e) with the following compilation / qemu options:

CFLAGS64        += -march=rv64imafdc -mabi=lp64  -mcmodel=medany
CFLAGS32        += -march=rv32imafdc -mabi=ilp32  -mcmodel=medany

qemu-system-riscv64 -machine sifive_u -nographic -kernel build/core.elf
qemu-system-riscv32 -machine sifive_e -kernel build/core.elf -nographic

Here are the two main files:

# start.S
.global _start
_start:
    /* global pointer stuff */
.option push
.option norelax
    la      gp, __global_pointer$
.option pop

    /* Clear bss section */
    la      a0, __bss_start
    la      a1, _end
    bgeu    a0, a1, 2f
1:
    sw      zero, 0(a0)
    addi    a0, a0, 4
    bltu    a0, a1, 1b
2:

    la      sp, machine_stack_start     /* also correctly initialize sp for M mode */

    /* Save M stack current address */
    mv      s0, sp
    /*
     * Allocate room on the kernel stack for the
     * supervisor regs
     */
    addi    sp, sp, -(REGS_SIZE)

    /* M main function */
    mv      a0, sp
    call    machine_main
/* main.c */
#define UART_BASE_PA    UINT32_C(0x10013000)

static void uart_init(void)
{
    /* Monitor Mode */
    console_init(UART_BASE_PA);
}

static void entry(uint64_t mode);

void machine_main(regs *supervisor_regs)
{
    uart_init(); /* It usually never exits this call because of the "ra == 0" issue */
    entry(0);

    while (1) ;
}
michaeljclark commented 6 years ago

BTW this might help:

riscv-probe works on riscv32 and riscv64 in the spike_v1.9.1 and spike_v1.10 machines. e.g.

$ qemu-system-riscv32 -nographic -machine spike_v1.10 -kernel bin/riscv32/probe-htif
$ qemu-system-riscv64 -nographic -machine spike_v1.10 -kernel bin/riscv64/probe-htif

It would be possible to add the sifive uart to riscv-probe by updating the Makefile and adding mbi_uart.c and a new compile targets and linker scripts:

Note sifive_e and sifive_u can be run in either riscv32 or riscv64 modes:

Something like this should work for the SiFive UART (borrowed from bbl):

#include <stdint.h>

enum {
    UART_REG_TXFIFO = 0,
    UART_REG_RXFIFO = 1,
    UART_REG_TXCTRL = 2,
    UART_REG_RXCTRL = 3,
    UART_REG_DIV = 4
};

uint32_t *uart = (uint32_t *)0x10013000;

int mbi_console_getchar()
{
  int32_t ch = uart[UART_REG_RXFIFO];
  if (ch < 0) return -1;
  return ch;
}

void mbi_console_putchar(uint8_t ch)
{
    volatile uint32_t *tx = &uart[UART_REG_TXFIFO];
    while ((int32_t)(*tx) < 0);
    *tx = ch;
}

I might abstract riscv-probe and build binaries for sifive_e and sifive_u as well as spike by implementing mbi_uart.c and adding linker scripts...

vsiles commented 6 years ago

I'll have a look thanks ! It looks like a lot like the small code I wrote to understand all the machine registers :)

michaeljclark commented 6 years ago

Yes. I used it for basic compatibility testing for reading and comparing machine CSRs between spike and QEMU in 32-bit mode and 64-bit mode. You'll get different results if you run it on spike_v1.9.1 compared to spike_v1.10. It's possible to add more CSRs. It uses the CSR numbers in hex rather than the literal strings so that it can read CSRs that are not implemented in binutils.

michaeljclark commented 6 years ago

There is also https://github.com/riscv/riscv-tests which can be run in QEMU. The 'v' tests run in the spike_v1.10 machine.

michaeljclark commented 6 years ago

Okay, I updated riscv-probe (https://github.com/michaeljclark/riscv-probe) with support for both HTIF and UART, added invocation examples to the README and moved the linker script into a conf directory (defaults to 0x80000000 for spike and sifive_u).

diodesign commented 5 years ago

I wrote a minimal code that basically just dump info from Machine CSR registers. I works in 32 bit mode (sifive_u) but not in 32 bit mode (sifive_e)

Were you able to resolve this @vsiles? Was it your code at fault, or is there an issue with sw ra? I'm hitting the same problem with rv32 code built from Rust in qemu-system-riscv32 when using sifive_e. The following code, at the end of a Rust function, loads 0 into ra and then jumps to it, crashing the emulator:

0x204006a8:  lw              ra,124(sp)
0x204006ac:  addi            sp,sp,128
0x204006b0:  ret

Which shouldn't be happening. I'd like to know if this is something I need to fix my end, or in qemu. Thanks!

vsiles commented 5 years ago

I didn't try to resolve it, since we were only working with 64 bit archs. In such case, gdb display is funky (it might skip steps) but the behavior was correct. I did not look into the 32 bit case more in depth, sorry.

diodesign commented 5 years ago

Thanks for the reply. Something funky is going on. The code enters this function:

IN: _ZN4core3fmt5write17h728046e032081804E
0x20400378:  addi            sp,sp,-128
0x2040037c:  sw              ra,124(sp)

At 0x20400378, ra = 0x20400080 and sp = 0x8001ffd0, according to Qemu's monitor, and both valid. sp is not touched again until we reach...

0x204006a8:  lw              ra,124(sp)
0x204006ac:  addi            sp,sp,128
0x204006b0:  ret             

So ra should be restored correctly to 0x20400080 by the lw. Instead, it's set to 0, causing a crash. This is using QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3) running:

qemu-system-riscv32 -machine sifive_e -kernel target/riscv32imac-unknown-none-elf/release/kernel -nographic -d in_asm,cpu

I'll build from the latest source, but I can't see anything that's changed that would fix this. Is this something I could help with or patch if there's a problem with rv32 sw ra @michaeljclark please?

Edit: It's still overwriting ra with 0 using the latest riscv-all branch (git log -1 --> commit 6e532472173162d479cdfd5f74d2db9f327967ca)

michaeljclark commented 5 years ago

I am wondering if it might be decompression related thus in target/riscv/translate.c. I couldn't see from your objdump output whether you had RVC or not.

BTW The disassembler is distinct from the decoder in target/riscv/translate.c so if you run with -d in_asm,op,op_ind,op_opt,out_asm then you might see if something is mis-translated.

I have not encountered any issues like this on riscv32 yet so I'm curious what is going on... I'm surprised that riscv32 linux boots oif we can't sw ra

I can try your reproducer. Do you have the linker script?

vsiles commented 5 years ago

Sorry, I no longer have access to these files (former job)... But I didn't have anything clever in it anyway

jim-wilson commented 5 years ago

Perhaps there is a buffer overflow that is clobbering values on the stack. Since the problem only happens for 32-bit code, maybe the buffer overflow doesn't happen for 64-bit code. This could be a compiler or library bug where the size of something is computed wrong when you have a 32-bit target.

diodesign commented 5 years ago

OK. It's my fault. I could have sworn there was 128KB of DRAM in the E300 series. It's 16KB. That'll be why ra wasn't written to memory: sp was set too high in memory, in non-existent DRAM. I'm so sorry. I suspect @vsiles was also hitting this same issue.

(PS: thanks for the port, it's appreciated!)

vsiles commented 5 years ago

Might have been that indeed !