ucb-bar / chipyard

An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
https://chipyard.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
1.57k stars 619 forks source link

Verilator simulation (RocketConfig default example) crashes on MacOS #1886

Open Frank-Zeyda opened 3 months ago

Frank-Zeyda commented 3 months ago

Background Work

Chipyard Version and Hash

Main/HEAD Commit: 3a6677bc3012f47e94b8914ebab6e789c9e949e3

OS Setup

MacBook Air running Sonoma 14.5 (Apple M2).

Output of clang --version is:

Apple clang version 15.0.0 (clang-1500.3.9.4) Target: arm64-apple-darwin23.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

(I am using clang for compilation, not any version of gcc.)

All software is up-to-date as of 27 May 2024. The RISC-V toolchain and tools have been installed from Homebrew:

==> Formulae
riscv-software-src/riscv/riscv-gnu-toolchain ✔
riscv-software-src/riscv/riscv-isa-sim ✔
riscv-software-src/riscv/riscv-openocd ✔
riscv-software-src/riscv/riscv-pk ✔
riscv-software-src/riscv/riscv-tools ✔

I am using Verilator 5.024 2024-04-05 rev UNKNOWN.REV albeit with a slight hack to get around Issue 5031 - yes, it is currently troublesome to get verilator working under the latest Sonoma and Clang updates on MacOS 😒.

Other Setup

Step 0: Install RISC-V tools via Homebrew (see OS Setup section). Also, install gnu-sed, jq and help2man. Possible other packages that escape me now ...

Step 1: Setting up environment variables. I am using my own handwritten script here due to installing RISC-V tools and toolchain from Homebrew and using a custom-compiled verilator.

# Avoids issues due to different options for sed on MacOS.
# Requires gnu-sed to be installed via homebrew.
export PATH="/opt/homebrew/opt/gnu-sed/libexec/gnubin:$PATH"

# Use RISC-V toolchain installed via homebrew.
# Requires riscv-gnu-toolchain to be installed via homebrew.
export RISCV="/opt/homebrew/opt/riscv-gnu-toolchain/bin"

# Sets additional include paths
export CPPFLAGS="-I/opt/homebrew/opt/riscv-isa-sim/include"

# Need to link verilator simulation executable.
export LDFLAGS="-L/opt/homebrew/opt/riscv-isa-sim/lib"

Step 2: Patch common-sim-flags.mk by removing -std=c++17 from SIM_CXXFLAGS. Long story short, we need to compile verilator code with -std=c++20 in order to circumvent Issue 5031 . My patched verilator does that, but this is for a different issue.

Step 3: Enter sims/verilator and run make.

cd sims/verilators
make

Step 4: Run the verilator simulation executable. I.e.,

./simulator-chipyard.harness-RocketConfig rv64ui-p-simple`

assuming that rv64ui-p-simple (from riscv-tests is located in the current directory.

Current Behavior

On my system, this results in an immediate Segmentation Fault.

./simulator-chipyard.harness-RocketConfig $RISCV_TESTS/rv64ui-p-simple
[UART] UART0 is here (stdin/stdout).
zsh: segmentation fault  ./simulator-chipyard.harness-RocketConfig $RISCV_TESTS/rv64ui-p-simple

Attached is a relevant extract from the MacOS Diagnostic report.

image

Expected Behavior

Perform simulation and do not crash.

Other Information

I have already debugged and somewhat fixed the issue. The culprit, as can be seen from the stack trace, is

generators/src/main/resources/testchipip/csrc/SimTSI.cc

The code:

  *in_valid = tsi->in_valid();
  *in_bits = tsi->in_bits();
  *out_ready = tsi->out_ready();

(see https://github.com/ucb-bar/testchipip/blob/104df6a81fd989cd4cad69b699894664fcf93c05/src/main/resources/testchipip/csrc/SimTSI.cc#L59-L61) is unsafe. tsi->in_bits(); implicitly calls the std::queue::front() method/function whose behavior is undefined if the queue is empty. Hence, replace the above, for instance, by

  *in_valid = tsi->in_valid();
  *in_bits = *in_valid ? tsi->in_bits() : 0;
  *out_ready = tsi->out_ready();

to avoid the crash. (The above code is not thread-safe though, perhaps it does not need to be ...)

Note: I have not tried if this is an issue under Ubuntu/Linux. My suspicion is that there, the program would just continue with some arbitrary value for *in_bits. Semantically, anything can happen once we enter undefined behavior, so the above fix is advised either way!

PS: I apologize for just realizing that this bug report may have been better submitted in the testchipip repository.

jerryz123 commented 3 months ago

Thanks for debugging this! I think your diagnosis is correct. Can you open a PR with the fix in testchipip?

Frank-Zeyda commented 3 months ago

Thanks Jerry for taking care of this.

There are potentially two ways to fix the issue:

  *in_valid = tsi->in_valid();
  *in_bits = *in_valid ? tsi->in_bits() : 0;
  *out_ready = tsi->out_ready();

(as noted above) or else

  *in_valid = tsi->in_valid();
  if (*in_valid) {
    *in_bits = tsi->in_bits();
  }
  *out_ready = tsi->out_ready();

A key assumption for both fixes is that the underlying queue in tsi does not change while the three statements execute. Otherwise, we may end up with a race condition and need to wrap the code into a mutex.

Which of the two fixes is more appropriate depends on the assumed specification of the int tsi_tick(...) function in SimTSI.cc. I'd gladly leave those details to you, and thanks once again or the quick reply! 👍

jerryz123 commented 3 months ago

I previously assumed that all the tick functions would be called from the same thread, which is true for all simulation environments EXCEPT verilator with multithreading, as #1885 is uncovering.

However, I believe inheritors of htif_t (including tsi_t) are generally not expected to be thread-safe, so the first fix should be ok.

a0u commented 1 month ago

I have not tried if this is an issue under Ubuntu/Linux. My suspicion is that there, the program would just continue with some arbitrary value for *in_bits.

The same issue triggers a failed assertion on Gentoo Linux (glibc 2.40, verilator 5.024).

/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/bits/stl_deque.h:1446: std::deque<_Tp, _Alloc>::reference std::deque<_Tp, _Alloc>::front() [with _Tp = unsigned int; _Alloc = std::allocator<unsigned int>; reference = unsigned int&]: Assertion '!this->empty()' failed.

Thread 1 "simulator-chipy" received signal SIGABRT, Aborted.
0x00007ffff70bd8cc in ?? () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff70bd8cc in ?? () from /lib64/libc.so.6
#1  0x00007ffff70685a6 in raise () from /lib64/libc.so.6
#2  0x00007ffff70508fa in abort () from /lib64/libc.so.6
#3  0x00007ffff72dac1f in std::__glibcxx_assert_fail(char const*, int, char const*, char const*) () from /usr/lib/gcc/x86_64-pc-linux-gnu/14/libstdc++.so.6
#4  0x00005555556229c5 in tsi_tick ()
#5  0x0000555555b13629 in VTestDriver___024unit____Vdpiimwrap_tsi_tick_TOP____024unit(unsigned int, unsigned char, unsigned char&, unsigned int, unsigned char&, unsigned char, unsigned int&, unsigned int&) ()
#6  0x0000555555a1ff72 in VTestDriver___024root___nba_sequent__TOP__4680(VTestDriver___024root*) ()
#7  0x0000555555672985 in VTestDriver___024root___eval_nba__11(VTestDriver___024root*) ()
#8  0x000055555565f6b1 in VTestDriver___024root___eval_nba(VTestDriver___024root*) ()
#9  0x0000555555672f12 in VTestDriver___024root___eval(VTestDriver___024root*) ()
#10 0x000055555565f50a in VTestDriver::eval_step() ()
#11 0x000055555561fcb8 in main ()