Open Frank-Zeyda opened 3 months ago
Thanks for debugging this! I think your diagnosis is correct. Can you open a PR with the fix in testchipip?
Thanks Jerry for taking care of this.
There are potentially two ways to fix the issue:
*in_valid = tsi->in_valid();
*in_bits = *in_valid ? tsi->in_bits() : 0;
*out_ready = tsi->out_ready();
(as noted above) or else
*in_valid = tsi->in_valid();
if (*in_valid) {
*in_bits = tsi->in_bits();
}
*out_ready = tsi->out_ready();
A key assumption for both fixes is that the underlying queue in tsi
does not change while the three statements execute. Otherwise, we may end up with a race condition and need to wrap the code into a mutex.
Which of the two fixes is more appropriate depends on the assumed specification of the int tsi_tick(...)
function in SimTSI.cc
. I'd gladly leave those details to you, and thanks once again or the quick reply! 👍
I previously assumed that all the tick
functions would be called from the same thread, which is true for all simulation environments EXCEPT verilator with multithreading, as #1885 is uncovering.
However, I believe inheritors of htif_t
(including tsi_t
) are generally not expected to be thread-safe, so the first fix should be ok.
I have not tried if this is an issue under Ubuntu/Linux. My suspicion is that there, the program would just continue with some arbitrary value for
*in_bits
.
The same issue triggers a failed assertion on Gentoo Linux (glibc 2.40, verilator 5.024).
/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/bits/stl_deque.h:1446: std::deque<_Tp, _Alloc>::reference std::deque<_Tp, _Alloc>::front() [with _Tp = unsigned int; _Alloc = std::allocator<unsigned int>; reference = unsigned int&]: Assertion '!this->empty()' failed.
Thread 1 "simulator-chipy" received signal SIGABRT, Aborted.
0x00007ffff70bd8cc in ?? () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff70bd8cc in ?? () from /lib64/libc.so.6
#1 0x00007ffff70685a6 in raise () from /lib64/libc.so.6
#2 0x00007ffff70508fa in abort () from /lib64/libc.so.6
#3 0x00007ffff72dac1f in std::__glibcxx_assert_fail(char const*, int, char const*, char const*) () from /usr/lib/gcc/x86_64-pc-linux-gnu/14/libstdc++.so.6
#4 0x00005555556229c5 in tsi_tick ()
#5 0x0000555555b13629 in VTestDriver___024unit____Vdpiimwrap_tsi_tick_TOP____024unit(unsigned int, unsigned char, unsigned char&, unsigned int, unsigned char&, unsigned char, unsigned int&, unsigned int&) ()
#6 0x0000555555a1ff72 in VTestDriver___024root___nba_sequent__TOP__4680(VTestDriver___024root*) ()
#7 0x0000555555672985 in VTestDriver___024root___eval_nba__11(VTestDriver___024root*) ()
#8 0x000055555565f6b1 in VTestDriver___024root___eval_nba(VTestDriver___024root*) ()
#9 0x0000555555672f12 in VTestDriver___024root___eval(VTestDriver___024root*) ()
#10 0x000055555565f50a in VTestDriver::eval_step() ()
#11 0x000055555561fcb8 in main ()
Background Work
Chipyard Version and Hash
Main/HEAD Commit: 3a6677bc3012f47e94b8914ebab6e789c9e949e3
OS Setup
MacBook Air running Sonoma 14.5 (Apple M2).
Output of
clang --version
is:Apple clang version 15.0.0 (clang-1500.3.9.4) Target: arm64-apple-darwin23.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
(I am using clang for compilation, not any version of gcc.)
All software is up-to-date as of 27 May 2024. The RISC-V toolchain and tools have been installed from Homebrew:
I am using
Verilator 5.024 2024-04-05 rev UNKNOWN.REV
albeit with a slight hack to get around Issue 5031 - yes, it is currently troublesome to get verilator working under the latest Sonoma and Clang updates on MacOS 😒.Other Setup
Step 0: Install RISC-V tools via Homebrew (see OS Setup section). Also, install gnu-sed, jq and help2man. Possible other packages that escape me now ...
Step 1: Setting up environment variables. I am using my own handwritten script here due to installing RISC-V tools and toolchain from Homebrew and using a custom-compiled verilator.
Step 2: Patch
common-sim-flags.mk
by removing-std=c++17
fromSIM_CXXFLAGS
. Long story short, we need to compile verilator code with-std=c++20
in order to circumvent Issue 5031 . My patched verilator does that, but this is for a different issue.Step 3: Enter
sims/verilator
and runmake
.Step 4: Run the verilator simulation executable. I.e.,
assuming that rv64ui-p-simple (from riscv-tests is located in the current directory.
Current Behavior
On my system, this results in an immediate Segmentation Fault.
Attached is a relevant extract from the MacOS Diagnostic report.
Expected Behavior
Perform simulation and do not crash.
Other Information
I have already debugged and somewhat fixed the issue. The culprit, as can be seen from the stack trace, is
The code:
(see https://github.com/ucb-bar/testchipip/blob/104df6a81fd989cd4cad69b699894664fcf93c05/src/main/resources/testchipip/csrc/SimTSI.cc#L59-L61) is unsafe.
tsi->in_bits();
implicitly calls thestd::queue::front()
method/function whose behavior is undefined if the queue is empty. Hence, replace the above, for instance, byto avoid the crash. (The above code is not thread-safe though, perhaps it does not need to be ...)
Note: I have not tried if this is an issue under Ubuntu/Linux. My suspicion is that there, the program would just continue with some arbitrary value for
*in_bits
. Semantically, anything can happen once we enter undefined behavior, so the above fix is advised either way!PS: I apologize for just realizing that this bug report may have been better submitted in the testchipip repository.