rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

risc-v support #3796

Open lu-zero opened 3 weeks ago

lu-zero commented 3 weeks ago

Since there isn't an issue open for this already, would it be possible to list the requirements for getting rr working on RISC-V?

rocallahan commented 3 weeks ago

The most important requirement is that the CPU needs a hardware performance counter that counts the number of userspace retired instructions, branches, or conditional branches with perfect accuracy (any one of those counter types will do, in that order of preference). I know these counters exist on many RISC-V implementations, but I don't know if they're accurate enough. For example, on other CPUs it is common for these counters to be spuriously incremented (e.g. sometimes an interrupt will increment the count of retired branches, or sometimes a speculatively executed instruction that does not retire is mistakenly counted as retired). This would need to be carefully tested, and any bugs found might need to be fixed in the CPU implementation.

Another requirement is the ability to interrupt the CPU when a hardware performance counter reaches a certain value and have the interrupt be delivered within a bounded number of instructions after the counter reached the value. This bound can be large though (e.g. 5000 instructions is OK).

Naturally we'll need these to be hooked up to the Linux perf-event API. See https://github.com/rr-debugger/rr/blob/master/src/counters-test/README.md for some code that can be adapted to test these things.
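
To make the perf-event requirement concrete, here is a minimal sketch of how a userspace-only counter with a periodic overflow notification can be requested through `perf_event_open`. The helper names `make_counter_attr` and `open_counter` are hypothetical, and the generic `PERF_COUNT_HW_INSTRUCTIONS` event is an assumption for illustration; a real port would use whichever event the RISC-V PMU turns out to count accurately.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a counter of retired instructions that
 * ignores kernel and hypervisor execution and overflows every `period`
 * events. The counter starts disabled so the caller decides when to arm it. */
struct perf_event_attr make_counter_attr(uint64_t period) {
  struct perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* assumption: generic event */
  attr.sample_period = period;              /* overflow after this many events */
  attr.exclude_kernel = 1;                  /* count userspace only */
  attr.exclude_hv = 1;
  attr.disabled = 1;
  return attr;
}

/* Hypothetical helper: glibc provides no perf_event_open() wrapper, so the
 * raw syscall is used. Returns a file descriptor, or -1 with errno set. */
int open_counter(struct perf_event_attr *attr, pid_t pid) {
  return (int)syscall(SYS_perf_event_open, attr, pid, -1 /* any cpu */,
                      -1 /* no group */, 0UL /* flags */);
}
```

A test harness would open the counter for a tracee with `open_counter(&attr, pid)`, arrange for overflow to be delivered as a signal, and then check both that the final count is identical across repeated runs and how many instructions past `period` the notification actually arrives. That overshoot is the bound discussed above.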

Another requirement is that we need to be able to trap on reads from the clock cycle counter, and we'll need a Linux kernel API for this, like PR_SET_TSC on x86.
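
For reference, this is roughly what the existing x86 interface looks like from userspace; a RISC-V equivalent would presumably need a similar per-thread switch. The function names `trap_on_tsc_reads` and `get_tsc_mode` are hypothetical wrappers; `PR_SET_TSC`, `PR_GET_TSC`, `PR_TSC_ENABLE` and `PR_TSC_SIGSEGV` are the real x86 `prctl` constants.

```c
#include <sys/prctl.h>

/* Hypothetical wrapper: ask the kernel to deliver SIGSEGV whenever the
 * calling thread executes a timestamp-counter read (x86 only).
 * Returns 0 on success, or -1 with errno set (e.g. EINVAL elsewhere). */
int trap_on_tsc_reads(void) {
  return prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0);
}

/* Hypothetical wrapper: report the current mode, PR_TSC_ENABLE or
 * PR_TSC_SIGSEGV, or -1 if the kernel does not support the query. */
int get_tsc_mode(void) {
  int mode = 0;
  if (prctl(PR_GET_TSC, &mode, 0, 0, 0) != 0) {
    return -1;
  }
  return mode;
}
```

Once trapping is enabled, a supervisor can catch the resulting signal, supply a recorded value, and step past the instruction, which is the kind of control that would also be needed over the RISC-V cycle counter.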

rocallahan commented 3 weeks ago

If these basics are present then it can probably be made to work, but the implementation will still be quite a lot of difficult work.

lu-zero commented 3 weeks ago

Thank you!

rocallahan commented 3 weeks ago

One other thing that @khuey reminded me of: RISC-V depends on LL/SC instructions for atomic compare-and-swap. That's a problem for rr. The ideal solution here would be for the CPU to be configurable to trap on a failed SC, with a Linux kernel API for this.

rocallahan commented 3 weeks ago

This same issue stopped rr from being ported to 32-bit ARM. On AArch64, LL/SC is deprecated in favour of LSE atomics, so userspace compiled to use only LSE works fine with rr. It appears RISC-V has most atomic ops, but CAS still requires an LL/SC pair.
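
As an illustration of where LL/SC shows up in ordinary code, here is a minimal C11 sketch of a compare-and-swap retry loop. The function name `cas_add` is hypothetical. On a RISC-V target with only the base atomic extension, compilers typically lower the compare-exchange to an `lr`/`sc` sequence, and an `sc` can fail for reasons the program cannot observe, so the number of retries, and with it the retired-instruction count, can differ from one run to the next.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical example: atomically add `delta` to *counter with a
 * compare-and-swap retry loop. Returns the value seen before the update. */
uint32_t cas_add(_Atomic uint32_t *counter, uint32_t delta) {
  uint32_t old = atomic_load(counter);
  /* The weak form may fail spuriously; on failure `old` is refreshed with
   * the current value and the loop retries with a recomputed target. */
  while (!atomic_compare_exchange_weak(counter, &old, old + delta)) {
  }
  return old;
}
```

When the target has a single-instruction CAS available, the same source can compile without any LL/SC pair, which is the situation described for LSE on AArch64.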

lu-zero commented 3 weeks ago

That means RVA23U64+Zacas would be needed, I guess.

rocallahan commented 3 weeks ago

Yes, if all userspace is compiled to use only the atomic instructions and no LL/SC.

Keno commented 3 weeks ago

If you have more control over the CPU architecture, there are other options. For example, when looking at ppc64, we considered the possibility of taking a synchronous trap on SC abort, or rewriting the SC instruction to a regular store in microcode.