Using spike in a lock-step hardware simulation

Dolu1990 commented 2 years ago

Hi,

I recently developped a RISC-V OoO core and used Spike as a reference model in the verilator simulation of the hardware core. Got the simulation to keep spike and the core in exact sync while running linux/buildroot.

When i mean using spike in a lock-step manner to check the hardware core, i mean :

Step spike by one instruction each time the hardware core commit one instruction
For interrupts, holding spike from taking any until the hardware core take one (to keep them in sync)
Syncronising a few specific CSR values from the hardware core to spike ones, as some have implementation specific behaviour (ex : mcycle)
Comparing the register file writes between the two
Comparing the PC between the two

So, most of things required to setup such lock-step are already supported in Spike excepted a few where i had to patch things as bit :

interrupt holding
step spike by one commit (the actual step function may not necessarly commit anything if there is a trap)
knowing the last commited pc
Optionaly implement WFI as a nop
csr syncronisation (not a big deal)

So my questions are :

Is it wishable in the future to support such lock-step usage in spike's upstream ?
Did i miss some feature in spike already covering the above list ?

For reference, here are my dirty changes in spike : https://github.com/SpinalHDL/riscv-isa-sim/commits/master

Here is the verilator testbench's functions related to the spike integration : https://github.com/SpinalHDL/NaxRiscv/blob/703bb86106d0db87b9844d28eded3b3608e59a58/src/test/cpp/naxriscv/src/main.cpp#L1399 https://github.com/SpinalHDL/NaxRiscv/blob/703bb86106d0db87b9844d28eded3b3608e59a58/src/test/cpp/naxriscv/src/main.cpp#L1459

Thanks ^^ Charles

abhinay-kayastha commented 2 years ago

I think like for a basic core designs, the requirements listed may be sufficient. However, the ROB implementation that is done will work only for OoO core, not the in-order cores which will have delayed writes. Also another big thing missing is vector support which has more that one corner cases to handle.

To make a fully supported lock-step usage, few other things I think we have to handle is standard way to provide different address space configurations, isa-aarch especially for multicore sim, updating some of the API to handle different PA width, CSR implementation including the custom one, ability to handle multiple allowed implementation based on the RISC-V spec and in-turn ability to get and overwrite some of the CSR registers in valid and legal scenarios, ability to support custom interrupt controllers and custom instructions. These features to enable lock-step mechanism to its full potential is still a viable option to add in spike's upstream in future.

The scoreboard on the other hand is too difficult to standardize, so in my opinion we should not even attempt in pushing that to spike upstream. The scoreboard logic is closely tied to monitor log of the core designed and I don't think its easy to make it standard.

Dolu1990 commented 2 years ago

Hi @abhinay-kayastha ^^

the ROB implementation that is done will work only for OoO core

Right, this should be keept to the testbench side of things, not included in spike in any way

not the in-order cores which will have delayed writes

Do you mean register file write or memory store ? In my case, i didn't tried to track memory store, as the cache is write back, it would be very tricky i guess.

multicore sim

That would require quite some more complicated testbench setup. In my case, i keept things simple by giving both spike and the rtl their own memory, both initialized the same way.

The scoreboard on the other hand is too difficult to standardize, so in my opinion we should not even attempt in pushing that to spike upstream. The scoreboard logic is closely tied to monitor log of the core designed and I don't think its easy to make it standard.

I agree, i mostly see that issue to discut about adding more probing / whitebox control over the spike execution. Not trying to inject testbench related stuff in spike it self.

I just ran a few more linux sim, and and just hit a tricky case XD :

In the sim, Linux has a page fault, triggering a "map_kernel_range_noflush", and later that address is accessed again (still without any sfence.vma).

Issue is that spike will see the TLB changes, while the hardware will not (as no sfence.vma happend and it cached the old translation), making the two diverging.

scottj97 commented 2 years ago

I use Spike in the same way in my project. Your commits seem simple enough but before we could accept them upstream I have a few questions/issues.

I don't understand the WFI change. Spike already implements WFI as a NOP. The wfi() macro that you removed has to do with switching between multiple harts, IIUC. I don't understand how that would be an issue for your use case.
In my use case I consider a trap to be a step. When I step Spike, if the next instruction takes a trap, I don't want my next comparison point to be after the first instruction of the trap handler. I want to compare all the register updates triggered by the trap (mcause, etc), before any handler instructions are committed. Is your use case different?
I've implemented the interrupt holding by removing the PLIC model from Spike and feeding the externally-driven bits of mip from the RTL to Spike. Your use case seems to be different -- do you still use Spike's PLIC model? Perhaps you get better verification because Spike will only take an interrupt if the Spike PLIC agrees with the hardware that one is due?

Dolu1990 commented 2 years ago

Hi @scottj97

but before we could accept them upstream

Don't upstream them ^^ they where made in some hacky way to get things working, would need a cleaner branch before any sort of merge

In my use case I consider a trap to be a step .. Is your use case different?

Yes right, but i think your aproache is better. In my case, i'm it is only comparring things on commits, threating traps like a eye blink ^^ I will likely update that in the testbench.

I've implemented the interrupt holding by removing the PLIC model from Spike and feeding the externally-driven bits of mip from the RTL to Spike

So, in my case, it interface with the proc_t class. Doesn't use anything else (so no plic model). The testbench handle all peripherals accesses from the CPU in: https://github.com/SpinalHDL/NaxRiscv/blob/2832adfc9c92c631cb87f86bf47f2fc8f30cfd35/src/test/cpp/naxriscv/src/main.cpp#L239 https://github.com/SpinalHDL/NaxRiscv/blob/2832adfc9c92c631cb87f86bf47f2fc8f30cfd35/src/test/cpp/naxriscv/src/main.cpp#L268

Then, when cpu RTL make a interrupt trap, that same interrupt is then "proposed" to spike via mip overriding : https://github.com/SpinalHDL/NaxRiscv/blob/2832adfc9c92c631cb87f86bf47f2fc8f30cfd35/src/test/cpp/naxriscv/src/main.cpp#L1624

Spike will only take an interrupt if the Spike PLIC agrees with the hardware that one is due

It is right that the method i implemented will not catch all bugs and could be better. The worry i had implementing something which would give spike more "autority" on when a interrupt is pending is that the RTL may see a given interrupt as "pending", while the interrupt source just turned it off on the peripheral side (hardware having some pipelining on the interrupt paths). It is very corner cases, but didn't wanted to take risks related to such cases.

Dolu1990 commented 2 years ago

I don't understand the WFI change.Spike already implements WFI as a NOP.

I'm checking now to be sure about what is happening

neelgala commented 2 years ago

I too have recently been using spike as a reference model in a lock-step Cocotb based verification environment for our cores. I have managed to wrap spike so that it can be used in python as an object. The setup is available here those interested. I hope to open source our lock-step verification setup soon in the coming month as well.

The changes we do to spike source are in the spike.patch file, but its mostly to get spike built correctly as a shared object file.

I haven't explored the whole interrupt part yet, but I would agree with scott there. Spike's PLIC should probably be disabled and synchronizing with RTL's mip bits is probably an easier and better approach.

.

Dolu1990 commented 2 years ago

@scottj97

I don't understand the WFI change.Spike already implements WFI as a NOP.

You are right. So issue i had came from the fact that the feature i added in spike to capture the last commited pc was not working with WFI, because WFI does a C++ throw in spike, which was skiping my "capture the last commited pc" spike code.

Anyway, reworking the testbench following your comment allowed me to not having to use that feature anymore (capture the last commited pc). And instead of having to make spike commit one instruction, now the testbench just use native proc.step(1) without workarounds.

I reverted from my spike fork the related changes.

Thanks ^^

@neelgala

The changes we do to spike ... its mostly to get spike built correctly as a shared object file.

That would be great to have something like it as a upstream option ^^

I too have recently been using spike as a reference model in a lock-step

I was two step from implementing my own RISC-V model, as i was realy not sure it would be possible to reuse spike for that usage XD

Maybe adding in the spike readme that it can be used for such usage would help people not "reinventing the wheel". Also, having spike as a reference model / lock-step can realy save the day. Debuging a linux crash, 400'000'000 cycles down the road in a wave, without lock-step, is one step toward insanity ^^

scottj97 commented 2 years ago

@Dolu1990 for links to code you should use permalinks, instead of links to main branch, since that will change over time and the links will become obsolete.

Note that after #922 you will want to change this and this.

Dolu1990 commented 2 years ago

@scottj97

for links to code you should use permalinks,

Right, i'm often forgeting it ^^ (updated)

Note that after https://github.com/riscv-software-src/riscv-isa-sim/pull/922 you will want to change this and this.

Thanks :D

GregAC commented 2 years ago

Those interested in this topic may wish to look at what we've done on Ibex (https://github.com/lowRISC/ibex). We run spike in lockstep with our CPU simulation (with support for Verilator, VCS and Xcelium) as our primary checking method. We check memory accesses along with register writes (easy in our case as it's an in-order CPU with no speculation so data side accesses must precisely match accesses made in Spike). Our approach is documented here: https://ibex-core.readthedocs.io/en/latest/03_reference/cosim.html

We have had to made a few modifications to spike itself but nothing major @rswarbrick has been working on up-streaming our changes so we eventually don't need a custom fork.

Some documentation explaining how spike can be used for lockstep simulation, once the necessary changes are in, sounds like a good idea to me.

Dolu1990 commented 2 years ago

@GregAC Thanks ^^

Dolu1990 commented 2 years ago

Found the following :

""" Rivos is releasing the source for hammer - a library that instantiates Spike and can be used to run Spike in lock-step and allows Spike state to be read and written. This is useful to run Spike in co-simulation with another model for verification.

The source is available here: https://github.com/rivosinc/hammer """

https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/HP0cj7dNXv0/m/7-f7jTAKAwAJ?utm_medium=email&utm_source=footer

Dolu1990 commented 1 year ago

Got a RISC-V trace checker in its own repo :

https://github.com/SpinalHDL/rvls
Support multicore trace checking (including some family of memory coherency)
Can check execution trace via either a human readable text file (ASCII frontend)
Can also be used via a JNI frontend (direct integration into a SpinalHDL / Chisel simlation) or in a C/C++ environnement

dpetrisko commented 10 months ago

Hi, is there any update on upstreaming the lockstep simulation? If maintainers would help define a cosim step / trap interface similar to https://github.com/chipsalliance/dromajo/blob/master/src/dromajo_cosim.cpp, I'd be happy to help push it forward. There are several implementations floating around so I'd prefer not to add yet another :)

jerryz123 commented 10 months ago

The existing APIs in libriscv should be sufficient for a cosim step interface.

This one https://github.com/ucb-bar/testchipip/blob/master/src/main/resources/testchipip/csrc/cospike_impl.cc demonstrates a dromajo-like interface.

I'm not sure what additional generic APIs could be added to simplify using spike as tandem-verification. I feel that these setups are inevitabiy somewhat bespoke.

dpetrisko commented 10 months ago

Thanks, this is what I was looking for! Will see if anything stands out as generic enough. I like the scheme here for DUT overrides, which is the least generic part...

Perhaps simply exporting a spike_cosim header and library with the minimal cosim step and override interface would help to clear up confusion

riscv-software-src / riscv-isa-sim

Using spike in a lock-step hardware simulation #911