litex-hub / linux-on-litex-vexriscv

Linux on LiteX-VexRiscv

Read Performance Counters #302

Closed amr-25 closed 1 year ago

amr-25 commented 1 year ago

I'm trying to measure clock cycles using the hardware counters, but I cannot use rdcycle or mcycle because of the SBI layer. I get sbi_trap_error with mcause=2 (illegal instruction) if I try asm volatile ("csrr %0, mcycle" : "=r" (b));. Any ideas on how to measure performance? Thank you.

Dolu1990 commented 1 year ago

Hi,

In which context are you reading those? In Linux user mode?

In user mode you can use ucycle instead of mcycle. That one should work.

Dolu1990 commented 1 year ago

Hoooo, just looking right now, ucycle is turned off to save area XD

See ucycleAccess = CsrAccess.NONE, in : https://github.com/SpinalHDL/VexRiscv/blob/24795ef09b88defe2ee1bb335e5caaf7e07e64ff/src/main/scala/vexriscv/plugin/CsrPlugin.scala#L110

Should have been at least READ_ONLY

So, to turn it on, go into the LiteX pythondata-cpu-vexriscv_smp package, go to the inner VexRiscv repo (pythondata_cpu_vexriscv_smp/verilog/ext/VexRiscv), patch it there, and delete the pregenerated netlist in pythondata_cpu_vexriscv_smp/verilog.

You will need to have SBT installed.
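
Once ucycle is readable, user-mode code should be able to read it with rdcycle. Here is a minimal sketch, assuming an RV32 target where the upper half (cycleh) is exposed too, and a toolchain that accepts the rdcycle/rdcycleh pseudo-instructions. The names read_cycles and cycles_elapsed, and the clock() fallback for non-RISC-V hosts, are illustrative assumptions and not part of VexRiscv or LiteX:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the 64-bit user-level cycle counter (hypothetical helper). */
static inline uint64_t read_cycles(void)
{
#if defined(__riscv) && (__riscv_xlen == 32)
    /* RV32: read high, low, high again and retry if the high half
     * changed, so a carry between the two reads cannot corrupt the value. */
    uint32_t hi, lo, hi2;
    do {
        __asm__ volatile ("rdcycleh %0" : "=r"(hi));
        __asm__ volatile ("rdcycle  %0" : "=r"(lo));
        __asm__ volatile ("rdcycleh %0" : "=r"(hi2));
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
#elif defined(__riscv)
    /* RV64: one read returns the full counter. */
    uint64_t v;
    __asm__ volatile ("rdcycle %0" : "=r"(v));
    return v;
#else
    /* Assumption: off-target fallback so the sketch builds on a host PC.
     * clock() counts CPU time ticks, not CPU cycles. */
    return (uint64_t)clock();
#endif
}

/* Cycles spent between two snapshots taken with read_cycles(). */
static inline uint64_t cycles_elapsed(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}
```

A benchmark would take one snapshot before and one after the measured region, and pass both to cycles_elapsed(), once on the one-core build and once on the two-core build.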

amr-25 commented 1 year ago

My objective is to measure the cycles spent on a benchmark when the system is configured with one core and with two cores.

Dolu1990 commented 1 year ago

So with the pythondata-cpu-vexriscv_smp modification I posted just above, that should be good.

But overall, if the runtime of the benchmark is long enough, I guess the overhead of using the standard C library time functions is fine.
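
The C library route could look like the sketch below. It assumes a C11 library that provides timespec_get; now_ns, time_workload_ns and example_workload are hypothetical names, and the loop only stands in for a real benchmark. Wall-clock time is used so that one-core and two-core runs can be compared directly:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds, using C11 timespec_get. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return 0; /* clock not available */
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run the workload once and return the elapsed wall-clock time. */
static uint64_t time_workload_ns(void (*workload)(void))
{
    assert(workload != NULL);
    uint64_t start = now_ns();
    workload();
    return now_ns() - start;
}

/* Hypothetical stand-in for the benchmark being measured. */
static void example_workload(void)
{
    volatile uint32_t acc = 0;
    for (uint32_t i = 0; i < 100000u; i++)
        acc += i;
}
```

Calling time_workload_ns(example_workload) returns the duration in nanoseconds. The timer call overhead only becomes negligible when the benchmark runs long enough.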

amr-25 commented 1 year ago

Alright, that modification changes the hardware, but doesn't it also require changes on the SBI side, such as regenerating opensbi.bin?

Dolu1990 commented 1 year ago

It will only change hardware. Software can stay the same.

amr-25 commented 1 year ago

I deleted the .v files inside pythondata_cpu_vexriscv_smp/verilog, but the build asks for one of them when I make the SMP. The error is: ERROR: [Common 17-69] Command failed: File '/home/user/Litex/pythondata-cpu-vexriscv-smp/pythondata_cpu_vexriscv_smp/verilog/Ram_1w_1rs_Generic.v' does not exist

Dolu1990 commented 1 year ago

Hoo my bad, do not delete the Ram*.v Verilog files, I forgot about those.

amr-25 commented 1 year ago

Thank you. It worked :) Similarly, are there other counters (instructions retired, etc.) that can be enabled to get more performance data?

Dolu1990 commented 1 year ago

Hooo nice :D

Sure, there is uinstretAccess for instructions retired, which you can enable. There is also utimeAccess, but it is quite similar to ucycle, so not really useful.
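
With both counters enabled, instructions per cycle can be derived from two snapshots. This is a minimal sketch of the arithmetic only. The perf_snapshot struct and instructions_per_cycle are hypothetical names, and the snapshot values are assumed to come from rdcycle and rdinstret reads taken before and after the measured region:

```c
#include <assert.h>
#include <stdint.h>

/* One reading of the cycle and instret counters (hypothetical type). */
struct perf_snapshot {
    uint64_t cycles;
    uint64_t instret;
};

/* Instructions retired per clock cycle between two snapshots. */
static double instructions_per_cycle(struct perf_snapshot start,
                                     struct perf_snapshot end)
{
    assert(end.cycles >= start.cycles);
    assert(end.instret >= start.instret);
    uint64_t cycles = end.cycles - start.cycles;
    uint64_t instret = end.instret - start.instret;
    /* Avoid dividing by zero when both snapshots are identical. */
    return cycles ? (double)instret / (double)cycles : 0.0;
}
```

For example, 100 instructions retired over 200 cycles gives an IPC of 0.5.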

amr-25 commented 1 year ago

I found this project https://github.com/firesim/firesim/blob/main/sim/firesim-lib/src/main/scala/bridges/TracerVBridge.scala which lets you profile performance by tracing. Are there plans to add something similar for evaluating performance beyond the three counters (rdcycle, time and instret)? (There is something similar in the riscy cores: https://github.com/hchsiao/riscv/blob/master/riscv_tracer.sv) It would be nice to have it in VexRiscv as well. Thank you.

Dolu1990 commented 1 year ago

Hi,

So tracing things in simulation, or tracing things on real hardware? Do you want to trace the flow of instructions and events in the pipeline?

(currently there is no plan)

amr-25 commented 1 year ago

Yes, tracing in cycle-accurate simulation or on hardware. I guess Verilator can do the tracing in simulation? Not sure. My idea is something like a plugin that is triggered by an instruction (similar to rdcycle, for instance startrec) to start recording the retiring instructions, and that stops recording on another instruction.

Dolu1990 commented 1 year ago

"something like a plugin which is triggered by an instruction (similar to rdcycle, for instance, startrec), to start recording the instructions retiring and stop recording through another instruction."

Ahh, this sounds like the RISC-V privileged performance counters. That could do it (not implemented yet).

amr-25 commented 1 year ago

So, is creating such a plugin feasible by modifying the custom CSR plugin? If yes, I can look into it. I'm not sure whether it can be realised with Verilator itself as well.

Dolu1990 commented 1 year ago

@amr-25 Yes, https://github.com/SpinalHDL/VexRiscv/blob/051d140c33ce1480e10bdf76668fceae8ff59bef/src/main/scala/vexriscv/demo/CustomCsrDemoPlugin.scala#L11 is not very far from it ^^

The RISC-V feature is specified in https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf

See section 3.1.10, Hardware Performance Monitor.

These are the hpmcounter registers and related hardware. With the mhpmeventX registers you can specify which kind of hardware event makes hpmcounterX count up, e.g. instruction retired, cache miss and so on.

"Not sure if it can be realised with verilator itself, as well."

It isn't really related to Verilator. The idea is to be able to access those counters directly from the software running on the CPU itself :)

So it could be used in any simulation or on hardware.

amr-25 commented 1 year ago

Right. Sorry if I have misunderstood. But what if we simulate the CPU in Verilator with tracing on and extract the information (e.g. instructions retired) from the generated .fst file? I was going through https://tomverbeure.github.io/2022/02/20/GDBWave-Post-Simulation-RISCV-SW-Debugging.html where two signals, lastStagePc[31:0] and lastStageIsValid, are used to get the number of instructions retired. This way we don't need a counter that software can access. We just probe the signals and get the information of interest. Wouldn't both approaches, i.e. a counter and a Verilator simulation, serve the same purpose?

Dolu1990 commented 1 year ago

Yes right, that's one way to do it ^^ I would say the FST wave trace is good until you need real hardware to interact with real peripherals, or need to run very long workloads like booting Linux. Otherwise, yes, I think the two are mostly interchangeable. Maybe the only thing you need in order to analyse things in the FST wave is a way to know when to start and stop counting.
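
One way to handle the start/stop question is to pick two marker addresses in the program and count only between them. The sketch below assumes the trace has already been exported from the FST file into one sample per clock cycle, which is not shown here. The struct names, count_window, and the start_pc/stop_pc markers are hypothetical. Only lastStagePc and lastStageIsValid come from the discussion above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One clock cycle of the trace (assumed export format). */
struct trace_sample {
    uint32_t last_stage_pc;       /* lastStagePc[31:0] */
    uint8_t  last_stage_is_valid; /* lastStageIsValid  */
};

struct window_stats {
    uint64_t cycles;  /* clock cycles inside the window */
    uint64_t instret; /* instructions retired inside the window */
};

/* Count cycles and retired instructions from the first retirement at
 * start_pc up to and including the next retirement at stop_pc. */
static struct window_stats count_window(const struct trace_sample *s, size_t n,
                                        uint32_t start_pc, uint32_t stop_pc)
{
    struct window_stats w = {0, 0};
    int counting = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int retired = s[i].last_stage_is_valid != 0;
        if (!counting) {
            if (!(retired && s[i].last_stage_pc == start_pc))
                continue; /* still before the start marker */
            counting = 1;
        }
        w.cycles++;
        if (retired) {
            w.instret++;
            if (s[i].last_stage_pc == stop_pc)
                break; /* stop marker retired, window is closed */
        }
    }
    return w;
}
```

Stall cycles inside the window raise the cycle count without raising the retired count, so the two numbers together give the same kind of data as the cycle and instret counters.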