Closed kenzhang82 closed 7 years ago
I'm going to point you at section 2.8 of the RISC-V user level specification. https://riscv.org/specifications/ :)
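For anyone landing here later, a minimal sketch of what that pointer leads to, assuming it refers to the user-level counter pseudoinstructions (RDCYCLE, RDCYCLEH, RDINSTRET) described in that part of the spec. The names `read_cycles`, `workload`, and `cycles_for_workload` are made up for illustration and are not part of rocket-chip or any library. Whether user code is allowed to read the counter depends on how the system is configured, and the non-RISC-V fallback returns `clock()` ticks rather than cycles, so it is only there to let the file build on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read the cycle counter.
 * RV64: a single rdcycle. RV32: read high/low/high and retry if the high
 * half changed in between. Anywhere else: fall back to clock(), which
 * counts clock ticks, NOT cycles (assumption, for host builds only). */
static inline uint64_t read_cycles(void)
{
#if defined(__riscv) && (__riscv_xlen == 64)
    uint64_t c;
    __asm__ volatile ("rdcycle %0" : "=r"(c));
    return c;
#elif defined(__riscv)
    uint32_t hi, lo, hi2;
    do {
        __asm__ volatile ("rdcycleh %0" : "=r"(hi));
        __asm__ volatile ("rdcycle %0"  : "=r"(lo));
        __asm__ volatile ("rdcycleh %0" : "=r"(hi2));
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();
#endif
}

/* Hypothetical code under test: a short multiply-accumulate loop. */
static uint32_t workload(uint32_t n)
{
    uint32_t acc = 1;
    for (uint32_t i = 1; i <= n; i++)
        acc = acc * 31u + i;
    return acc;
}

/* Returns the counter delta across one call to workload(n) and stores the
 * workload's result in *result so the compiler cannot discard the work. */
static uint64_t cycles_for_workload(uint32_t n, uint32_t *result)
{
    assert(result != NULL);
    uint64_t start = read_cycles();
    *result = workload(n);
    uint64_t end = read_cycles();
    return end - start;
}
```

Calling `cycles_for_workload(1000, &r)` gives the elapsed count for the whole region. That is a per-region number, not a per-instruction one.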
Thanks @davidbiancolin, much appreciated! I understand why the emulator is slow, but is there any way to know how many cycles each instruction (for example, add, sub, mul, etc.) takes to execute? I tried turning verbose mode on, but the output doesn't include a cycle count. Did I do something wrong?
Also, I just figured out that we could use spike pk -s to do the same thing (I understand that spike is just a functional simulator), but what would be the best (and most accurate) way to profile the cycle count of a C program running on the rocket chip zynq infrastructure? Thanks.
Hi @z419379295 - you're asking for a metric that's fundamentally ambiguous, because pipelined processors overlap latencies. Suppose MUL has 3-cycle latency and LW has 2-cycle latency:
MUL x1, x1, x2
LW x2, 0(x2)
ADD x2, x2, x1
A single-issue in-order pipeline would incur one stall cycle before the ADD, so the sequence completes over the course of four cycles. But since the ADD is stalled on both the MUL and the LW, how do you decide how to apportion those cycles between the instructions?
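The arithmetic above can be checked with a toy scoreboard model. This is only a sketch under simplifying assumptions, not a model of Rocket: one instruction issues per cycle, in order; "latency N" means a dependent instruction can issue no earlier than N cycles after its producer issues; ADD is assumed to have 1-cycle latency; and the total counts issue slots rather than pipeline drain. The `Instr` and `PipelineStats` types and the `simulate` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 32

typedef struct {
    const char *text;   /* assembly text, for reference only */
    int rd;             /* destination register number */
    int rs1, rs2;       /* source register numbers, -1 if unused */
    int latency;        /* cycles from issue until the result is usable */
} Instr;

typedef struct {
    int total_cycles;   /* issue cycle of the last instruction, plus one */
    int stall_cycles;   /* cycles in which no instruction could issue */
} PipelineStats;

/* Single-issue, in-order issue model with a per-register ready time. */
static PipelineStats simulate(const Instr *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    int ready[NUM_REGS] = {0};  /* cycle at which each register is usable */
    int next_issue = 0;         /* earliest cycle the next instr may issue */
    PipelineStats s = {0, 0};

    for (size_t i = 0; i < n; i++) {
        int issue = next_issue;
        if (prog[i].rs1 >= 0 && ready[prog[i].rs1] > issue)
            issue = ready[prog[i].rs1];
        if (prog[i].rs2 >= 0 && ready[prog[i].rs2] > issue)
            issue = ready[prog[i].rs2];

        s.stall_cycles += issue - next_issue;
        if (prog[i].rd > 0)     /* x0 is hardwired to zero */
            ready[prog[i].rd] = issue + prog[i].latency;
        next_issue = issue + 1;
    }
    s.total_cycles = next_issue;
    return s;
}

/* The three-instruction sequence from the comment above. */
static const Instr example[] = {
    { "MUL x1, x1, x2", 1, 1,  2, 3 },
    { "LW  x2, 0(x2)",  2, 2, -1, 2 },
    { "ADD x2, x2, x1", 2, 2,  1, 1 },
};
```

In this model both of the ADD's source registers become ready on the same cycle, so the single stall has two equally valid causes and the model has no principled way to charge it to just one of them.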
Thanks @aswaterman for your help! Aha, that makes sense to me now. Maybe I wasn't seeing the big picture: what I'm really trying to do is identify the power consumption of C code executing on a rocket chip synthesized in the Zynq PL, and I thought it might be useful to see which instructions take up the most cycles. Is there any way to achieve this (i.e. power profiling of instructions)? Thanks.
I'm not really sure... maybe run several benchmarks, measure their power consumption and instruction mix, and then attempt to correlate power consumption with instruction mix?
The benchmarks? You mean running them on the cycle-accurate C++ emulator? How do we measure the power consumption of a software algorithm running on a RISC-V processor?
You would need some sort of RTL-based power model. The details are something of an open research question, so there's not going to be a push-button answer here.
Cool, thanks!
Hi there,
Maybe I'm posting this in the wrong forum, but I have searched many places and couldn't find an answer.
How can we count how many cycles each instruction takes to execute for a C program running on the rocket chip FPGA infrastructure (say, the default config, i.e. the pre-built image)?
The closest answer I could find is to compile the cycle-accurate C++ emulator and simulate the program, but even a simple "hello world" C program takes a long time to simulate.
Any help would be much appreciated! Thanks.
Ken