Closed Dolu1990 closed 2 years ago
Let's say you run one abstract command which return/write data0. Can this new value be usable as a argument of the next abstract command ?
That is certainly what I had imagined, but I cannot think of a good use case for it. (Copy value from one register to another, I suppose, but that's hardly common.)
I cannot imagine what optimization you can make if this isn't the case. How does it help you be require the debugger to always write data0 for an abstract command which will use it?
Let's say you run one abstract command which return/write data0. Can this new value be usable as a argument of the next abstract command ?
Imagine that the answer is no and that data0 is allowed to spontaneously change. At what point is it allowed to spontaneously change?
Take this normal sequence:
If data0 changes between 1 and 2 then it defeats the purpose of the abstract read command. If it changes between 3 and 4 then it defeats the purpose of the abstract write command. I don't see much value in trying to track if we're in between steps 2 and 3 just so data0 can spontaneously change. That would also disallow other sequences such as Tim's copy example where there is no step 3.
First, sorry for the long post XD
@pdonahue-ventana
Imagine that the answer is no and that data0 is allowed to spontaneously change.
There may be a missunderstanding, i didn't mean that it could spontaneously change, i mean : Should the returned value of a abstract command in dataX be reusable for the next abstract command.
In practice, is the following sequance ok or not :
@pdonahue-ventana + @timsifive
Mostly, from a more abstract perspective, I think it is about "Do we consider dataX to be registers/memory for abstract commands" or "Do we consider dataX as data channels for abstract commands"
If we consider dataX as data channel, it means that they are used by the debuger as a way to write abstract command arguements, and read abstract commands returned values.
How does it help you be require the debugger to always write data0 for an abstract command which will use it?
I didn't mean that data0 is lost after each command which used it, i mean that the data0 returned by a abstract command can't be used as arguement of the nexts abstract commands.
I cannot imagine what optimization you can make if this isn't the case.
I think a drawing is better than 1000 words ^^
To give some context, i'm targeting a FPGA softcore (https://github.com/SpinalHDL/NaxRiscv) So the tradeoff compaired to a ASIC can be quite different. Here logic can be "very expensive".
Basicaly, i'm trying to keep the DM and the CPU as much decoupled as possible, so having the dataX storage done where they are readed.
To sum up, do we realy want "coherency A" and "coherency B" link as described in the following diagram :
So, in short :
Thanks. I hadn't considered FPGAs where single port RAM can be much cheaper than a proper register, and the 32-bit wide mux that would be required to support might be expensive.
There's another issue in this design. DMI writes to data0 followed by DMI reads from data0 won't return the value that was just written. Like having abstract commands use the dataX results written by the previous command, this is not something I'd really expect a debugger to do.
It wouldn't be terrible to allow such implementations, but it does make the spec more complicated (what do you call a "register" that acts this way) and it's also not backwards compatible. Might there be abstract commands in the future where this behavior would be a problem? How helpful is this optimization really? Do you have a rough estimate of FPGA resources used by a debug implementation that does/does not use this optimization?
Thanks ^^ @timsifive
There's another issue in this design. DMI writes to data0 followed by DMI reads from data0 won't return the value that was just written.
Yes right, it is "solved" with the "coherency B" link in the second diagram (i think). It could be named an "issue", but from some perspetive it could be named a "feature", allowing eventualy to save hardware space.
what do you call a "register" that acts this way
Right, it would not be called a register (assuming register is synonym of flip flop), but instead it would more be specified as a interface to :
Where abstract command response may override some the written arguements (implementation specific)
Might there be abstract commands in the future where this behavior would be a problem?
Hard to tell for me, started reading the spec only a few days ago ^^
How helpful is this optimization really? Do you have a rough estimate of FPGA resources used by a debug implementation that does/does not use this optimization?
Two benefits : 1) For multiple harts configs interfaced from a single DM, it could allow a better decoupling between the DM and the harts, as it remove the notion of centralized memory ( in the current spec, both the harts and the DM have read / write access to that memory ). 2) Area in FPGA softcore. So, if we resume this issue to "Should you implement coherency A and B" Then it is around 80 LUT ( Cyclone IV / ECP5 FPGA)
(see https://github.com/SpinalHDL/VexRiscv#area-usage-and-maximal-frequency for some softcore size)
So, 80 luts in itself may be a bearable cost for a linux capable softcore, but it is noticable for small baremetal ones. Also, if many bearable cost accumulates, it endup producing oversized designs.
Here is an example of an additional element of the spec which may add hardware for little value (as far as i understand it): progbuf0-15 are read/write registers, which in practice (i guess), serve no real purpose ?
So, 80 luts in itself may be a bearable cost for a linux capable softcore, but it is noticable for small baremetal ones. Also, if many bearable cost accumulates, it endup producing oversized designs.
Thanks for digging that up.
Here is an example of an additional element of the spec which may add hardware for little value (as far as i understand it): progbuf0-15 are read/write registers, which in practice (i guess), serve no real purpose ?
Typical implementations only need 2 of them, and that's allowed by the spec. But it's true that there's not much use for these registers being read/write. Is that what you meant?
What FPGA primitives are you using in the end? Doing some quick Googling, it appears that Xilinx Block RAMs all have 2 read/write ports. With that, can't you implement registers in the block RAM as the spec says?
Typical implementations only need 2 of them, and that's allowed by the spec. But it's true that there's not much use for these registers being read/write. Is that what you meant?
Yes :)
What FPGA primitives are you using in the end?
Currently i'm targeting Xilinx Artix 7 FPGA, but in the future, Altera Cyclone V and Lattice ECP 5 too. (All the FPGA with distributed ram)
it appears that Xilinx Block RAMs all have 2 read/write ports. With that, can't you implement registers in the block RAM as the spec says?
Yes, you are right, but for a few reasons i'm trying to avoid it :
Once everything works in simulation, i will give some synthesis run, to see how much ressources are used by the whole debug infrastructure, will post it here, as it can be interresting.
Anyway, thanks for the issue tracking ^^ Have a good weekend !
Got it to work in simulation (GDB >> openocd >> 4 pin jtag from a TCP socket >> verilator testbench >> DTM >> DMI >> DM >> NaxRiscv) Total size of the added hardware = 436 Artix 7 LUT, GDB seems ok, a few things aren't implemented, as reading the progbuff from the DMI and the coherency A/B for dataX (see diagrams in the previous post), as well as most optional features.
To be honnest, i was expecting the whole thing to use more resources ^^
Got openocd/GDB to work for both rv32/rv64. Here is my "final" config :
Thanks for the help ^^
I'd like to keep this open until we figure out for sure that what you did is allowed by the spec. It's a good optimization that I'd like to allow, but that is a non-backwards-compatible change. Hopefully we can discuss it at tomorrow's meeting.
This was addressed by #728 and #731.
The Architecture Review Committee recommends we drop Message Registers from the spec, or do a bunch of extra work . In the interest of finally getting 1.0 ratified, we're dropping Message Registers. Sorry if this leads to problems for your implementation. Current OpenOCD does not assume it can read back any values it wrote to the data argument registers, so it should work well for you.
Thanks for the update :)
Hi,
I'm currently implementing the spec in hardware and there is one behaviour the spec isn't explicit about (or I missed it / implicit).
Let's say you run one abstract command which return/write data0. Can this new value be usable as a argument of the next abstract command ?
Because if it isn't, it allows more flexibility in the hardware implementation.
Location in the spec : https://github.com/riscv/riscv-debug-spec/blob/eeef4d580971d70b86e3fb498a0536654ae90608/xml/dm_registers.xml#L570
Thanks ^^