Command structure in driver/software

riscv / riscv-debug-spec

Working Draft of the RISC-V Debug Specification Standard

https://jira.riscv.org/browse/RVG-94

Command structure in driver/software #742

Closed markusdd closed 1 year ago

markusdd commented 2 years ago

Hi, I have been reading this debug spec 1.0.0 now and while I very much agree with the features outlined in section 1.5 and with the multi-hart debug strategy, I have some questions and suggestions about the register and command structure:

1) The register offsets from DM-base increment by one, although the registers are 32 bits wide. As most systems will use byte addressing, this does not play well e.g. with an APB- or generally bus-based DM register file. You would expect increments by 4 then.

2) Is it possible to emulate the command structure e.g. in the OpenOCD driver? We are very much moving away from using JTAG chains within the SoC because they cause all kinds of trouble with regard to power domain switching etc. We much prefer having a central JTAG TAP which then translates onto AHB/APB, and from there you can go wherever you want. Another alternative for bus access are SPI interfaces, UARTs and the like. In our debug reg file the core registers are all directly visible at separate addresses in the APB window without having to first write a command etc. This is easy, cheap (no extra flops), and writing is just done by writing to the respective register, which then gets forwarded directly inside the core. So while all this command stuff is nice for being generic, in our case it makes zero sense. If we wish to be compliant: Can these commands just be emulated in e.g. the OpenOCD driver? Same goes for e.g. the resets. The SoC already has provisions for that, which make no sense to replicate in the DM or hart.

So what I am proposing is e.g. in our case: A spec-compliant debugger sends a read command for register 15. In the driver this just gets translated into a JTAG2AHB access, reading register 15, and then returning the value together with the ready status bit.

In my view, it would make a lot of sense to allow that.

3) I'm not sure if I missed it, but is there a general provision to advertise the base address in a memory-mapped DM? I know there is register 0x11 to advertise some of the features, but if you e.g. have multiple DMs in different subsystems, how do you advertise their base addresses?

timsifive commented 2 years ago

The register offsets from DM-base increment by one, although the registers are 32 bits wide. As most systems will use byte addressing, this does not play well e.g. with an APB- or generally bus-based DM register file. You would expect increments by 4 then.

If you want to use APB as a DMI, then you just wire the two lower address bits to 0, and map the DMI address bits starting from address bit number 2. Perhaps it would have been cleaner to specify 4-byte increments, but that ship has sailed.
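A minimal sketch of that mapping as seen from the APB side, in Python for illustration only. The function name and the 7-bit DMI address width are assumptions, not taken from the spec text quoted in this thread.

```python
DMI_ADDR_BITS = 7  # assumed DMI address width, for illustration


def apb_offset_to_dmi_addr(apb_offset: int) -> int:
    """Map a byte offset inside the DM's APB window to a DMI word address."""
    # The two lower address bits are wired to 0: only word-aligned
    # accesses correspond to a DM register.
    if apb_offset & 0b11:
        raise ValueError("DM registers are 32 bits wide; offset must be word-aligned")
    # DMI address bits start at APB address bit 2.
    dmi_addr = apb_offset >> 2
    if dmi_addr >= (1 << DMI_ADDR_BITS):
        raise ValueError("offset outside the assumed DMI address range")
    return dmi_addr
```

With this wiring, byte offset 0x44 in the APB window selects DMI register 0x11.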

Another alternative for bus access are SPI interfaces, UARTs and the like.

I'd love to see extra Debug Transport Modules defined for SPI and UART interfaces.

If we wish to be compliant: Can these commands just be emulated in e.g. the OpenOCD driver?

No. The spec defines how hardware should behave. It is not dependent on somebody using a specific debugger like OpenOCD.

is there a general provision to advertise the base address in a memory-mapped DM?

This isn't completely solved. Probably it will involve using nextdm, and likely unified discovery will give us an additional mechanism to enumerate all DMs in a system, when that is ratified.
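For illustration, walking a chain of DMs through nextdm could look roughly like the sketch below. nextdm sits at DMI address 0x1d and reads as 0 in the last DM of the chain. The `read_dm_reg` callback, the `limit` guard, and the idea that the nextdm value can be used directly as the next DM's base are assumptions of this sketch; how that value would translate into a byte address in a memory-mapped system is exactly the open question.

```python
NEXTDM = 0x1D  # DMI address of the nextdm register


def enumerate_dms(read_dm_reg, first_base, limit=16):
    """Follow nextdm from the first DM until it reads 0.

    read_dm_reg(base, reg) is a hypothetical callback that returns the
    32-bit value of DM register `reg` for the DM located at `base`.
    """
    bases = []
    base = first_base
    while len(bases) < limit:  # guard against a malformed, looping chain
        bases.append(base)
        next_base = read_dm_reg(base, NEXTDM)
        if next_base == 0:  # 0 marks the last DM in the chain
            break
        base = next_base
    return bases
```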

markusdd commented 2 years ago

Thanks a lot for the quick reaction! Some remarks:

a) I think the spec should be completely transport agnostic. If someone wants to send debug commands via Bluetooth, that should also work^^ (exaggerating to make the point). Nothing should be defined for that at all.

b) I do not get how this would work in practice. The hardware will behave just like described, but the way the chip is accessed will always be SoC specific. Even when connecting to standard ARM cores, the specific core always needs to be given to make clear to the host what can be used and how the DAP works. The interface also needs to be given (usually SWD or JTAG). So essentially there is always a driver involved (it doesn't have to be in OpenOCD, but essentially every MCU on this planet has a specific debugger definition, if only because the commands to flash firmware to non-volatile memory etc. differ materially).

c) I would strongly suggest to think again about the hardware impact of the current spec. If you are targeting a small 32-bit 32-register core for low power or even a 16-register RVE one, this debug spec introduces a significant overhead without a specific need. An ARM CM0+ has less than 1k flip-flops, debug access port included. With this debug spec this would be close to impossible to achieve. So I strongly believe at least the option to transfer some of the logic into debug drivers and use more efficient provisions on-chip would be advisable. Another way of handling this would be to offer 2 different compliance levels to this debug spec. Level 1 would offer 'feature compliance' (essentially section 1.5), Level 2 'feature compliance and control interface compliance'. Level 1 implementations then simply need to ship their own drivers to work 'plug and play', but at least all debug plugins for editors, IDEs etc. would work. Level 1 is essentially how all RISC-V debugging works today because there is no stable spec yet. So just because of that it would probably be good to introduce something like that, because there are commercial products out in the field that could be made part of this standard through a software layer, otherwise they'll eternally remain 'pre-spec', which is not necessary.

d) I might have overlooked that, but I lack some deeper spec on how breakpoints and data breakpoints (also known as Watchpoints) should behave. For breakpoints there is usually a common understanding that the instruction where the breakpoint sits will NOT be executed. For watchpoints there is no real standard. I have seen halts after the data transfer (read or write) has happened, but also halts upon detection that the transfer would be issued now but the system halts before that. Is there any preference?

timsifive commented 2 years ago

a) I think the spec should be completely transport agnostic. If someone wants to send debug commands via Bluetooth, that should also work^^ (exaggerating to make the point). Nothing should be defined for that at all.

The spec is designed with this in mind. We can add additional debug transport modules, including one for Bluetooth. I don't believe it's possible to make a spec that is transport-agnostic because you still need to specify how the transport is used. I know nothing about Bluetooth, but e.g. for an RS232 interface you'd have to specify how the address/data are encoded, and how results are encoded.

b) I do not get how this would work in practice. The hardware will behave just like described, but the way the chip is accessed will always be SoC specific.

The way the chip is accessed is not arbitrary. It is one of a fixed set of options. Currently the only option in the spec is JTAG. cJTAG and SWD both offer standard ways to bridge to JTAG, and I think there are implementations of both as well. But for something like USB there is no standard way, so we'd have to add it to the spec for everybody to work the same.

There are at least 4 debuggers that can connect to RISC-V at this time. That works because the hardware interface is well-defined and no special software is required.

c) I would strongly suggest to think again about the hardware impact of the current spec. If you are targeting a small 32-bit 32-register core for low power or even a 16-register RVE one, this debug spec introduces a significant overhead without a specific need.

We think a lot about the hardware impact of the spec. If you have a trivial core then I think you can just implement the abstract commands, which require minimal flip-flops. On a core like you described earlier, where you have direct access to all registers, you wouldn't even need a state machine to access registers. The same might even be true for memory if all you have is on-chip SRAM.
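To make the abstract-command path concrete, here is a rough sketch of the Access Register command word that a debugger writes to the command register (DMI address 0x17), and of the small decode a DM with directly visible GPRs would need. The field positions (cmdtype in bits 31:24, aarsize in 22:20, transfer in bit 17, write in bit 16, regno in 15:0) and the GPR range 0x1000 to 0x101f follow the spec's Access Register definition; the function names are made up for this example.

```python
GPR_BASE = 0x1000  # abstract register numbers 0x1000..0x101f are x0..x31


def access_register_cmd(regno: int, write: bool = False, aarsize: int = 2) -> int:
    """Build an Access Register command word (cmdtype 0, transfer set)."""
    cmdtype = 0          # 0 selects Access Register
    transfer = 1         # actually perform the register transfer
    return (
        (cmdtype << 24)
        | (aarsize << 20)    # 2 selects a 32-bit access
        | (transfer << 17)
        | (int(write) << 16)
        | (regno & 0xFFFF)
    )


def decode_gpr_access(command: int):
    """Return (gpr_index, is_write), or None if this is not a GPR access."""
    if (command >> 24) != 0:            # not an Access Register command
        return None
    regno = command & 0xFFFF
    if not GPR_BASE <= regno < GPR_BASE + 32:
        return None
    return regno - GPR_BASE, bool((command >> 16) & 1)
```

Reading register 15 is then the single word `access_register_cmd(GPR_BASE + 15)`, and the decode is a range check plus a subtraction, which is why no state machine is needed when the registers are directly accessible.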

Level 1 is essentially how all RISC-V debugging works today because there is no stable spec yet. So just because of that it would probably be good to introduce something like that, because there are commercial products out in the field that could be made part of this standard through a software layer, otherwise they'll eternally remain 'pre-spec', which is not necessary.

There is a stable spec. 0.13.2 was ratified quite some time ago. You can see it at https://riscv.org/technical/specifications/ (Having said that, I recommend people implement this 1.0 version instead because it fixes a number of minor issues and is likely to be ratified basically as-is.)

d) I might have overlooked that, but I lack some deeper spec on how breakpoints and data breakpoints (also known as Watchpoints) should behave. For breakpoints there is usually a common understanding that the instruction where the breakpoint sits will NOT be executed. For watchpoints there is no real standard. I have seen halts after the data transfer (read or write) has happened, but also halts upon detection that the transfer would be issued now but the system halts before that. Is there any preference?

See the discussion of the timing bits in mcontrol6 for the details on that.

markusdd commented 2 years ago

Thank you for the elaborate response!

Just as a preface: Our original debug implementation which we also shipped in silicon dates back all the way to early 2018, with implementation in 2017, so even before the preliminary debug spec was ratified. So this is where I am coming from with my comments above.

a) I was just joking about Bluetooth to make a point ;)

b) We do use JTAG as well, but apart from having a very basic set of registers and modes, JTAG does not really encompass any additional functionality. We for example do have a JTAG2AHB bridge in our JTAG-TAP controller which gives you access to the bus system, to which our APB Debug register file for the RISC-V is attached. But the way in which this JTAG2AHB bridge is accessed and operated is specific to our implementation. I know ARM has this in their DAPs as well. So if you wish to perform e.g. any memory mapped accesses through the debugger, you need a driver for that specific SoC. I do not think that this can be generalized. The number of different TAP implementations out there is huge, and they are not only driven by RISC-V debug needs but by many other factors on the SoC as well. Having an additional TAP implementation for each and every core seems wasteful.

So if we wish to use APB for our DMI, how are we supposed to:

- advertise the base address to the debugger? (I understand this is lacking in the concept from one of your inputs above)
- define the access to the bus system (AHB/APB does not go off-chip, so you need some way in. In the past we have used SPI for that, nowadays we use the aforementioned JTAG2AHB bridge in our main system TAP)
- define a possible chip-specific auth procedure to unlock access from outside (which might be completely unrelated to RISC-V but driven by the whole SoC requirements)
- integrate with the rest of the system that might use other cores as well (I participated in SoCs with no less than 27 cores from 3 different vendors) or has special security requirements. This e.g. ties into the whole reset discussion, where an alternative location for the respective reset controls might be unavoidable (e.g. in a security-hardened system you might need to acquire a hardware semaphore to access system controls like this)

To try and make my point a bit more crisp: From the given spec I have a hard time seeing how to comply without scrapping the whole bus-attached/memory-mapped approach altogether, losing all the benefits of it, and writing a completely new TAP controller solely for the debug register file, adding significant complexity compared to what we had before, which already had about 80% of the functions from section 1.5.

This might come across as a little bit dumb, but somehow I feel this spec might still lack some provisions to really accommodate the wide range of possible implementations and core integrations out there in heterogeneous systems which are not primarily defined by the fact that they contain a RISC-V core.

My biggest confusion is really the access to the DM. Without the registers being located directly in the System TAP, I see no way to advertise how to get to those registers in a byte-addressed, memory-mapped system.

As a last point: I do not get your reference to 'timing discussions in mcontrol6'? What discussion are you referring to? Is there a link you can provide?

Thanks again for taking the time to discuss this. We very much wish to adapt to the specification, but as you see there are still some gaps in understanding I guess^^

bruceable commented 2 years ago

Markus,

I work for SiFive and we have implemented an APB option for the DM. If you send me your email address I can forward you our memory-mapped documentation.

My email address is @.***

--- Bruce

timsifive commented 2 years ago

@markusdd, it sounds like you are happy to implement the Debug Module, but don't want to use JTAG to access it. That is valid and allowed by the spec. The DTM chapter says:

An implementation can be compliant with this specification without implementing any of this section. In that case it must be advertised as conforming to “RISC-V Debug Specification 1.0.0-STABLE, with custom DTM.” If the JTAG DTM described here is implemented, it must be advertised as conforming to the “RISC-V Debug Specification 1.0.0-STABLE, with JTAG DTM.”

In that case you're making up your own way to access the DM, and whatever you make up is fine. Of course existing software will have to be modified to work with it. You could come up with a way to do so over Bluetooth or any transport you care to use. Ideally then you work with this group to get a DTM for your transport defined and added to the spec, so that other people who want to use the same transport will do it in the same way. This can include a way to advertise the base address to the debugger, authentication, etc.

As a last point: I do not get your reference to 'timing discussions in mcontrol6'? What discussion are you referring to? Is there a link you can provide?

See the "Match Control Type 6" section in the spec, which talks about the timing bit.

markusdd commented 2 years ago

Ah, I see, we were talking past each other before, haha

Yes, that is exactly what we wish to do. Or I should say: have to do because of system requirements. And again: The Bluetooth was a joke^^

What we have is a generic system-wide JTAG TAP that has a JTAG2AHB bridge. This exists for all kinds of purposes: testing, debug, memory preloading from outside, datapath control, whatever. So our debug module really is just another APB slave in a vast SoC system. The debug module control interface is what we now want to transform from our proprietary solution into something compliant. The limitation of the support software for the DTM is a non-issue in practice, simply because as soon as this goes into customers' hands you need to give them software for flashing etc. anyway. We don't deal with standard MCUs but rather some domain-specific stuff, so no one expects to buy this chip, connect a Segger probe and find it in the drop-down list there.

But just to revisit the earlier point: We can comply with the addresses from the base by shifting them left by 2 (turning the word address into a byte address). Still, I am wondering what the concept for advertising the base address of the DM in the bus system would look like?

You mentioned this is a non-solved problem right now. Are proposals still being taken? Not saying I have a perfect solution for that yet, but I feel maybe we should look into defining some kind of debug config file in that standard which can be read by any external debug software/hardware to get the necessary info.

Points I am seeing:

- base address + size of DMEM and IMEM and their bus accessibility
- number and type of DMs in the system
- base address of each DM (having one DM per hart can happen e.g. if you have completely different RISC-V implementations, one much more capable core and one very tiny one with far fewer features, where using one DM implementation would make little sense)
- advertisement of special capabilities (e.g. when you have custom accelerators/extensions/CSRs)

The point I am trying to make: I think from a certain point onwards pure autodiscovery will not cut it. The debugger needs some input about what it is talking to in order to deliver the full experience.
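Purely as an illustration of the points listed above, such a debug config description might carry something like the following. Nothing here is defined by the spec: every class, field name and address is hypothetical and only shows what kind of information would be recorded.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MemoryRegion:
    name: str             # e.g. "IMEM" or "DMEM"
    base: int             # base address on the system bus
    size: int             # size in bytes
    bus_accessible: bool  # can the debugger reach it directly over the bus?


@dataclass
class DebugModuleEntry:
    base: int             # base address of this DM's register window
    kind: str             # free-form description of the DM type
    harts: list[int]      # hart IDs served by this DM
    custom_capabilities: list[str] = field(default_factory=list)


@dataclass
class DebugConfig:
    memories: list[MemoryRegion]
    debug_modules: list[DebugModuleEntry]


# Illustrative values only; none of these addresses come from a real chip.
config = DebugConfig(
    memories=[
        MemoryRegion("IMEM", 0xF000_0000, 64 * 1024, True),
        MemoryRegion("DMEM", 0xF001_0000, 32 * 1024, True),
    ],
    debug_modules=[
        DebugModuleEntry(0xF100_3000, "full", [0], ["custom-accelerator-csrs"]),
        DebugModuleEntry(0xF200_3000, "minimal", [1]),
    ],
)

# Serialise to a file format any external debug tool could read.
config_json = json.dumps(asdict(config), indent=2)
```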

timsifive commented 2 years ago

So you have a DM buried somewhere in your SoC. The spec currently defines only one method for an external debugger to talk to a DM: JTAG. (It explicitly allows you to use something else, as long as you're explicit about it.) We're open to adding other methods to talk to the DM, such as using a standard JTAG2AHB bridge and APB (or some other bus?) behind that. Discovery would be part of that method.

markusdd commented 2 years ago

yes, 'buried in our SoC' is the right word I guess^^

Let's say the RISC-V lives in a subsystem that occupies the address space under 0xF000_0000 and the AHB2APB decoder in that subsystem lives at 0xF100_0000. Then the DM would be one of the APB peripherals under that decoder, let's say at 0xF100_3000 (commonly nowadays you reserve 4 kB, so 12 bits, for each APB peripheral).

So from the current spec, register 0x00 of the DM would have address 0xF100_3000, and the feature register, placed at word address 0x11 in the spec, would live at 0xF100_3000 + (0x11 << 2) = 0xF100_3044.
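The same arithmetic as a small sketch. The helper name and the window-size check are assumptions added for the example; the addresses are the ones used above.

```python
APB_WINDOW_SIZE = 0x1000  # 4 kB reserved per APB peripheral, as assumed above


def dm_reg_address(dm_base: int, word_addr: int) -> int:
    """Byte address of the DM register at spec word address `word_addr`."""
    offset = word_addr << 2  # word address -> byte offset
    if offset >= APB_WINDOW_SIZE:
        raise ValueError("register falls outside the DM's APB window")
    return dm_base + offset


feature_reg = dm_reg_address(0xF100_3000, 0x11)
```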

So in the end, the discovery method would need to know a) how to operate the JTAG2AHB bridge to do reads and writes on the bus (usually a defined protocol of selecting the JTAG2AHB bridge by an initial IR scan code and then a series of DR scans to perform a read or write) and b) how to find the 0xF100_3000 base address.

As soon as you go memory-mapped, the actual bus protocol does not matter to the outside debugger. It still only plays JTAG sequences to deliver addresses and read/write commands; whether the internal bus is Wishbone, AHB, AXI or whatever does not really matter. Also, for convenience, if the memories of the RISC-V are also bus accessible, a simple advertisement of their size and base address is enough. You do not need any further provisions in the DM; the debugger can just access and read/manipulate them directly if needed. Same goes for the reset by the way: If there are already reset controls in the system, which basically applies to most SoCs, there is no inherent need to have them again in the DM. Just advertise the location and the debugger can request a reset directly there (for us there is a central location in each subsystem for that, but that might differ materially per SoC).
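A rough sketch of that layering from the debugger's point of view: DM register access only needs some bridge that can do 32-bit bus reads and writes, and whether that bridge is JTAG2AHB, SPI or UART is hidden behind it. All class and method names are hypothetical, and `FakeBridge` merely stands in for real hardware so the example is self-contained.

```python
from typing import Protocol


class BusBridge(Protocol):
    """Anything that can perform 32-bit reads/writes on the SoC bus."""

    def read32(self, addr: int) -> int: ...
    def write32(self, addr: int, value: int) -> None: ...


class FakeBridge:
    """Stand-in for a real JTAG2AHB/SPI/UART bridge, backed by a dict."""

    def __init__(self) -> None:
        self.mem: dict[int, int] = {}

    def read32(self, addr: int) -> int:
        return self.mem.get(addr, 0)

    def write32(self, addr: int, value: int) -> None:
        self.mem[addr] = value & 0xFFFF_FFFF


class MemoryMappedDM:
    """DM register access layered on top of an arbitrary bus bridge."""

    def __init__(self, bridge: BusBridge, base: int) -> None:
        self.bridge = bridge
        self.base = base

    def read(self, word_addr: int) -> int:
        return self.bridge.read32(self.base + (word_addr << 2))

    def write(self, word_addr: int, value: int) -> None:
        self.bridge.write32(self.base + (word_addr << 2), value)


DMCONTROL = 0x10  # DMI address of dmcontrol; bit 0 is dmactive

bridge = FakeBridge()
dm = MemoryMappedDM(bridge, base=0xF100_3000)
dm.write(DMCONTROL, 1)  # set dmactive through the bridge
```

Swapping `FakeBridge` for a class that drives real IR/DR scans would leave `MemoryMappedDM` unchanged.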

Of course, to really be generic, such a bus bridge could also be realized using SPI (we had that in the past) or UART with flow control.

ARM has circumvented some of this by forcing their reset vector to address 0 and having many fixed addresses scattered all over the 32-bit address space. But I consider this to be an actual weakness, because now with multiple cores you have an individual view of the SoC from each core. And if you need to go to the memory of another core, you need to have an alias window, cluttering the address space even more. There is no need to do that with RISC-V. Our core can be configured to have its reset vector wherever it wants. But to retain that flexibility, the DM also needs to be able to move within the memory map.

I hope this all makes sense.

timsifive commented 2 years ago

That all sounds reasonable. I'd add that the specified JTAG DTM is also simply a bus bridge.

markusdd commented 2 years ago

Yes, indeed it is. It just translates from JTAG to AHB through a JTAG data register, using a specified command structure. (ARM DAP also has this functionality, but very basic; ours can do a bit more in terms of speed.)

timsifive commented 1 year ago

Closing due to lack of activity. I'd still love to add more debug transports to the spec, but not in 1.0.