pyocd / pyOCD

Open source Python library for programming and debugging Arm Cortex-M microcontrollers
https://pyocd.io
Apache License 2.0
1.13k stars 484 forks source link

pyOCD race condition bugs in write_core_registers_raw and read_core_registers_raw #819

Open buzmeg opened 4 years ago

buzmeg commented 4 years ago

Quoting: https://github.com/mbedmicro/pyOCD/blob/master/pyocd/coresight/cortex_m.py

        # Technically, we need to poll S_REGRDY in DHCSR here before reading DCRDR. But
        # we're running so slow compared to the target that it's not necessary.
        # Read it and assert that S_REGRDY is set

Technically, this is a bug. And it bit me. :(

This is no longer "technically" true. 5 years of computer improvements makes this statement now false. I had to check the S_REGRDY or my probe smashed the register by executing another write too quickly and then failed the assert on the callbacks.

These comments are in the functions write_core_registers_raw and read_core_registers_raw.

I added a read to validate the S_REGRDY, but this seems like a duct tape patch. And it sort of conflicts with the whole read callback thing, so I'm kind of reluctant to submit a patch given that I really don't understand how to do this immediately and then with the callbacks in a way that isn't redundant.

Edit: Looks like I'm not the only one to see this: https://github.com/mbedmicro/pyOCD/issues/693

flit commented 4 years ago

Thanks a lot for the report.

I've recently been working in this area (flit/pyOCD@017bf48 and flit/pyOCD@dd9b69a), and was wondering if this could be improved. The problem is that simply polling S_REGRDY absolutely kills gdb performance. The use of interrupt USB pipes for CMSIS-DAPv1 mandates queued reads and writes to get acceptable register access performance. CMSIS-DAPv2 improves this with bulk pipes, but queueing is still necessary.

One option might be to keep the same structure for an initial attempt, but if the check of S_REGRDY fails then restart the operation with a slower method using polling.

I think #693 is a different case, where the CPU is not halted as expected and so the assert is triggered. (That assert is problematic.)

buzmeg commented 4 years ago

The problem is that simply polling S_REGRDY absolutely kills gdb performance.

Why? If S_REGRDY isn't set everything falls to pieces anyway, no?

I presume the issue with polling is that each operation requires a USB round trip.

Why not use the Value Mask/Match Mask mechanisms to let the DAP_Transfer command do the polling on the other side and then drop to slower mechanisms if that comes back having failed?

flit commented 4 years ago

I presume the issue with polling is that each operation requires a USB round trip.

Right, the issue with polling is the multiple USB round trips.

Why not use the Value Mask/Match Mask mechanisms to let the DAP_Transfer command do the polling on the other side and then drop to slower mechanisms if that comes back having failed?

Just what I was thinking. Unfortunately match mask is not currently supported by the internal probe API. And some probes like STLink don't support it at all (afaik), so it would have to be emulated. A probe API for reading (and one for writing) core registers could be added, but I've been trying to avoid lots of probe-specific code.

buzmeg commented 4 years ago

A probe API for reading (and one for writing) core registers could be added, but I've been trying to avoid lots of probe-specific code.

It looks like stlink has convenience functions to read all registers. See: stlink_usb_read_regs in OpenOCD: http://openocd.org/doc/doxygen/html/stlink__usb_8c.html#a36aeee192b7427e8404bfb529b00edad

Probably the easiest solution would be to add a "read_regs"/"write_regs" function that uses the ST-Link USB call or emulates it with DAP_Transfer and match on CMSIS-DAP probes.

At this point I don't think you can duck the probe bifurcation if you want to maintain performance.

To be fair, I'm far more concerned about correctness than performance. I would be really worried that races like these are contributing to instability under the situations that you are most concerned about performance because you are hitting them so often.

buzmeg commented 4 years ago

The problem is that simply polling S_REGRDY absolutely kills gdb performance.

Is this actually a true statement?

I concede that it is probably true for HID on USB 2.0 Full Speed. Turnaround takes a minimum of 1ms.

However, for HID on USB 2.0 High Speed that may not be true (and is probably why I tripped over this as I'm using a Beaglebone Black and the USB is 2.0 High Speed).

USB 2.0 High Speed can put 3 interrupt transfers per 125us microframe. That's about 40us turnaround. If the SWD interface is at 1MHz, that's about the same order of magnitude as an SWD transaction.

If you have USB gurus you can throw this at, it might be a worthwhile discussion.

buzmeg commented 4 years ago

I can confirm that this bug also trips for bulk endpoints. It looks like the commonality is USB 2.0 High Speed. I suspect that any CMSIS-DAP probe running at USB 2.0 HS is going to trip this.

flit commented 4 years ago

This is bubbling up in priority. Interestingly, nobody else has reported encountering this issue, even with probes that have HS USB. I personally use HS USB probes almost all the time and haven't seen it.

You're right that polling probaly won't kill performance on a HS interface. Problem is that it has to function well for FS probes, too, and those are still in the majority.

However, the key word in my statement about polling S_REGRDY was "simply". Probably should've used "naïvely" instead. CMSIS-DAP supports a variant of the transfer command that can wait for a value/mask match. Of course, that only addresses CMSIS-DAP.

I'd really prefer not to expose any core-level APIs such as core register access at the probe layer, as that breaks layering too much for my taste. But you may be right that it's unavoidable. Anyway, it would pretty low priority. But if someone else were to implement it, I would be happy to consider merging.

buzmeg commented 4 years ago

Interesting. Which probes are you using that are CMSIS-DAP with USB 2.0 High Speed? I should probably pick a couple up for validation.

Side question: where is the code repository for validating a CMSIS-DAP probe? I can't seem to find it.

Nevermind: Found it down in the CMSIS_5 repo: CMSIS_5/CMSIS/DAP/Firmware/Validation/MDK5

buzmeg commented 4 years ago

I see that the LPC Link II is a high-speed probe, but it looks like the LPCScrypt CMSIS-DAP firmware for it is only v1 HID and not v2 Bulk.

buzmeg commented 4 years ago

Unless there has been a recent change in pyocd, I can confirm that this seems to be a race condition.

When I rewrote my usb_gadget handler into Rust, I can run 0.26.1.dev100 without incident.

It seems like if you handle this fast enough, things work, and if you handle them slow enough, things work.

My suspicion would be that returning a fast USB stall causes problems, but that's pure guesswork.