raspberrypi / pico-feedback

25 stars 2 forks source link

USB device erratum for aborted control transfers? #385

Open tlyu opened 6 months ago

tlyu commented 6 months ago

It seems that macOS sometimes rapidly issues multiple CDC control requests back-to-back, aborting some of them, when a process closes a CDC serial device. This causes a hard fault in TinyUSB due to buffer control state being retained when a new SETUP packet arrives before the previous control transfer completed.

The datasheet says that the STALL bits get reset by an incoming SETUP packet, but not what happens to the EP0 buffer controls. The USB peripherals on some other MCUs (e.g., STM32) do reset the endpoint state for EP0 when a SETUP packet arrives, regardless of whether the previous control transfer completed.

I don't have detailed traces of what's happening at the hardware level, because I don't currently have a RP2040 board with accessible SWD traces.

I've submitted a workaround to TinyUSB, but this seems like a hardware erratum, because there's no guarantee that the buffer control won't cause an unwanted transaction to occur before the ISR can service the interrupt for the new SETUP and adjust the buffer controls.

See also:

adafruit/circuitpython#8824 (analysis and traces) hathach/tinyusb#2492 (workaround patch)

aallan commented 6 months ago

Sounds like one for @liamfraser ?

liamfraser commented 6 months ago

The SETUP packet does not clear EP0's stall bits directly. This would have been tricky to implement because the stall bits come from a memory rather than a register. Instead, the bits in the EP_STALL_ARM Register are cleared. To actually send a stall, you need both the stall bit in the buffer control memory to be set, along with the ARM bits. So if only the memory bit is set but the register is cleared then you can tell the stall is no longer valid.

tlyu commented 6 months ago

@liamfraser thank you for the details. They are very helpful.

From your reply, I can conclude that with current hardware, race conditions regarding EP0 buffer control state are unavoidable (except for inhibiting/clearing STALL). Also, fixing them might be tricky, because endpoint ACK/NAK/STALL configuration is in memory instead of registers. Do you agree?

My current workaround uses the ABORT register, which probably doesn't work on B1 hardware or earlier (due to RP2040-E2). I'm guessing that the hardware fix might be to force the ABORT bits for EP0 upon receiving a SETUP packet? I'm assuming that the same hardware limitations that led to using additional register bits to mask the effect of the STALL bits in the EP0 buffer controls would apply for the AVAILABLE bit.

tlyu commented 6 months ago

Some tracing added by @eightycc seems to confirm that the USB peripheral doesn't reset anything in the EP0 buffer controls when receiving a SETUP packet.

danielstuart14 commented 4 months ago

I think this is something that I'm seeing on my board, even with the latest tinyusb (after patch). (I also tested with Embassy-USB, same thing). We have a board that has an USB2422 HUB that connects both an RP2040 (B2 Revision) and an external USB port.

Sometimes, when connecting a new device to the external port, the RP2040 will stop communicating with the O.S. (we are using CDC, device is still recognized, just won't accept any messages - ERR 13 on windows):

image

USB2422 Errata says that there is an issue that can happen with split transactions, we checked the RP2040 USB signals and it seems like one packet sometimes looses some of its data: image

The problem is that the RP2040 isn't able to recover from this issue, and just stops acknowledging the setup packets from the HOST: image

If we try to send any new data through the CDC interface, this is what happens in the bus: image

1 - The host sends the data, but no ACK/NACK is received image

2 - The host tries to receive some data (44us later): image

3 - Device responds an ACK (9us later): image

4 - Nothing is sent for 30us and then the HOST retries...

The only things that we found that would make it work again are:

For the two latter cases, we found out that the USB line reset is what makes it work again. image

@tlyu is this something that your patch should fix? By reading both the description here and in your commit I thought it was similar.