[usbdev] Race condition, SETUP packets

Description

The documentation says the following about the configin . rdy bit:

This bit should be set to indicate the buffer is ready for sending. It will be cleared when the ACK is received indicating the host has accepted the data.

This bit will also be cleared if an enabled SETUP transaction is received on the endpoint. This allows use of the IN channel for transfer of SETUP information. The original buffer must be resubmitted after the SETUP sequence is complete. A link reset also clears the bit. In either of the cases where the hardware cancels the transaction it will also set the pend bit.

A SETUP transaction typically consists of the USB host sending a SETUP packet, and then retrieving one or more IN packets in response. Software on OT will handle the reception of a SETUP packet in the RXFIFO by preparing (part of) a response and then setting the above configin.rdy bit to allow the host to retrieve the data. It appears that the functionality for the hardware to clear the rdy bit upon reception of a SETUP packet is meant for cases where the USB host will issue a new SETUP request before having retrieved every part of the previous response. This is a good feature to have, however, it seems to me that there is a gap: If a second SETUP is received while processing of the first one has not yet completed. In that case, when processing of the first SETUP packet completes and is ready to set the rdy bit, the second SETUP packet has already been received, and hence the second SETUP will not cause clearing of the rdy bit. Instead, the USB host will get confused, as it receives stale data and thinks that they were in response to the most recent request.

In my view, we need a hardware feature similar to how the i2cdev treats non-empty ACQ fifo. That is, if the USB rxfifo contains one or more SETUP packets (meaning that there are already requests which the software has not yet seen or processed), then an attempt by software to set the rdy bit should immediately result in the pend bit being set (and the rdy bit not actually being set). This way, the end result will be the same, whether the second SETUP packet arrives one clock cycle before or after the software attempts to set the rdy bit based on a previous SETUP packet.

I am wondering if we can find a software workaround for the A1 hardware. The only thing I can think of is something like every time you want to set the rdy bit on the control endpoint, then you have to disable reception of SETUP packets, inspect the rxfifo to verify that there are none in queue, then set the rdy bit, and then re-enable processing of SETUP packets.

The clearing of 'rdy' and setting 'pend' when a SETUP packet is received is a recovery mechanism, in the event that a preceding Control Transfer failed. Specifically we did not see the ACK from the host for the Status stage of an OUT control transfer.

When you say that '...processing of the first one has not yet completed' it's important to note that Control Transfers to a given endpoint cannot be overlapped. The reception of a SETUP packet implicitly cancels any in-progress Control Transfer.

I've tried to sketch out an explanation of Control Transfers below, and shall try to flesh this out into a bit more comprehensive explanation to include in the programmer's guide for USBDEV, because admittedly Control Transfers can be confusing and USBDEV has no real knowledge of them, nor any involvement in the sequencing/control flow. It deals only with the packet-level transactions but does have the necessary mechanisms to implement them robustly.

Control Transfers consist of two or three stages. There will normally be three stages because some data is transferred, but if there is no data to be transferred then the data stage is skipped.

IN Control Transfer

Setup Stage:

  SETUP packet is received by USBDEV; the contents of this 8-byte packet include the direction
  of the transfer. This packet also contains a field 'wLength' which specifies the maximum number
  of bytes that the host wishes to transfer, although the software/USBDEV may supply fewer bytes
  than requested.

Data Stage:

  Host sends IN requests and collects packets from USBDEV until it has retrieved the number of
  bytes specified in the 'wLength' field, or it receives a packet that is less than the maximum packet
  length. The host only knows that it has received all of the IN data at that point.

  If the amount of IN data that must be transferred is an exact multiple of the maximum packet
  length that the endpoint declared to the host, /and/ the amount transferred is less than 'wLength',
  the device software must generate a 'Zero Length Packet' to indicate that it has finished. This
  instructs the host not to attempt to retrieve any more data, and instead proceed to the Status Stage.

  When the software receives an indication (via `in_sent`) that the final packet was successfully
  collected from the host, it will move to the Status stage and expect to receive an OUT packet.

Status Stage:

  In order to conclude the Control Transfer successfully, the host sends an OUT packet of zero
  bytes to the USBDEV to indicate success. When USBDEV ACKnowledges that ZLP and the
  host sees the ACK successfuly, the Control Transfer is complete.

OUT Control Transfer

Setup Stage:

  SETUP packet is received by USBDEV; the contents of this 8-byte packet include the direction
  of the transfer and, via 'wLength', the anticipated number of bytes to be transmitted.

Data Stage:

  Host sends OUT packets to USBDEV until all of the data has been transferred. When the software
  driving USBDEV has received the number of bytes specified in the 'wLength' of the SETUP packet,
  it shall proceed to the Status stage.

Status Stage:

  In order to conclude the Control Transfer successfully, the software driving USBDEV shall present
  a Zero Length Packet for collection by an IN request from the host. When the host has collected
  that packet and ACKnowledged its successful receipt, the software driving USBDEV knows that the
  Control Transfer has completed successfully.

Two Stage Control Transfers

If the host controller knows that there is no requirement for a Data Stage because there is no data to be transferred, it will move directly into the Status Stage. The software driving USBDEV must do likewise, using the 'wLength' field of the SETUP packet field, and should expect only to send/receive the Zero Length Packet of the Status stage.

See sections 5.5 and 8.5.3 of the USB 2.0 Protocol Specification.

I hope that's a sufficiently clear overview to be helpful. Returning to the original description, the software driving USBDEV should not in normal operation be attempting to present an IN packet for collection at the point that the SETUP packet of a Control Transfer is being received, since both the device software and the hosts controller shall be following the above sequence.

Updated: Thanks to @pamaury for correcting my recollection of some of the subtle details. There is a tested implementation of Control Transfer support via USBDEV in sw/device/lib/testing/usb_testutils_controlep.c and ..usb_testutils.c.

I realize that what I describe should only happen in odd situations, under which maybe the host controller "looses state" and forgets that it is in the process of retrieving a sequence of data packets as part of an "IN Control Transfer", and instead sends a new SETUP packet. I assume that the recovery mechanism of clearing the rdy flag upon receiving SETUP is meant to gracefully recover even from such cases, and I wanted to point out that there is a gap, in that the SETUP packet could very easily have arrived before the software set the rdy bit.

In my case, I have another bug in the software running on OT, causing it to deviate from the protocol outlined above, and which makes the situation much more likely to happen, but I wanted to raise this issue for consideration nonetheless.

Actually thinking about the further statement "A link reset also clears the bit." I think there is a race there, as well. In case the link is reset while the software is processing for instance a SETUP packet (or processing any other normal part of its logic), which may end in the preparing outbound data and the configin.rdy bit being set, then the automatic logic to clear rdy on link reset could also be sidestepped, if the link reset happened before the rdy bit was set. If we wanted to make the logic fool-proof, then link reset should latch a "link reset"-bit, which software needs to clear before any rdy bits could be set.

May I ask whether you are using interrupts or simply polling to drive the USB device? When a packet such as a SETUP packet is accepted into RX FIFO, the 'pkt_received' interrupt is asserted. It is then important to collect all packets from the RX FIFO in the interrupt handler, because otherwise the interrupt will remain asserted. Also, not doing so opens up the timing window for such races to occur.

You are correct that there will be a tiny window of a few clock cycles after the SETUP packet results in 'rdy' being cleared and 'pend' being asserted, and before the interrupt arrives at the CPU, interrupting the execution of any foreground code that may be trying to present an IN packet. The interrupt handler can safely retract an IN packet if one has been marked 'rdy' in that time, and it cannot be collected for something like 4.3 microseconds, on account of the time that it takes the ACK to the SETUP packet and the IN token packet to be transferred over the USB.

It actually doesn't need the change in #21771 (prod only) to retract the IN packet either, because it has always been safe simply to clear the 'rdy' bit in software, provided that there is no chance that the packet is actually being transmitted to the host.

I think this comes down to careful consideration of the interactions of foreground code and the usbdev interrupt handler. If hardware interrupts are not being used then the pkt_received bit of the intr_state register will similarly need to be consulted before around setting 'rdy.'

We are using interrupts, but TockOS strongly encourages all actual processing be done in what I would call "bottom half handlers", that is, there could be a long and unpredictable delay between the interrupt being raised, and the code running that actually handles the interrupt. (We "claim" the interrupt with the PLIC from the actual "top" handler, so that execution can leave the interrupt routine without immediately re-entering, but then other kernel code could run, potentially bottom half handlers belonging to prior events, which could e.g. mark an IN packet rdy, before our bottom half handler actually services the usbdev, and finally marks the interrupt as "completed" with the PLIC.)

Given the above, we cannot easily guarantee that we will react to interrupts within 4.3 microseconds, or any other bound. This is why we have taken care to ensure that e.g. the I2C device never requires interrupt latency guarantees for correct operation.

When fixing my logic to not incorrectly mark empty IN packets as "ready", the problem becomes much less likely to manifest. Basically, only of the USB host does something really weird of sending two SETUP in quick succession, such that the second one arrives as the bottom half handler belonging to the first is executing, then the result could be that our code puts the response to the first in queue, while the host thinks it is the response to the second. I think it is safe to assume that if a USB host "forgets" what it is doing like that, there will quite likely be some significant delay, as it would be a result of drivers or maybe user processes crashing and re-starting. So it is quite possible that it will work fine for all time, without further enhancement of the hardware recovery logic.

Out of curiosity, can you share the kind of host that generates two SETUP packets that quickly (what is the time difference between the two)? I mean technically it's not violating the spec because the second SETUP packet is supposed to cancel the previous one but this should not happen under normal circumstances.

Maybe you could do a minimal amount of processing in the ISR just to specifically abort the previous control transfer if it's not finished?

We are using interrupts, but TockOS strongly encourages all actual processing be done in what I would call "bottom half handlers", that is, there could be a long and unpredictable delay between the interrupt being raised, and the code running that actually handles the interrupt.

This statement frequently comes up with "strongly encourages," and it needs to not be escalated beyond that. If TockOS does not permit any top half logic to run, then it is a completely broken design. The general principles of good ISR design have us minimize the top half to minimize the response time to a pending high-priority event (interrupts are disabled in the top half, and the size of the critical section prevents responding to a new event).

If we are only allowed to handle the PLIC in the ISR, then there is no top half (only the scheduling overhead), and the fast response class is eliminated, likely leaving us with a terrible-by-all-metrics event handling architecture.

Maybe you could do a minimal amount of processing in the ISR just to specifically abort the previous control transfer if it's not finished?

I have non-interrupt kernel code which looks at some state (generating a sequence of response packets to a previous SETUP packet), makes a decision that another IN response is warranted, produces the data, and then sets the rdy bit. The issue is that if a second SETUP packet arrives after the decision has been made, but before the rdy bit is set, then the hardware will think that the IN packet should be a response to the newly received SETUP. In order to do what you suggest here, if the kernel logic was interrupted the instruction before setting the rdy bit, I would need to do some exceedingly ugly manipulation of the process stack in order for the ISR to prevent the ordinary kernel code from marking the packet as ready. (And even this would not catch the case that the packet arrived just before the rdy was set, but the processor core only saw the interrupt a few cycles later.)

To be 100% failsafe, I see no other way than making the USB IP aware that if a SETUP packet sits in the queue, which the software has not yet acknowledged (by pulling it out of the FIFO), the hardware should treat software attempts to set rdy as being immediately cancelled.

If we are only allowed to handle the PLIC in the ISR, then there is no top half (only the scheduling overhead), and the fast response class is eliminated, likely leaving us with a terrible-by-all-metrics event handling architecture.

In my view, the Ti50 team at Google is trying to reap the maximum benefit of the Rust language security features. Any time we have top half ISR handlers accessing and mutating state also used by the rest of the kernel, then we have to side-step the Rust memory safety features. (As by the nature of ISR handlers, the Rust reference validator cannot guarantee that we do not have simultaneous mutable references to the shared state from interrupt handler and main kernel code.) We want to minimize the number of instances that we side-step compiler safeguards, which is why we want to take any opportunity to avoid hard timing requirements, and not have any top half handling save for the trivial scheduling. (In practice, we do have a few non-trivial top-half handlers, but I feel it would be a shame to have to add one for this issue, when in my view it makes sense to eliminate the race in hardware, in a similar fashion as was done with the I2C device IP.)

Out of curiosity, can you share the kind of host that generates two SETUP packets that quickly (what is the time difference between the two)? I mean technically it's not violating the spec because the second SETUP packet is supposed to cancel the previous one but this should not happen under normal circumstances.

I have not seen the host generate back-to-back SETUP packets in practice. I encountered this issue because of another bug in my code, which triggered the production of extraneous zero-length IN packets in my code right after any IN response to a SETUP packet, even when none should have been sent because the first IN packet was less than 64 bytes. That bug allowed me to discover that though such prepared IN packets, which were not retrieved by the host would be discarded by the hardware if a SETUP arrived later, it was a problem if the SETUP arrived before rdy was set, but after the code had made a decision to respond to what it believed was the "current" SETUP request.

lowRISC / opentitan

[usbdev] Race condition, SETUP packets #24023

Description