CDC: Hung once rx fifo queue is sufficient full

mjan-jansson commented 2 years ago

Operating System

RaspberryPi OS

Board

RaspberyPi Pico

Firmware

Pico is running Pico SDK 1.4.0 The RPI4 is running Linux dev1 5.10.103-v7l+

What happened ?

The board runs as a USB device. It has a loop which write 20 times more data than it reads, over the CDC serial port. When using some USB hosts (but not all), the board will eventually hang and not be able to send any data.

I believe that this is what is happening when running the loop mentioned above:

The host is sending an OUT packet to the appropriate CDC endpoint on the board
The hardware layer get interrupted once the buffer is filled, and subsequently causes cdcd_xfer_cb to be called.
Once the received data is pushed on to the rx fifo queue, _prep_out_transaction is called (to accept more OUT packets)
The fifo queue is too full, resulting in that no further OUT packets are requested.
At a later point in time, tud_cdc_n_write is called, arming the IN endpoint.
The host marks the device as broken (*see comment below) and will no longer poll for IN packets.
Subsequent calls to tud_cdc_n_write will fails with a zero byte return, because the endpoint is busy awaiting the last IN packet to be sent.

At this point, the board will hang, since it is unable to write all the data.

Workaround

If tud_cdc_n_read is called (possibly several minutes after the host has marked the CDC port as defunct), the port will continue to work again without any loss of read or written data.

Comments

Note that this depends on the behavior of the USB host. I have tested this on a SBC (NanoPi NEO Plus2), Windows 11 machine and a RaspberryPi 4, and I am not seeing the hang with neither Windows nor the SBC as the host. I am seeing the hang 100% on RPI4, and I can verify the status of the two hardware buffers in question. The device is ignoring the OUT packets and the host is not polling for the IN packets. The SBC is running Linux 4.14 using ohci. My RPI4 is running Linux 5.10 using xhci_hcd (in case it matters).

I am seeing other issues as well with RPI4, with a more complex software, where CDC driver in TinyUSB is unable to create further IN packets, but there are no pending OUT packets from the host.

One could argue that neither RPi4 nor the Pico is at fault here. I am reasoning that the RP2040 MCU should silently ignore the OUT packets until it is ready to accept more data. By not responding to the OUT packets, the host should treat this as a failed transmission (data error), so host should retry but perhaps not forever? After some retries, the devices would be declared broken. On the other hand, not being able to accept data for long periods of time could be an acceptable scenario for certain devices, so maybe the USB stack on the host is at fault?

Regardless of the correct behavior, I would still like to fix this, but I am not sure what the correct behavior should be.

I can find other reports which basically boils down to this problem. I am not seeing any fixes in tinyusb for this though, and I don't want to revive old issues. For example, https://github.com/hathach/tinyusb/issues/1273 . Looks like it was fixed in tinypython https://github.com/micropython/micropython/pull/8520.

How to reproduce ?

Here is an example code that illustrates this problem. I'm using a 256 bytes size for all three (RX, TX, EP) buffers.

uint8_t usbbuf[512];
int main() {
    tusb_init();
    while (true) {
        tud_task();
        if (tud_cdc_n_available(0)) {
            auto cbRead = tud_cdc_n_read(0, usbbuf, sizeof(usbbuf));
            for (int i=0; i<20; i++) {
                auto cb = cbRead;
                uint8_t *pb = usbbuf;
                while (cb) {
                    auto out = tud_cdc_n_write(0, pb, cb);
                    cb -= out;
                    pb += out;
                    // Workaround: Call tud_cdc_n_read here.
                    tud_task();
                }
            }
            tud_cdc_n_write_flush(0);
        }
    }
}

To trigger the test, run this on RPI4 :

dd if=/dev/random of=/dev/ttyACM0 status=progess

If you break into the debugger once you get the hung, you can tell that the hardware buffer register corresponding to the IN and OUT endpoints are in a state where IN is active (hung) and OUT is inactive (not accepting any packets).

Debug Log as txt file

I can produce usbmon logs if it would help (I think not).

Screenshots

No response

pigrew commented 2 years ago

One could argue that neither RPi4 nor the Pico is at fault here. I am reasoning that the RP2040 MCU should silently ignore the OUT packets until it is ready to accept more data. By not responding to the OUT packets, the host should treat this as a failed transmission (data error), so host should retry but perhaps not forever? After some retries, the devices would be declared broken. On the other hand, not being able to accept data for long periods of time could be an acceptable scenario for certain devices, so maybe the USB stack on the host is at fault?

The USB device (if its buffers are full, or the device application is not ready for data), should NAK the BULK OUT packets, an only ACK them once it is ready for the additional data. The host's hardware should keep retrying the BULK OUT packets every millisecond. The host may eventually get upset and reset the device, or perform other actions to try to reset the pipe like CLEAR_FEATURE(ENDPOINT_HALT). (USB high-speed is slightly more complicated, with the NYET response.) If there is a device error, the device will either not give a packet response, or STALL the endpoint.

There are some issues with TinyUSB where it's improperly handling clearing the endpoint halt (#1537), but I don't think that's what's going on here (unless you have a packet log to prove it?) On the other hand, the host-dependent behavior suggests something special is going on with Linux... as if when it times out, it does try to reset the pipe or something else.

mjan-jansson commented 2 years ago

From my (quite recently acquired) understanding of USB, I'd like to suggest a solution.

I think the problem is that the OUT endpoint is not being "armed" in step 4 above. Even though the rx fifo queue is full, the hardware buffer should be setup to receive more OUT packets. Once the host tries to send another OUT packet, it should be replied with a NAK until there is room in the fifo queue to accept more data. Unless the hardware think the buffer is already full, it would reply with an ACK. Therefor, the OUT packet should be requested without clearing the FULL flag of the buffer register (I assume that something similar could be done with other hardware than the RP2040). The hardware layer would need to be aware if the various interface layers are ready to accept more data, in order to know if the OUT data should be accepted to postponed.

mjan-jansson commented 2 years ago

@pigrew thank for the feedback. I don't think there are any NAK at all in this case, which is the cause of the problem. I don't have a hardware usb analyzer so I cannot very that though.

I'll have a look as well if there are any reset etc.

pigrew commented 2 years ago

I have a RP2040 somewhere, but I'm not familiar with its USB peripheral. I've also not used the FIFO transfers in TinyUSB.... (I only have experience with the block transfer modes).

It would be trickier programming, but the device code could let the HW buffer fill up while the SW FIFO is full. Once the HW buffer is full, the peripheral should automatically NAK future transfers. If it never NAKs when one or both buffers are full, than that certainly is a problem.

You may be able to enable TinyUSB logging (see the CFG_TUSB_DEBUG constant), in lieu of using a USB analyzer, to if any special control transfers are happening during the deadlock.

There may also be 'zero length packet' bugs happening, if one side is expecting a ZLP to terminate a transfer but it is never received... At this point, the bug could be anything. I'm also going to need to study your code... figure out what the cdc_flush does, and the other functions...

mjan-jansson commented 2 years ago

fyi - The RP2040 has a hardware PHY and dedicated DPSRAM, available to both the device/host controller hardware and the CPU cores. More information on this can be found in section 4.1 in the RP2040 datasheet https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf.

From testing, I can tell that once an OUT packet is received and an interrupt is triggered, the hardware buffer for the endpoint is marked as being full and exclusive to the CPU core (controller can no longer access it).

Since the USB hardware is prohibited from accessing the buffer control register at this point, I am guessing that further OUT packets would be fully ignored without any response, unless the control register is handed back to the device controller again with new configuration. In this case, it never is (see my repro steps above).

@kilograham, I am hoping you might help me here. Could you assist in getting the RP2040 datasheet updated with info regarding the device processing of OUT packets (section 4.1.2.6.3)?

When are NAK sent?
When are packets ignored?
How are packets processed while the AVAIL bit is cleared?

pigrew commented 2 years ago

@mjan-jansson, inspired by your last comment, I created https://github.com/raspberrypi/pico-feedback/issues/265 .

Also, it made me wonder what the behavior is for data sequence errors, as that's also not well specified.

mjan-jansson commented 2 years ago

Interesting. If I read the source code correctly, double buffering is disabled in device mode in TinyUSB for RP2040 (see roughly line 180 in rp2040_usb.c). I'm not sure I understand the rational for that. I am assuming it does not affect my use case other than performance. It would nevertheless be most interesting to learn more about this.

mjan-jansson commented 2 years ago

In case anyone is still looking into this; I've updated the comment section above about tests with another board as the host (a H5 based board from FriendlyElec). No problems there either. Given other problems with the USB stack on Raspberry Pi (e.g. https://forums.raspberrypi.com/viewtopic.php?t=245931) I'm writing this off as a hardware specific RPI issue, and I do not have the tools to investigate any further.

pigrew commented 2 years ago

Thanks for the update, yes... I think a USB-analyzer is needed for for efficient debug. I did run the code on RP2040 with a Windows 10 host, and with a small amount of testing, could not reproduce.

lurch commented 2 years ago

ping @P33M , in case he has any interest in investigating this further.

liamfraser commented 2 years ago

mjan-jansson

When are NAK sent? When are packets ignored? How are packets processed while the AVAIL bit is cleared?

Each time the device controller receives an IN or an OUT packet from the host, it will read the buffer control DPSRAM. If the available bit (and the full bit 1 for an IN TO HOST transfer, or the full bit 0 for an OUT FROM HOST transfer) is set then the data is transferred. If not a NAK is sent. This is the same for both directions.

mjan-jansson commented 2 years ago

Many thanks for the info! So unless there is a real data error (checksum error), packets are never ignored, right? I would have no further ideas why the USB stack on RPI4 produces this problem. At some point, I may get an USB analyzer to debug this further.

hathach / tinyusb