Open henzef opened 3 years ago
Is the failure instant (i.e. the first time a disconnect is singalled)? Can you supply a cutdown script/program that can replicate the issue with any arbitrary cdc_acm device? My intent is to try to reproduce with a pi zero configured as a serial gadget.
The failure occurs most of the time, but not in 100% of the cases, so it might require a few tries. To reproduce it, it should suffice to open the serial port and trying to read from it in a loop. I can try to put together a minimal example tomorrow. Unfortunately I don't have a Pi0 ...
A minimal test case for my setup is the following:
yes $'\xc0' | tr -d '\n' | pv -L 1k > /dev/ttyACM0
and dmesg -w
in a second terminal
This just sends an endless stream of 0xc0 to my device and limits its bandwidth to prevent flooding the device. I used 0xc0, because the device uses slip encoding and 0xc0 is not a violation of the protocol.
In the success case, pv exits with pv: write failed: Input/output error
as soon as the cdc_acm device is restarted.
In the error case, pv continues copying data without noticing the disconnect. In my current run it's at 1.9MB, lets see if it ever aborts ;-) When the error occurs, killing pv causes the kernel to notice the disconnect immediately.
Note that this method may require some more retries to trigger the error condition, but no more than 5 tries were enough for me so far.
A few more observations:
While the error condition persisted I restarted the cdc_acm device multiple (>10) times, which was not registered by the pi
Physically unplugging the cdc_acm device can also trigger the error condition
When the error condition persisted for more than 90 minutes, I tried to increase the throughput via pv --remote $THE_PID_OF_PV --rate-limit 200k
and pv exited immediately
Raspberry Pi 3B+ is not affected
Can you reproduce it with the Pi0? If not I can try to get my hands on one
I have limited bandwidth to look at issues such as these right now. I will update the issue when I have had a chance to try reproducing it.
I reproduced the issue with a Pi0. Running the script I attached below on the pi4 works after a few attempts. Note that it behaves slightly different compared to my other serial device, as the kernel does not notice the disconnect condition even when the pv process exits.
#!/usr/bin/env bash
SERIAL_PORT="/dev/ttyACM0"
while sleep 0.1 ; do
if [[ -e "$SERIAL_PORT" ]]; then
yes $'\xc0' | tr -d '\n' | pv -L 1k > "$SERIAL_PORT" &
sleep 5
ssh pizero sudo modprobe -r g_serial
echo -e "\ndisconnected"
wait # hangs here if the error occured, because line 7 does not terminate
else
ssh pizero sudo modprobe g_serial
echo "connected"
fi
done
Is the SSH connection over wifi (as in are you using a Pi0W)?
Yes
If this is not easily fixable, would it be possible to add a kernel cmdline parameter to toggle to the old behaviour? I am kind of stuck with an old kernel version here ... :-/
I've not had time to look at this. There's no way to sensibly disable a quirk once applied, as there's no sysfs node to override the value used at probe time.
Another quirk applied to the VLI controller is causing issues elsewhere (#3981) so for now I think both need to be opt-in for testing.
I've pushed a commit that should allow you to restore the pre-Oct 2020 behaviour to https://github.com/P33M/linux/tree/nobble-vli-quirks
Pushing this out will cause regressions for other users, so for now can you build a kernel from this tree locally, and test it?
To switch on the code path that's believed to be causing your issue, add xhci_hcd.quirks=0x10000000000
to /boot/cmdline.txt (and reboot). Testing this will ensure that only the downstream patch has caused the regression and not a combination of upstream + downstream changes.
To build and install, follow the guide here: https://www.raspberrypi.org/documentation/linux/kernel/building.md
I've installed the kernel and will test in the next few days
XHCI_EP_CTX_BROKEN_DCS and XHCI_AVOID_DQ_ON_LINK are (1<<41) and (1<<42) respectively, so I think the 0x10000000000 quirk is wrong. Or do I miss something?
Ah. Rebasing the patch in 5.10 shifted the bits up as upstream started using them. 0x200_0000_0000. and 0x400_0000_0000 respectively.
Ok, I finally had time to prepare a test setup to reproduce it quickly. I tested the following setups:
And the strange thing is, that all trigger the errorcase. Do you have an suggestion what I should try next? cherry-pick your patch the original 5.4 kernel that triggered the bug the first time? or maybe its firmware rather than kernel related?
That's confusing. If the commit in October 2020 is responsible then at least one of the quirks options you used should fix it. It's unlikely that an upstream change would have broken cdc_acm in exactly the same way as the quirk does.
Are you sure that you're booting with the correct kernel each time?
Yes, I explicitly set CONFIG_LOCALVERSION to verify that. Only the dbt and dbto files are unchanged, but these should not matter. Also
$ dmesg | grep quirk
[ 1.360376] xhci_hcd 0000:01:00.0: hcc params 0x002841eb hci version 0x100 quirks 0x0000000000000890
without /boot/cmdline.txt modification should verify that your patch was indeed applied (otherwise the quirks would be 0x0000060000000890, right?)
Same issue, did anyone find a fix for this?
EDIT: Might not exactly be the same issue (though the same behaviour) as downgrading to https://github.com/Hexxeh/rpi-firmware/commit/e530832e9058ab73b0d45457aea06f0ea62a08a8 doesn't fix the issue (RPi 4).
dmesg does'nt report an unplug event when the USB device doesn't respond but it sometimes shows this error
[ 128.223884] cdc_acm 1-1.1:1.0: failed to set dtr/rts
The rpi 3B+ is not affected.
Please run rpi-update and test. This issue may be related to #3981 - I see stall events when running the test script on a Pi 0W.
Th issue is still there after rpi-update on my side.
I finally managed to spend some time on more detailed testing and it turns out that my previous test (https://github.com/raspberrypi/linux/issues/4061#issuecomment-878237270) was flawed by a broken USB hub ;-( With that out of the way, I cherry-picked your patch from https://github.com/P33M/linux/commit/6867c2d672e82f6791d868591eb718647343e988 onto raspberrypi/linux@7136c8f9c83cf1d4ed5a2262cc74118672835218 and got the following results:
xhci_hcd 0000:01:00.0: hcc params 0x002841eb hci version 0x100 quirks 0x0000000000000890
Where do we go from here @p33m? It would be great to have a solution that doesn't require a patched kernel (especially now that bullseye is released)
One more observation:
So using AVOID_DQ_ON_LINK breaks only if an external hub is used? Please post a full sudo lsusb -v
with the hub connected.
Can you confirm that the Pi 0 testcase also triggers the issue with an external hub, and that the Pi0 connects at high speed?
So using AVOID_DQ_ON_LINK breaks only if an external hub is used?
Yes. I will double-check this tomorrow to be 100% sure
Please post a full
sudo lsusb -v
with the hub connected.
See the lsusb output below. Please ignore the "1a86:7523 QinHeng Electronics CH340 serial converter" device, it's unrelated
can't get debug descriptor: Resource temporarily unavailable
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 3.00
bDeviceClass 9 Hub
bDeviceSubClass 0
bDeviceProtocol 3
bMaxPacketSize0 9
idVendor 0x1d6b Linux Foundation
idProduct 0x0003 3.0 root hub
bcdDevice 5.10
iManufacturer 3 Linux 5.10.63-v7l-usbfix+ xhci-hcd
iProduct 2 xHCI Host Controller
iSerial 1 0000:01:00.0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x001f
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0004 1x 4 bytes
bInterval 12
bMaxBurst 0
Hub Descriptor:
bLength 12
bDescriptorType 42
nNbrPorts 4
wHubCharacteristic 0x0009
Per-port power switching
Per-port overcurrent protection
bPwrOn2PwrGood 10 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
bHubDecLat 0.0 micro seconds
wHubDelay 0 nano seconds
DeviceRemovable 0x00
Hub Port Status:
Port 1: 0000.02a0 5Gbps power Rx.Detect
Port 2: 0000.02a0 5Gbps power Rx.Detect
Port 3: 0000.02a0 5Gbps power Rx.Detect
Port 4: 0000.02a0 5Gbps power Rx.Detect
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 0x000f
bNumDeviceCaps 1
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x02
Latency Tolerance Messages (LTM) Supported
wSpeedsSupported 0x0008
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 3
Lowest fully-functional device speed is SuperSpeed (5Gbps)
bU1DevExitLat 4 micro seconds
bU2DevExitLat 231 micro seconds
Device Status: 0x0001
Self Powered
Bus 001 Device 005: ID 1234:5678 My Device
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 2 Communications
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x1234 My Device
idProduct 0x5678
bcdDevice 1.00
iManufacturer 1 My Manufacturer
iProduct 2 My Product
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0046
bNumInterfaces 2
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 480mA
Interface Association:
bLength 8
bDescriptorType 11
bFirstInterface 0
bInterfaceCount 2
bFunctionClass 2 Communications
bFunctionSubClass 2 Abstract (modem)
bFunctionProtocol 1 AT-commands (v.25ter)
iFunction 0
Interface Descriptor:
can't get device qualifier: Resource temporarily unavailable
can't get debug descriptor: Resource temporarily unavailable
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
CDC Union:
bMasterInterface 0
bSlaveInterface 1
CDC Header:
bcdCDC 1.01
CDC ACM:
bmCapabilities 0x0e
connection notifications
sends break
line coding and serial state
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Device Status: 0x0001
Self Powered
Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. Hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 9 Hub
bDeviceSubClass 0
bDeviceProtocol 1 Single TT
bMaxPacketSize0 64
idVendor 0x1a40 Terminus Technology Inc.
idProduct 0x0101 Hub
bcdDevice 1.11
iManufacturer 0
iProduct 1 USB 2.0 Hub
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0019
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0001 1x 1 bytes
bInterval 12
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 4
wHubCharacteristic 0x0000
Ganged power switching
Ganged overcurrent protection
TT think time 8 FS bits
bPwrOn2PwrGood 50 * 2 milli seconds
bHubContrCurrent 100 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0xff
Hub Port Status:
Port 1: 0000.0100 power
Port 2: 0000.0100 power
can't get debug descriptor: Resource temporarily unavailable
can't get debug descriptor: Resource temporarily unavailable
Port 3: 0000.0100 power
Port 4: 0000.0103 power enable connect
Device Qualifier (for other device speed):
bLength 10
bDescriptorType 6
bcdUSB 2.00
bDeviceClass 9 Hub
bDeviceSubClass 0
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
bNumConfigurations 1
Device Status: 0x0001
Self Powered
Bus 001 Device 003: ID 1a86:7523 QinHeng Electronics CH340 serial converter
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 255 Vendor Specific Class
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 8
idVendor 0x1a86 QinHeng Electronics
idProduct 0x7523 CH340 serial converter
bcdDevice 2.64
iManufacturer 0
iProduct 2 USB Serial
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0027
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 98mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 1
bInterfaceProtocol 2
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0020 1x 32 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0020 1x 32 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0008 1x 8 bytes
bInterval 1
Device Status: 0x0000
(Bus Powered)
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.10
bDeviceClass 9 Hub
bDeviceSubClass 0
bDeviceProtocol 1 Single TT
bMaxPacketSize0 64
idVendor 0x2109 VIA Labs, Inc.
idProduct 0x3431 Hub
bcdDevice 4.21
iManufacturer 0
iProduct 1 USB2.0 Hub
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0019
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes can't get debug descriptor: Resource temporarily unavailable
can't get device qualifier: Resource temporarily unavailable
can't get debug descriptor: Resource temporarily unavailable
3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0001 1x 1 bytes
bInterval 12
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 4
wHubCharacteristic 0x00e0
Ganged power switching
Ganged overcurrent protection
TT think time 32 FS bits
Port indicators
bPwrOn2PwrGood 50 * 2 milli seconds
bHubContrCurrent 100 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0xff
Hub Port Status:
Port 1: 0000.0103 power enable connect
Port 2: 0000.0503 highspeed power enable connect
Port 3: 0000.0100 power
Port 4: 0000.0100 power
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 0x002a
bNumDeviceCaps 3
USB 2.0 Extension Device Capability:
bLength 7
bDescriptorType 16
bDevCapabilityType 2
bmAttributes 0x00000002
HIRD Link Power Management (LPM) Supported
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
wSpeedsSupported 0x000e
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 1
Lowest fully-functional device speed is Full Speed (12Mbps)
bU1DevExitLat 4 micro seconds
bU2DevExitLat 231 micro seconds
Container ID Device Capability:
bLength 20
bDescriptorType 16
bDevCapabilityType 4
bReserved 0
ContainerID {30eef35c-07d5-2549-b001-802d79434c30}
Device Status: 0x0001
Self Powered
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 9 Hub
bDeviceSubClass 0
bDeviceProtocol 1 Single TT
bMaxPacketSize0 64
idVendor 0x1d6b Linux Foundation
idProduct 0x0002 2.0 root hub
bcdDevice 5.10
iManufacturer 3 Linux 5.10.63-v7l-usbfix+ xhci-hcd
iProduct 2 xHCI Host Controller
iSerial 1 0000:01:00.0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0019
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0004 1x 4 bytes
bInterval 12
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 1
wHubCharacteristic 0x0009
Per-port power switching
Per-port overcurrent protection
TT think time 8 FS bits
bPwrOn2PwrGood 10 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x02
PortPwrCtrlMask 0xff
Hub Port Status:
Port 1: 0000.0503 highspeed power enable connect
Device Status: 0x0001
Self Powered
Can you confirm that the Pi 0 testcase also triggers the issue with an external hub, and that the Pi0 connects at high speed?
Unfortunately that can take a bit longer as I don't have the neccessary hardware here at the moment
So using AVOID_DQ_ON_LINK breaks only if an external hub is used?
Yes. I will double-check this tomorrow to be 100% sure
Double check was successful. When booting the patched kernel with xhci_hcd.quirks=0x60000000000 the issue only occurs if the cdc_acm device is connected via an USB hub (D-Link DUB-H4).
Using a Pi0W the script appears to hang after the first loop, hub or not. If I look at the g_serial module refcount - it goes negative:
g_serial 16384 -1
libcomposite 57344 2 g_serial,usb_f_acm
dtoverlay=dwc2,dr_mode=peripheral
.
Do I need to have anything else running on the Pi0W?I got my hands on the Pi0W again and used the same SD card as in https://github.com/raspberrypi/linux/issues/4061#issuecomment-839950846 (so an old installation). I was able to reproduce the symptoms I described above after several retries (not sure how long it took exactly, but it took less than 10 minutes), so at least that testing wasn't flawed by the broken USB hub 👍 The bad news it that it seems to be an entirely different bug, because it is not fixed by disabling the quirk :-(
My /boot/config.txt contains dtoverlay=dwc2 and /boot/cmdline.txt contains modules-load=dwc2,g_serial but I don't remember if I modifed that manually.
uname -a output on the Pi0W is Linux pizero 5.10.17+ #1403 Mon Feb 22 11:26:13 GMT 2021 armv6l GNU/Linux
One more thing: The Pi0 bug also occurs without the external USB hub
Hi, I recently ran into issues with a serial USB device (cdc_acm driver) on a Raspberry Pi 4B 2GB Revision 1.1. I bisected this on the Hexxeh/rpi-firmware repository and f9a31df39c03911ae42e8b43b89313e25c30661c is the first bad commit (e530832 last good):
As it's USB related, a strong candidate would be #3929 (pinging @P33M).
Steps to reproduce:
I have a piece of software that keeps the serial device /dev/ttyACM0 open for continuous communication. From time to time the other communication partner is resetted/restarted, which is normally detected as unplugging and replugging in dmesg:
When upgrading to commit f9a31df or newer, no unplugging is detected and the communication just appears dead, but is not closed. The unplugging and replugging is logged in dmesg as soon as my piece of software closes /dev/ttyACM0 due to timeouts.