raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.14k stars 4.99k forks source link

[RPi4 4GB] xHCI host controller not responding, assume dead #3404

Closed JANogueira closed 1 year ago

JANogueira commented 4 years ago

Describe the bug After boot-up, an when activating a service that uses USB interface (Network UPS tools, as an example) xHCI interface crashs and the USB devices get disconnected. Recovery only possible after system reboot.

List of USB devices when the system boots up:

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0463:ffff MGE UPS Systems UPS
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

After enabling a:

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

To reproduce After system boot, start Network UPS tools (UPS connected through USB, having a poll frequency of 30 seconds), and the xHCI interface will crash shortly after, leading to all USB devices being disconnected.

System

Details: Raspberry Pi 4 Model B Rev 1.1

# cat /etc/os-release | head -4
NAME=HassOS
VERSION="3.8 (RaspberryPi 4 64bit)"
ID=hassos
VERSION_ID=3.8
# uname -a
Linux hassio 4.19.93-v8 #1 SMP PREEMPT Sun Jan 12 18:33:10 UTC 2020 aarch64 Hassio/OS
# cat /proc/cpuinfo | tail -3
Revision        : c03111
Serial          : 1000000013df512e
Model           : Raspberry Pi 4 Model B Rev 1.1
#
# df                                                                                                                                                                                                                                     
Filesystem           1K-blocks      Used Available Use% Mounted on                                                                                                                                                                       
/dev/root                92032     92032         0 100% /                                                                                                                                                                                
devtmpfs               1915496         0   1915496   0% /dev                                                                                                                                                                             
tmpfs                  1948808         0   1948808   0% /dev/shm                                                                                                                                                                         
tmpfs                  1948808       708   1948100   0% /run                                                                                                                                                                             
tmpfs                  1948808         0   1948808   0% /sys/fs/cgroup                                                                                                                                                                   
tmpfs                  1948808       708   1948100   0% /etc/machine-id                                                                                                                                                                  
/dev/mmcblk0p7           91099     18656     65562  22% /mnt/overlay                                                                                                                                                                     
/dev/mmcblk0p7           91099     18656     65562  22% /root/.docker                                                                                                                                                                    
/dev/mmcblk0p7           91099     18656     65562  22% /etc/modprobe.d                                                                                                                                                                  
/dev/mmcblk0p7           91099     18656     65562  22% /etc/modules-load.d                                                                                                                                                              
/dev/mmcblk0p7           91099     18656     65562  22% /etc/docker                                                                                                                                                                      
/dev/mmcblk0p7           91099     18656     65562  22% /etc/dropbear                                                                                                                                                                    
/dev/mmcblk0p7           91099     18656     65562  22% /etc/udev/rules.d
/dev/mmcblk0p7           91099     18656     65562  22% /root/.ssh
/dev/mmcblk0p1           32686      3650     29036  11% /mnt/boot
/dev/mmcblk0p7           91099     18656     65562  22% /etc/hostname
/dev/mmcblk0p7           91099     18656     65562  22% /etc/systemd/timesyncd.conf
/dev/mmcblk0p7           91099     18656     65562  22% /etc/NetworkManager/system-connections
/dev/mmcblk0p7           91099     18656     65562  22% /etc/hosts
/dev/mmcblk0p8       122172044   6834280 109113732   6% /mnt/data
/dev/zram2               15856        40     14676   0% /tmp
/dev/zram1               31728       128     29312   0% /var
/dev/mmcblk0p7           91099     18656     65562  22% /var/lib/bluetooth
/dev/mmcblk0p8       122172044   6834280 109113732   6% /var/lib/docker
/dev/mmcblk0p7           91099     18656     65562  22% /var/log/journal
/dev/mmcblk0p7           91099     18656     65562  22% /var/lib/systemd
/dev/mmcblk0p7           91099     18656     65562  22% /var/lib/NetworkManager
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/c79789ef74f2520ce9bf8308f4f0a0a0c2e2d1453a2836027ab47f3c629d6263/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/c79789ef74f2520ce9bf8308f4f0a0a0c2e2d1453a2836027ab47f3c629d6263/merged
shm                      65536         0     65536   0% /mnt/data/docker/containers/4383e4a9b4bfd33f30377b389a5dd8e96f92522bfb5f5e75b5268393aca9af5f/mounts/shm
shm                      65536         0     65536   0% /var/lib/docker/containers/4383e4a9b4bfd33f30377b389a5dd8e96f92522bfb5f5e75b5268393aca9af5f/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/22cdad6842ade8567501c09ce194d03f072af3910c18cb54da5175d19b0e3aa6/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/22cdad6842ade8567501c09ce194d03f072af3910c18cb54da5175d19b0e3aa6/merged
shm                      65536         0     65536   0% /mnt/data/docker/containers/d383f5286c098e038869aa411fbc087895e9852c34db2276b293d5493076e82f/mounts/shm
shm                      65536         0     65536   0% /var/lib/docker/containers/d383f5286c098e038869aa411fbc087895e9852c34db2276b293d5493076e82f/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/8a1741030a19b420c34768b69144ab241b11b00ec67083b86228cee3844dd40a/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/8a1741030a19b420c34768b69144ab241b11b00ec67083b86228cee3844dd40a/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/69458a8f8866e9605cd4a15109eb3d1b2a81e7b2b76c81b8f7a1d1f3f7241ec1/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/69458a8f8866e9605cd4a15109eb3d1b2a81e7b2b76c81b8f7a1d1f3f7241ec1/merged
shm                      65536        64     65472   0% /mnt/data/docker/containers/b9ddddacc42db28bed967a218c67d490d3b89e3bf3ac4bf980d0f027af739f41/mounts/shm
shm                      65536        64     65472   0% /var/lib/docker/containers/b9ddddacc42db28bed967a218c67d490d3b89e3bf3ac4bf980d0f027af739f41/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/563028cbdc5e65467e4299d65e42050ca216a6252d1e144e8491723a1434ad1e/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/563028cbdc5e65467e4299d65e42050ca216a6252d1e144e8491723a1434ad1e/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/a00df7501c5d7ef5be8316129829c2b57b7fef415914170006b3172bb43ebaaa/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/a00df7501c5d7ef5be8316129829c2b57b7fef415914170006b3172bb43ebaaa/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/a72a5bc9321264232980140c9bc6686e89db9c4a83a1da17c8755811f160de6a/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/a72a5bc9321264232980140c9bc6686e89db9c4a83a1da17c8755811f160de6a/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/7624b7f0f58cc8d32d9d8702d26ae71c27fcd0026f27a6623b15c0ec13fd499a/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/7624b7f0f58cc8d32d9d8702d26ae71c27fcd0026f27a6623b15c0ec13fd499a/merged
shm                      65536        48     65488   0% /mnt/data/docker/containers/376a3e6a98ad20b772ae7f38b2051750d0d38d2f876be82d1cc17a81ae801a05/mounts/shm
shm                      65536        48     65488   0% /var/lib/docker/containers/376a3e6a98ad20b772ae7f38b2051750d0d38d2f876be82d1cc17a81ae801a05/mounts/shm
shm                      65536        60     65476   0% /mnt/data/docker/containers/8043d49fda6b8c96b4cfd0fd77a8499b6f70e5a4ad7a66fc37eb417c1ac6a936/mounts/shm
shm                      65536        60     65476   0% /var/lib/docker/containers/8043d49fda6b8c96b4cfd0fd77a8499b6f70e5a4ad7a66fc37eb417c1ac6a936/mounts/shm
shm                      65536        68     65468   0% /mnt/data/docker/containers/ef4df3ae0e79c33285baec87cd55c0b9d28d150ffda2a19e799979073cd1297a/mounts/shm
shm                      65536        68     65468   0% /var/lib/docker/containers/ef4df3ae0e79c33285baec87cd55c0b9d28d150ffda2a19e799979073cd1297a/mounts/shm
shm                      65536        68     65468   0% /mnt/data/docker/containers/f73b6c980b13d97d27fb314c6224f8ed0ef66be8e8039b69436273dc328925f5/mounts/shm
shm                      65536        68     65468   0% /var/lib/docker/containers/f73b6c980b13d97d27fb314c6224f8ed0ef66be8e8039b69436273dc328925f5/mounts/shm
shm                      65536        56     65480   0% /mnt/data/docker/containers/6cbb56501e93036eebc86c0c4bfc38a133d2a62ab458290f5005c592a572e7e3/mounts/shm
shm                      65536        56     65480   0% /var/lib/docker/containers/6cbb56501e93036eebc86c0c4bfc38a133d2a62ab458290f5005c592a572e7e3/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/9737f2fb6b35435b15a046da2583412dd66e3559ab0fde67c9fd7c2e67336aa4/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/9737f2fb6b35435b15a046da2583412dd66e3559ab0fde67c9fd7c2e67336aa4/merged
shm                      65536         0     65536   0% /mnt/data/docker/containers/05216e59fa3a97c660472ae072c417cab1743086a3bb61f40ff167f2caaa03ba/mounts/shm
shm                      65536         0     65536   0% /var/lib/docker/containers/05216e59fa3a97c660472ae072c417cab1743086a3bb61f40ff167f2caaa03ba/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/ce9f300e071672f49be4e085525ee1e19e381ff4e6bb869e12f5cad245018136/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/ce9f300e071672f49be4e085525ee1e19e381ff4e6bb869e12f5cad245018136/merged
shm                      65536        68     65468   0% /mnt/data/docker/containers/f3c374354b066023eef0662df5fee6b5112d69eba4e2c18daff4223e387d00fa/mounts/shm
shm                      65536        68     65468   0% /var/lib/docker/containers/f3c374354b066023eef0662df5fee6b5112d69eba4e2c18daff4223e387d00fa/mounts/shm
# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       974400  0       -2
# 

Logs dmesg output:

[    0.197545] usbcore: registered new interface driver usbfs
[    0.197606] usbcore: registered new interface driver hub
[    0.197718] usbcore: registered new device driver usb
[    0.487955] usbcore: registered new interface driver r8152
[    0.488025] usbcore: registered new interface driver lan78xx
[    0.488474] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    0.495448] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 4.19
[    0.495481] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.495507] usb usb1: Product: xHCI Host Controller
[    0.495528] usb usb1: Manufacturer: Linux 4.19.88-v8 xhci-hcd
[    0.495549] usb usb1: SerialNumber: 0000:01:00.0
[    0.496073] hub 1-0:1.0: USB hub found
[    0.496671] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    0.496706] xhci_hcd 0000:01:00.0: Host supports USB 3.0 SuperSpeed
[    0.497145] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 4.19
[    0.497175] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.497200] usb usb2: Product: xHCI Host Controller
[    0.497220] usb usb2: Manufacturer: Linux 4.19.88-v8 xhci-hcd
[    0.497242] usb usb2: SerialNumber: 0000:01:00.0
[    0.497711] hub 2-0:1.0: USB hub found
[    0.499380] usbcore: registered new interface driver uas
[    0.499478] usbcore: registered new interface driver usb-storage
[    0.499601] usbcore: registered new interface driver usbserial_generic
[    0.499647] usbserial: USB Serial support registered for generic
[    0.507353] usbcore: registered new interface driver usbhid
[    0.507361] usbhid: USB HID core driver
[    0.830300] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[    0.980982] usb 1-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.20
[    0.981020] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    0.981041] usb 1-1: Product: USB2.0 Hub
[    0.982796] hub 1-1:1.0: USB hub found
[    1.278306] usb 1-1.3: new full-speed USB device number 3 using xhci_hcd
[    1.615672] usbcore: registered new interface driver brcmfmac
[    1.999992] usb 1-1.3: New USB device found, idVendor=0463, idProduct=ffff, bcdDevice= 0.01
[    2.000016] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.000027] usb 1-1.3: Product: 5E
[    2.000036] usb 1-1.3: Manufacturer: EATON
[    3.967565] hid-generic 0003:0463:FFFF.0001: hiddev96,hidraw0: USB HID v1.10 Device [EATON 5E] on usb-0000:01:00.0-1.3/input0
[  172.256228] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[  172.272306] xhci_hcd 0000:01:00.0: Host halt failed, -110
[  172.272321] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[  172.272369] xhci_hcd 0000:01:00.0: HC died; cleaning up [  172.272442] usb 1-1: USB disconnect, device number 2
[  172.272467] usb 1-1.3: USB disconnect, device number 3

Additional context This bug report was opened here because HassOS3.8 rus default raspbian Kernel. This issue was initially reported here: https://github.com/home-assistant/hassos/issues/526

USB cable was replaced to ensure that it was not faulty.

If more logs are needed, please let me know

timg236 commented 4 years ago

It might be worth trying different versions of the VL805 firmware. There’s a thread about it here https://www.raspberrypi.org/forums/viewtopic.php?t=260879

JANogueira commented 4 years ago

Thank you. I will give it a try and share the outcome.

JANogueira commented 4 years ago

Well, on raspbian all went well, but not in hassio. USB crash is still happening in hassio.

Before shutting down Raspbian I have ensured that the latest vl805 firmware was properly installed (using ´sudo vl805´ and have shown 0137ad).

Do you know if the EEPROM bin file deppends on any other optimization on the linux kernel side? or it should fix the USB behaviour by itself?

Thanks in advance.

JANogueira commented 4 years ago

Please, disregard my last post. I was doing it wrong while trying to flash the EEPROM.

Took additional actions and all good apparently. 3 hours have passed and all is stable. I will observe this for a couple of days more just to be sure, and share the outcome.

Many thanks once again!

JANogueira commented 4 years ago

Just to confirm that everything is still working perfectly. 0137ad solved the issue. USB devices are all stable and I had a small increase on the sdcard I/O.

hardwareadictos commented 4 years ago

Hey!! Having exactly the same issue on VL805 FW version: 000137ad

Are you still having issues? Can you provide more info about what you did to solve it?

hardwareadictos commented 4 years ago

Here some logs:

log_xhci_rpi4.txt

JANogueira commented 4 years ago

Hi @hardwareadictos,

It worked for about 1 week with VL805 FW 0137ad and then I am facing the same issue again. Not fully solved unfortunately...

hardwareadictos commented 4 years ago

I opened an issue for that because I don't know if we have the same issue (I suspect we do):

https://github.com/raspberrypi/linux/issues/3438

Don't doubt on participating. Thanks for answering 😊

hardwareadictos commented 4 years ago

This could explain some things: https://github.com/raspberrypi/linux/commit/c74b1b53254016fd83b580b8d49bb02d72ce4836

Dont know if that change is already implemented on last kernel...

pelwell commented 4 years ago

That patch allows for the possibility of loading VL805 firmware from system RAM rather than EEPROM, which has to be re-enabled after a PCI reset. The VPU firmware is not making use of that facility, so when it comes to your problem that commit is a red herring.

hardwareadictos commented 4 years ago

That patch allows for the possibility of loading VL805 firmware from system RAM rather than EEPROM, which has to be re-enabled after a PCI reset. The VPU firmware is not making use of that facility, so when it comes to your problem that commit is a red herring.

Im not an expert, that's why i said "could" and not "actually". Was also the only recent commit referencing to xHCI.

That issue isnt happening on previous releases.

What's you opinion on that?

Thanks in advance :)

pelwell commented 4 years ago

Perhaps the on-board voltage is a marginal - try adding over_voltage=1 to config.txt and rebooting.

hardwareadictos commented 4 years ago

Perhaps the on-board voltage is a marginal - try adding over_voltage=1 to config.txt and rebooting.

Thank you! Applied. Will test it some days and i will report back.

JANogueira commented 4 years ago

thank you @pelwell Also applied this to my hassio installation. I will test it as well for some days and report back.

Cheers

hardwareadictos commented 4 years ago

24 hours. No issues so far. Next report on Monday, but seems that was the issue...

hardwareadictos commented 4 years ago

One week without issues, i consider this issue fixed :)

JANogueira commented 4 years ago

I also confirm that after 1 week, all is ok. But not sure if over voltage should be the way to go. This is a workaround and not a fix imho.

hardwareadictos commented 4 years ago

Agree. It was working before that voltage related changes on kernel. Maybe someone can give more feedback about that. But for the moment all is working as always

JamesH65 commented 4 years ago

I also confirm that after 1 week, all is ok. But not sure if over voltage should be the way to go. This is a workaround and not a fix imho.

Indeed, I expect a future firmware release will fix this, if the most recent one hasn't already done so.

JANogueira commented 4 years ago

I am currently running on 0137ad

timg236 commented 4 years ago

The VideoCore firmware controls DVFS so the update would be there rather than the XHCI firmware. Might be worth trying rpi-update (assuming you are familiar with rpi-update / risks of using early firmware)

DrJohnM61 commented 4 years ago

I am having the same issue after updating to the latest release. If I disconnect the [powered] USB3 hub and boot and then connect the hub, I can see the disks for a few minutes (they show in 'blkid' and 'df' and I can also 'ls' the mount points and see my files). They then they disconnect. Looking at dmesg, the same error as others is being produced: [ 77.881799] xhci_hcd 0000:01:00.0: WARNING: Host System Error [ 82.887053] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command. [ 82.887070] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead [ 82.887236] xhci_hcd 0000:01:00.0: HC died; cleaning up [ 82.887275] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.

I am currently running on 0137ad

I have tried over_voltage=1 (in fact I have tried a number of over_voltage numbers or 1,2 and 3) I have performed a rpi-update CPU/GPU is not overclocked

$ cat /proc/cpuinfo | tail -3 Revision : c03112 Serial : 10000000b9ae9a7d Model : Raspberry Pi 4 Model B Rev 1.2

$ cat /etc/os-release | head -4 PRETTY_NAME="Raspbian GNU/Linux 10 (buster)" NAME="Raspbian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)"

$ uname -a Linux PiHoleNAS 4.19.102-v7l+ #1295 SMP Thu Feb 6 15:49:36 GMT 2020 armv7l GNU/Linux

dmesg2.txt

I am an old DEC 10/TOPS-10, VAX/VMS and PDP11/RSX11M+ assembler programmer. Unix is rather new to me. I apologies in advance if I come over as a noob

popcornmix commented 4 years ago

@DrJohnM61 do you know what firmware/kernel you were on previously? Does force_turbo=1 help? Does switching to firmware from here help?

DrJohnM61 commented 4 years ago

@DrJohnM61 do you know what firmware/kernel you were on previously? Does force_turbo=1 help? Does switching to firmware from here help?

IIRC the prior version was the 5th Feb release

I tried the force_turbo=1 but does not seem to make any difference. [ 13.045392] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead [ 13.045464] xhci_hcd 0000:01:00.0: HC died; cleaning up

Will try the firmware alternative in the morning (its late now here in London)

DrJohnM61 commented 4 years ago

@DrJohnM61 do you know what firmware/kernel you were on previously? Does force_turbo=1 help? Does switching to firmware from here help?

I copied the files that your link pointed to into the /boot directory. Not sure if I was supposed to do anything after that (I am a unix noob). Rebooted and the problem with USB 3 connected hub and devices is still there. I note that the 'name -a' showed no difference in firmware version, so I suspect I am missing a step.

Went back to page one of the blog that you direct me to and executed the 'sudo rpi-update 4b2c270' command. Rebooted and then power cycled and the problem has gone away, so clearly (IMHO), this seems to be an issue that has been introduced in a later firmware version.

I will note that somewhere between the 4b2c270 version and the latest, I could not boot the RP4 with the connected USB hub/disks and see them but would have to boot without the hub connected and then plug it in after the boot had completed. This change happened after implementing the firmware change that allowed overclocking. I had for some time looked at forums to find a combination of voltage and wait times to try and fix the issue. However, the latest version of the firmware (4.19.102-v7l+ #1295 SMP) just did not allow the external powered USB hub or disks to stably connect (and even if the USB HUB was connected after boot, the USB ports would successively shut down, causing problems with my RAID sets, until the whole HUB disconnected).

Here is a snip from the log showing the USB disconnects (post force_turbo setting): [ 14.305428] sd 2:0:0:0: [sdc] tag#20 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.305433] sd 2:0:0:0: [sdc] tag#20 CDB: opcode=0x41 41 00 89 e4 81 00 00 03 00 00 [ 14.335885] xhci_hcd 0000:01:00.0: WARN Can't disable streams for endpoint 0x81, streams are being disabled already [ 14.337628] usb 2-1.4.3: USB disconnect, device number 6 [ 14.338786] sd 2:0:0:0: [sdc] Synchronizing SCSI cache [ 14.355410] print_req_error: I/O error, dev sdc, sector 2313453824 [ 14.375412] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.375420] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.445403] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.445412] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.455411] print_req_error: I/O error, dev sdc, sector 2313454592 [ 14.455425] print_req_error: I/O error, dev sdc, sector 2313454848 [ 14.455434] print_req_error: I/O error, dev sdc, sector 2313455616 [ 14.525404] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.525412] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.595450] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.595457] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.675454] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.675461] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.745449] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.745458] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.805398] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.805405] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.855429] sd 2:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00 [ 14.865412] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.865419] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.925448] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd [ 14.925457] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00 [ 14.975986] xhci_hcd 0000:01:00.0: WARN Can't disable streams for endpoint 0x81, streams are being disabled already [ 14.977293] usb 2-1.4.4: USB disconnect, device number 7 [ 14.978401] sd 3:0:0:0: [sdd] Synchronizing SCSI cache [ 14.978422] print_req_error: I/O error, dev sdd, sector 2313453568 [ 15.075452] print_req_error: I/O error, dev sdd, sector 2313453824 [ 15.485545] sd 3:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00 [ 15.605939] xhci_hcd 0000:01:00.0: WARN Can't disable streams for endpoint 0x81, streams are being disabled already [ 17.785614] Buffer I/O error on dev md0, logical block 0, lost sync page write [ 17.785622] EXT4-fs (md0): I/O error while writing superblock [ 17.866157] Buffer I/O error on dev md0, logical block 976690672, async page read [ 17.887111] Buffer I/O error on dev md0, logical block 976690672, async page read [ 18.037901] md0: detected capacity change from 4000525058048 to 0 [ 18.037951] md: md0 stopped. [ 18.225997] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.

Happy to try different firmware if someone can explain how to install it after pulling from a file share.

hardwareadictos commented 4 years ago

CPU/GPU is not overclocked

Maybe your Power supply isnt giving you enough power then

DrJohnM61 commented 4 years ago

CPU/GPU is not overclocked

Maybe your Power supply isnt giving you enough power then

The hub is powered, so there is 'no' drain for the external hard disks. The power supply for the Pi4 says that it is 5V/3A and as noted, it works fine with prior versions of the firmware.

I will swap out the power supply for the official raspberry pi version, upgrade the firmware and retry.

Thanks for the suggestion.

DrJohnM61 commented 4 years ago

Latests tests... Updated back to 4.19.97-v7l+ with swapped out power supply. Same issues - Hard disks are not mounted (for any length of time) with a xHCI host controller not responding, assume dead. Log attached.

system-fail-firmware.log

I have also upgraded to 4.19.102-v7l+ with no improvement.

Downgraded back to 4.19.83-v7l+ and everything is working again.

did a 'sudo apt-get install --reinstall raspberrypi-kernel' This took me back up to 4.19.97-v7l+

Reboot - everything now working OK! set: over_voltage=5 arm_freq=2140

Does not boot at this frequency. Changed arm_freq=2000 and boots fine and disks are visible.

So at this stage, I have no idea why the error has stopped being shown except I did a 'sudo apt-get install --reinstall raspberrypi-kernel'

JamesH65 commented 4 years ago

Can I confirm that the latest software (sudo apt update; sudo apt full-upgrade) is working fine at standard clocks? There have been changes in here to help with booting. Also, we cannot guarantee overclocks, so expecting 2140 to work is very ambitious- that is a huge overclock. Even 2000 is pretty big - 33% over stock.

My thoughts are that this issue has been solved (at standard clocks, and sensible overclocks) and can be closed, but just need to confirm.

hardwareadictos commented 4 years ago

We are missing the main goal of the issue. Stability issues with peripherals connected ....

No overclocks here

JamesH65 commented 4 years ago

We are missing the main goal of the issue. Stability issues with peripherals connected ....

No overclocks here

And various people has said they are fixed with the latest software. Just trying to confirm whether that is the case.

DrJohnM61 commented 4 years ago

My RP4b has been working fine for the last 24hrs and I have transferred over 3tb to my raid array without any issue, so closed for me. I will note, that the issue was being seen on 4.19.97-v7l+ without overvolt or overclock and just seemed to go away after several up/downgrades followed by a 'sudo apt-get install --reinstall raspberrypi-kernel'.

Thanks everyone for the assistance/advice.

hardwareadictos commented 4 years ago

I updated to last kernel 4.19.97-v7l+ and i commented over_voltage parameter on kernel config. No issues so far, will report back on 24 hours.

JANogueira commented 4 years ago

I will allow some more time so other people can report similar feedback before closing this.

SemMulder commented 4 years ago

I just encountered this again today on a fully up-to-date RPi 4. I added the suggested over_voltage=1 flag and that seems to have made it at least somewhat more stable. I am still getting the following logs for the attached HDD though:

Feb 17 21:06:20 raspberrypi kernel: sd 0:0:0:0: [sda] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 21:06:20 raspberrypi kernel: sd 0:0:0:0: [sda] tag#7 Sense Key : 0xb [current] 
Feb 17 21:06:20 raspberrypi kernel: sd 0:0:0:0: [sda] tag#7 ASC=0x47 ASCQ=0x1 
Feb 17 21:06:20 raspberrypi kernel: sd 0:0:0:0: [sda] tag#7 CDB: opcode=0x28 28 00 2c 0c a3 10 00 04 00 00
Feb 17 21:06:20 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 739025680
Feb 17 21:06:21 raspberrypi kernel: sd 0:0:0:0: [sda] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 21:06:21 raspberrypi kernel: sd 0:0:0:0: [sda] tag#8 Sense Key : 0xb [current] 
Feb 17 21:06:21 raspberrypi kernel: sd 0:0:0:0: [sda] tag#8 ASC=0x47 ASCQ=0x1 
Feb 17 21:06:21 raspberrypi kernel: sd 0:0:0:0: [sda] tag#8 CDB: opcode=0x28 28 00 2c 0c a7 10 00 04 00 00
Feb 17 21:06:21 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 739026704
Feb 17 21:08:55 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 21:08:55 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 Sense Key : 0xb [current] 
Feb 17 21:08:55 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 ASC=0x47 ASCQ=0x1 
Feb 17 21:08:55 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 CDB: opcode=0x28 28 00 15 4f 74 00 00 04 00 00
Feb 17 21:08:55 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 357528576
Feb 17 21:08:56 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 21:08:56 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 Sense Key : 0xb [current] 
Feb 17 21:08:56 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 ASC=0x47 ASCQ=0x1 
Feb 17 21:08:56 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 CDB: opcode=0x28 28 00 15 4f 78 00 00 04 00 00
Feb 17 21:08:56 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 357529600

Will let it run and report back soon.

hardwareadictos commented 4 years ago

48 hours with latest kernel and without over_voltage overlay on kernel. Seems stable for the moment.

SemMulder commented 4 years ago

Still hitting it, even with over_voltage=1:

Feb 17 22:19:51 raspberrypi kernel: sd 0:0:0:0: [sda] tag#14 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 22:19:51 raspberrypi kernel: sd 0:0:0:0: [sda] tag#14 Sense Key : 0xb [current] 
Feb 17 22:19:51 raspberrypi kernel: sd 0:0:0:0: [sda] tag#14 ASC=0x47 ASCQ=0x1 
Feb 17 22:19:51 raspberrypi kernel: sd 0:0:0:0: [sda] tag#14 CDB: opcode=0x28 28 00 2c 58 20 00 00 04 00 00
Feb 17 22:19:51 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 743972864
Feb 17 22:19:52 raspberrypi kernel: sd 0:0:0:0: [sda] tag#13 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 22:19:52 raspberrypi kernel: sd 0:0:0:0: [sda] tag#13 Sense Key : 0xb [current] 
Feb 17 22:19:52 raspberrypi kernel: sd 0:0:0:0: [sda] tag#13 ASC=0x47 ASCQ=0x1 
Feb 17 22:19:52 raspberrypi kernel: sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x28 28 00 2c 58 24 00 00 04 00 00
Feb 17 22:19:52 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 743973888
Feb 17 22:20:32 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 22:20:32 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 Sense Key : 0xb [current] 
Feb 17 22:20:32 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 ASC=0x47 ASCQ=0x1 
Feb 17 22:20:32 raspberrypi kernel: sd 0:0:0:0: [sda] tag#23 CDB: opcode=0x28 28 00 2c 58 77 40 00 04 00 00
Feb 17 22:20:32 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 743995200
Feb 17 22:20:33 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 17 22:20:33 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 Sense Key : 0xb [current] 
Feb 17 22:20:33 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 ASC=0x47 ASCQ=0x1 
Feb 17 22:20:33 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 CDB: opcode=0x28 28 00 2c 58 7b 40 00 04 00 00
Feb 17 22:20:33 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 743996224
Feb 17 22:20:58 raspberrypi kernel: xhci_hcd 0000:01:00.0: WARNING: Host System Error
Feb 17 22:20:58 raspberrypi kernel: xhci_hcd 0000:01:00.0: Host halt failed, -110
Feb 17 22:21:22 raspberrypi kernel: sd 0:0:0:0: [sda] tag#25 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN 
Feb 17 22:21:22 raspberrypi kernel: sd 0:0:0:0: [sda] tag#25 CDB: opcode=0x28 28 00 15 63 b4 00 00 04 00 00
Feb 17 22:21:27 raspberrypi kernel: xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
Feb 17 22:21:27 raspberrypi kernel: xhci_hcd 0000:01:00.0: Host halt failed, -110
Feb 17 22:21:27 raspberrypi kernel: xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
Feb 17 22:21:27 raspberrypi kernel: xhci_hcd 0000:01:00.0: HC died; cleaning up
Feb 17 22:21:27 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
Feb 17 22:21:27 raspberrypi kernel: usb 1-1: USB disconnect, device number 2
Feb 17 22:21:27 raspberrypi kernel: sd 0:0:0:0: [sda] tag#24 CDB: opcode=0x28 28 00 15 63 b0 00 00 04 00 00
Feb 17 22:21:27 raspberrypi kernel: usb 2-1: USB disconnect, device number 2
Feb 17 22:21:27 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 0
Feb 17 22:21:27 raspberrypi kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 17 22:21:27 raspberrypi kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Feb 17 22:21:27 raspberrypi kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Feb 17 22:21:27 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 358855680
Feb 17 22:21:27 raspberrypi kernel: print_req_error: I/O error, dev sda, sector 358854656
popcornmix commented 4 years ago

@SemMulder what do:

vcgencmd version
vcgencmd get_config over_voltage_avs

report? Does over_voltage=2 help? Was this reliable with an older firmware?

SemMulder commented 4 years ago
pi@raspberrypi:~ $ vcgencmd version
Feb 12 2020 12:36:21 
Copyright (c) 2012 Broadcom
version c3c8dbdf147686fb0c3f32aece709d0653368810 (clean) (release) (start)
pi@raspberrypi:~ $ vcgencmd get_config over_voltage_avs
over_voltage_avs=-36250
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
pi@raspberrypi:~ $ sudo vl805 
VL805 FW version: 000137ad

Does over_voltage=2 help?

Will try that next together with sudo apt-get install --reinstall raspberrypi-kernel.

Was this reliable with an older firmware?

Not sure, since I just got this one. Did also have issues with underpowered HDMI though...

popcornmix commented 4 years ago

sudo rpi-update afbea380 would get you to a firmware/kernel before the DVFS changes. I would expect that would only help if over_voltage= helps for some value of n (1 seems to have been enough in other cases I've heard of, but you could try 2 or 3).

But the issue could be unrelated to that.

SemMulder commented 4 years ago

Is the output of the vcgencmd get_config over_voltage_avs command supposed to change depending on over_voltage=...? Because I am not getting any changes switching between not setting over_voltage and setting it to either 1 or 2.

The flag is supposed to be set in /boot/config.txt, correct?

popcornmix commented 4 years ago

No, over_voltage_avs is purely internal, based on measurements of the ring oscillators of the specific chip. It's not something the user can adjust, but it gives some info to me about the type of silicon we are dealing with.

SemMulder commented 4 years ago

Running with rpi-update afbea380 (i.e. Linux raspberrypi 4.19.83-v7l+) and no over_voltage flags I get the following:

Feb 18 18:15:57 raspberrypi kernel: usb usb2-port1: unable to enumerate USB device
Feb 18 18:16:08 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 23 using xhci_hcd
Feb 18 18:16:13 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:13 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 23 using xhci_hcd
Feb 18 18:16:18 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:19 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 24 using xhci_hcd
Feb 18 18:16:24 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:24 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 24 using xhci_hcd
Feb 18 18:16:29 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:29 raspberrypi kernel: usb usb2-port1: attempt power cycle
Feb 18 18:16:29 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 25 using xhci_hcd
Feb 18 18:16:35 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:35 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 25 using xhci_hcd
Feb 18 18:16:40 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:40 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 26 using xhci_hcd
Feb 18 18:16:45 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110
Feb 18 18:16:45 raspberrypi kernel: usb 2-1: new SuperSpeed Gen 1 USB device number 26 using xhci_hcd
Feb 18 18:16:50 raspberrypi kernel: usb 2-1: device descriptor read/8, error -110

Error code 110 is ETIMEDOUT I believe. The behavior seems slightly better though since the xHCI host controller keeps running, but ultimately the USB connection is still lost.

Maybe the xHCI host controller not responding, assume dead error is because of a bug when handling underlying USB issues?

I am not sure if my issues are still related to the current ticket now. Should I open a new ticket? Is there anything I can try to narrow this down?

SemMulder commented 4 years ago

After researching my USB-SATA device:

Bus 002 Device 002: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS567 SATA 6Gb/s bridge

I stumbled on the following: https://www.smartmontools.org/wiki/SAT-with-UAS-Linux. So I switched to the usb-storage driver now. So far this seems to be stable, but I will let it run for a while and report back.

SemMulder commented 4 years ago

Just FYI, this issue is no longer occurring after disabling uas.

Pillendreher commented 4 years ago

Seems like I have encountered something similar to this. I have an SSD attached to the Rasperry Pi 4 and have not encountered any problems writing to it or reading from it directly via the Pi. Yet sometimes the whole drive crashes when I access it remotely. It looks like sharing the SSD via SAMBA makes matters worse, yet I have encountered the same issue while transfering a file from the SSD to my PC as well (transfer rate drops and suddenly the drive is gone). On my latest crash the log told me this:

root@DietPi:~# dmesg -l err,crit,alert,emerg
[ 5335.445477] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 5335.447503] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 5335.525443] print_req_error: I/O error, dev sda, sector 367394728
[ 5335.525497] print_req_error: I/O error, dev sda, sector 367390632
[ 5335.525537] print_req_error: I/O error, dev sda, sector 367387560
[ 5335.525576] print_req_error: I/O error, dev sda, sector 367388584
[ 5335.525615] print_req_error: I/O error, dev sda, sector 367392680
[ 5335.525651] print_req_error: I/O error, dev sda, sector 361440792
[ 5335.525660] print_req_error: I/O error, dev sda, sector 361439768
[ 5335.525673] print_req_error: I/O error, dev sda, sector 367396776
[ 5335.525686] print_req_error: I/O error, dev sda, sector 361455128
[ 5335.525701] print_req_error: I/O error, dev sda, sector 361456152
[ 5335.533666] XFS (sda1): writeback error on sector 361437720
[ 5335.537049] XFS (sda1): writeback error on sector 361486872
[ 5335.539202] XFS (sda1): writeback error on sector 361462296
[ 5335.539232] XFS (sda1): metadata I/O error in "xlog_iodone" at daddr 0xee7b080 len 64 error 5
[ 5335.539302] XFS (sda1): Log I/O Error Detected.  Shutting down filesystem
[ 5335.539313] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 5335.583470] XFS (sda1): writeback error on sector 330838608

Both the Raspberry Pi and the SSD care are brand new. The SSD isn't, but running smartctl on it tells me there are neither "critical warnings" nor "media and data integrity errors".

hardwareadictos commented 4 years ago

Seems like I have encountered something similar to this. I have an SSD attached to the Rasperry Pi 4 and have not encountered any problems writing to it or reading from it directly via the Pi. Yet sometimes the whole drive crashes when I access it remotely. It looks like sharing the SSD via SAMBA makes matters worse, yet I have encountered the same issue while transfering a file from the SSD to my PC as well (transfer rate drops and suddenly the drive is gone). On my latest crash the log told me this:

root@DietPi:~# dmesg -l err,crit,alert,emerg
[ 5335.445477] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 5335.447503] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 5335.525443] print_req_error: I/O error, dev sda, sector 367394728
[ 5335.525497] print_req_error: I/O error, dev sda, sector 367390632
[ 5335.525537] print_req_error: I/O error, dev sda, sector 367387560
[ 5335.525576] print_req_error: I/O error, dev sda, sector 367388584
[ 5335.525615] print_req_error: I/O error, dev sda, sector 367392680
[ 5335.525651] print_req_error: I/O error, dev sda, sector 361440792
[ 5335.525660] print_req_error: I/O error, dev sda, sector 361439768
[ 5335.525673] print_req_error: I/O error, dev sda, sector 367396776
[ 5335.525686] print_req_error: I/O error, dev sda, sector 361455128
[ 5335.525701] print_req_error: I/O error, dev sda, sector 361456152
[ 5335.533666] XFS (sda1): writeback error on sector 361437720
[ 5335.537049] XFS (sda1): writeback error on sector 361486872
[ 5335.539202] XFS (sda1): writeback error on sector 361462296
[ 5335.539232] XFS (sda1): metadata I/O error in "xlog_iodone" at daddr 0xee7b080 len 64 error 5
[ 5335.539302] XFS (sda1): Log I/O Error Detected.  Shutting down filesystem
[ 5335.539313] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 5335.583470] XFS (sda1): writeback error on sector 330838608

Both the Raspberry Pi and the SSD care are brand new. The SSD isn't, but running smartctl on it tells me there are neither "critical warnings" nor "media and data integrity errors".

Did you try to blacklist uas for this drive via quirks on cmdline.txt?

Pillendreher commented 4 years ago

Seems like I have encountered something similar to this. I have an SSD attached to the Rasperry Pi 4 and have not encountered any problems writing to it or reading from it directly via the Pi. Yet sometimes the whole drive crashes when I access it remotely. It looks like sharing the SSD via SAMBA makes matters worse, yet I have encountered the same issue while transfering a file from the SSD to my PC as well (transfer rate drops and suddenly the drive is gone). On my latest crash the log told me this:

root@DietPi:~# dmesg -l err,crit,alert,emerg
[ 5335.445477] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 5335.447503] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 5335.525443] print_req_error: I/O error, dev sda, sector 367394728
[ 5335.525497] print_req_error: I/O error, dev sda, sector 367390632
[ 5335.525537] print_req_error: I/O error, dev sda, sector 367387560
[ 5335.525576] print_req_error: I/O error, dev sda, sector 367388584
[ 5335.525615] print_req_error: I/O error, dev sda, sector 367392680
[ 5335.525651] print_req_error: I/O error, dev sda, sector 361440792
[ 5335.525660] print_req_error: I/O error, dev sda, sector 361439768
[ 5335.525673] print_req_error: I/O error, dev sda, sector 367396776
[ 5335.525686] print_req_error: I/O error, dev sda, sector 361455128
[ 5335.525701] print_req_error: I/O error, dev sda, sector 361456152
[ 5335.533666] XFS (sda1): writeback error on sector 361437720
[ 5335.537049] XFS (sda1): writeback error on sector 361486872
[ 5335.539202] XFS (sda1): writeback error on sector 361462296
[ 5335.539232] XFS (sda1): metadata I/O error in "xlog_iodone" at daddr 0xee7b080 len 64 error 5
[ 5335.539302] XFS (sda1): Log I/O Error Detected.  Shutting down filesystem
[ 5335.539313] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 5335.583470] XFS (sda1): writeback error on sector 330838608

Both the Raspberry Pi and the SSD care are brand new. The SSD isn't, but running smartctl on it tells me there are neither "critical warnings" nor "media and data integrity errors".

Did you try to blacklist uas for this drive via quirks on cmdline.txt?

Not yet, I have just started doing some research on the whole thing. I will give that a try, thank you for pointing it out!

Pillendreher commented 4 years ago

Just a quick follow-up: Since moving away from uas, I have not encountered a single error. Thanks for the advice!