home-assistant / operating-system

:beginner: Home Assistant Operating System
Apache License 2.0
5k stars 980 forks source link

USB attached NVMe boot fails on RPi4 #796

Closed agners closed 4 years ago

agners commented 4 years ago

HassOS release with the issue: 5.0

Description of problem: Boot from USB 3.1 Gen 1 attached NVMe device fails (ICY BOX enclosure).

Most of the time it seems that the RPi 4 boot loader is not even able to load U-Boot from the USB drive. By replugging the device I was able to get into U-Boot, but then U-Boot failed to enumerate the device as a storage device:

U-Boo 2020.07 (Jul 25 2020 - 06:55:15 +0000)

DRAM:  1.9 GiB
RPI 4 Model B (0xb03112)
MMC:   mmcnr@7e300000: 1, emmc2@7e340000: 0
In:    serial
Out:   serial
Err:   serial
Net:   eth0: genet@7d580000
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
startig USB...
Bus xhci_pci: Register 5000420 NbrPorts 5
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devces... Device NOT ready
   Request Sense returned 02 04 01
4 USB Device(s) found
       scanning usb for storage devices... 0 Sorage Device(s) found
Card did not respond to voltage select!
Card did not respond to voltage select!

genet@7d580000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
...

Manually starting the reenumeration process did help in that case:

usb reset
...
run bootcmd

The device enumerates with PID/VID 0x152d/0x0580

[304049.433123] usb 4-1: USB disconnect, device number 4
[304051.116509] usb 4-1: new SuperSpeed Gen 1 USB device number 5 using xhci_hcd
[304051.134414] usb 4-1: New USB device found, idVendor=152d, idProduct=0580, bcdDevice=31.03
[304051.134418] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[304051.134420] usb 4-1: Product: USB 3.1 Storage Device
[304051.134421] usb 4-1: Manufacturer: JMicron

RPi 4 bootloader version:

pi@raspberrypi:~$ sudo rpi-eeprom-update 
BCM2711 detected
Dedicated VL805 EEPROM detected
BOOTLOADER: up-to-date
CURRENT: Mon 15 Jun 13:36:19 UTC 2020 (1592228179)
 LATEST: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 FW DIR: /lib/firmware/raspberrypi/bootloader/critical
VL805: up-to-date
CURRENT: 000137ad
 LATEST: 000137ad
agners commented 4 years ago

Especially since the RPi4 boot loader already fails to load U-Boot we probably can't do much here at the moment. But its probably worth keeping an issue open to collect known working/non-working devices.

agners commented 4 years ago

I think I was able to fix the early boot issue. After updating the EEPROM and using firmware from the master branch as of today, I am able to boot from the USB drive every time.

Current EEPROM versions:

pi@raspberrypi:~$ sudo rpi-eeprom-update
BCM2711 detected
Dedicated VL805 EEPROM detected
BOOTLOADER: up-to-date
CURRENT: Thu 16 Jul 15:15:46 UTC 2020 (1594912546)
 LATEST: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 FW DIR: /lib/firmware/raspberrypi/bootloader/critical
VL805: up-to-date
CURRENT: 000138a1
 LATEST: 000137ad

However, U-Boot still seems to fail to enumerate the device on first try.

U-Boo 2020.07 (Jul 25 2020 - 06:55:15 +0000)

DRAM:  1.9 GiB
RPI 4 Model B (0xb03112)
MMC:   mmcnr@7e300000: 1, emmc2@7e340000: 0
In:    serial
Out:   serial
Err:   serial
Net:   eth0: genet@7d580000
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
startig USB...
Bus xhci_pci: Register 5000420 NbrPorts 5
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devces... Device NOT ready
   Request Sense returned 02 04 01
4 USB Device(s) found
       scanning usb for storage devices... 0 Sorage Device(s) found
Card did not respond to voltage select!
Card did not respond to voltage select!

genet@7d580000 Waiting for PHY auto negotiation to complete... done
BOOTP broadcast 1
DHCP client bound to address 192.168.80.226 (3 ms
====> Ctrl+C
HassOS> <INTERRUPT>
HassOS> usb tree
USB device tree:
  1  Hub (5 Gb/s, 0mA)
  |  U-Boot XHCI Host Controller
  |
  +-2  MassStorage (5 Gb/s, 224mA)
  |    JMicron USB 3.1 Storage Device DD56419884B3E
  |
  +-3  Hub (480 Mb/s, 100A)
    |   USB2.0 Hub
    |
    +-4  Human Interface (12 Mb/s, 98mA)
         Logitech USB Receiver

HassOS> ls usb 0:1
HassOS> usb reset
resetting USB...
Bus xhci_pci: Register 5000420 NbrPorts 5
Starting the controller
USB XHCI 1.00
scanning bus xhci_ci for devices... 4 USB Device(s) found
       scanning usb for storage devices... 1 Storage Device(s) found
HassOS>ls usb 0:1
    41499   bcm2711-rpi-4-b.dtb
     2382   boot.scr
       57   cmdline.txt
     1785   config.txt
           overlays/
   488304   u-boot.bin
  2272992   start4.elf.orig
  2277376   start4.elf
     5405   fixup4.dat.orig
     5407   fiup4.dat

9 file(s), 1 dir(s)

HassOS> usb tree
USB device tree:
  1  Hub (5 Gb/s, 0mA)
  |  U-Boot XHCI Host Controller
  |
  |+-2  Mass Storage (5 Gb/s, 224mA)
  |    JMicron USB 3.1 Storage Device DD56419884B3E
  |
  +-3  Hub (480 b/s, 100mA)
    |   USB2.0 Hub
    |
    +-4  Human Interface (12 Mb/s, 98mA)
         Logitech USB Receier
agners commented 4 years ago

Shortly after boot I then got this kind of messages:

...
[  115.691139] sd 0:0:0:0: [sda] tag#19 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[  115.702260] sd 0:0:0:0: [sda] tag#19 CDB: opcode=0x28 28 00 01 55 c8 00 00 00 08 00
[  115.727145] scsi host0: uas_eh_device_reset_handler start
[  115.864065] usb 2-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[  115.897891] scsi host0: uas_eh_device_reset_handler success
[  150.507719] sd 0:0:0:0: [sda] tag#28 uas_eh_abort_handler 0 uas-tag 23 inflight: CMD OUT
[  150.518493] sd 0:0:0:0: [sda] tag#28 CDB: opcode=0x2a 2a 00 00 26 9f 28 00 04 00 00
[  150.529911] sd 0:0:0:0: [sda] tag#27 uas_eh_abort_handler 0 uas-tag 22 inflight: CMD OUT
[  150.540001] sd 0:0:0:0: [sda] tag#27 CDB: opcode=0x2a 2a 00 00 26 9b 28 00 04 00 00
[  150.550751] sd 0:0:0:0: [sda] tag#26 uas_eh_abort_handler 0 uas-tag 21 inflight: CMD OUT
[  150.560822] sd 0:0:0:0: [sda] tag#26 CDB: opcode=0x2a 2a 00 00 9b 4b 18 00 01 c0 00
[  150.571586] sd 0:0:0:0: [sda] tag#25 uas_eh_abort_handler 0 uas-tag 20 inflight: CMD OUT
[  150.581833] sd 0:0:0:0: [sda] tag#25 CDB: opcode=0x2a 2a 00 00 9b 48 60 00 02 b8 00
[  150.592805] sd 0:0:0:0: [sda] tag#24 uas_eh_abort_handler 0 uas-tag 19 inflight: CMD OUT
[  150.602896] sd 0:0:0:0: [sda] tag#24 CDB: opcode=0x2a 2a 00 00 26 97 28 00 04 00 00
[  150.612694] sd 0:0:0:0: [sda] tag#23 uas_eh_abort_handler 0 uas-tag 18 inflight: CMD OUT
[  150.622777] sd 0:0:0:0: [sda] tag#23 CDB: opcode=0x2a 2a 00 00 9b 46 98 00 01 c8 00
[  150.632621] sd 0:0:0:0: [sda] tag#22 uas_eh_abort_handler 0 uas-tag 17 inflight: CMD OUT
[  150.642680] sd 0:0:0:0: [sda] tag#22 CDB: opcode=0x2a 2a 00 00 9b 44 10 00 02 88 00
[  150.652510] sd 0:0:0:0: [sda] tag#21 uas_eh_abort_handler 0 uas-tag 16 inflight: CMD OUT
[  150.662571] sd 0:0:0:0: [sda] tag#21 CDB: opcode=0x2a 2a 00 00 9b 40 10 00 04 00 00
...

@cogneato made me aware of this RPi thread: https://www.raspberrypi.org/forums/viewtopic.php?t=245931.

And indeed adding the VID/PID of my USB storage device to the quirk list by adding usb-storage.quirks=152d:0580:u to cmdline.txt seems to fix the above messages in my case as well.

caylor93 commented 4 years ago

The 5.1 release seems to have resolved some of the USB to NVMe compatibility issues. I had tried two different USB 3.0 UASP capable adapters (SSK's SHE-C325 and Plugable's USBC-NVME) and both wouldn't boot with a Samsung 970 EVO 500GB (MZ-V7E500BW) SSD. The Plugable adapter will now boot (though it acted weird at first...throwing a website 500 error as it lost the USB drive, then when I rebooted it to solve that issue, it said it was unable to find the USB SSD 60+ times before booting). The SSK device will still not boot. Stops at the "Read start4.elf" process.

jperquin commented 4 years ago

Started with a fresh SSD install of hassos_rpi4-64-5.1.img which booted right up the first time and appears to be running stable.

Two issues though: Host log (in Supervisor / System) keeps posting following message every second:

[ 2815.232456] mmc0: Timeout waiting for hardware cmd interrupt. [ 2815.232490] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 2815.232505] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 2815.232518] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 2815.232530] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 2815.232542] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 2815.232554] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 2815.232565] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 2815.232576] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 2815.232587] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 2815.232598] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 2815.232609] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 2815.232620] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008

[ 2815.232653] mmc0: sdhci: Host ctl2: 0x00000000 [ 2815.232663] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 2815.232674] mmc0: sdhci: ============================================

Furthermore in Supervisor / Hardware I am not seeing any disks (like /dev/sda).

Any insight is much appreciated.

agners commented 4 years ago

@jperquin to get rid of the mmc0 Timeouts you have to add sd_poll_once=on to dtparam in config.txt (coma separated).

See: https://github.com/home-assistant/operating-system/pull/776

jperquin commented 4 years ago

like so?

# Enable audio (loads snd_bcm2835)
dtparam=audio=on,sd_poll_once=on

EDIT: It worked by SSH-ing into the pi (on port 22222), login, cd /mnt/boot and edit config.txt with vi. Thanks.

However still not seeing any disks enumerated under Supervisor / Hardware:

image

agners commented 4 years ago

@jperquin hm, from what I can tell this is the list of block devices provided by udev, so I guess udev doesn't see/consider the USB drive as block device. What USB drive are you using now?

jperquin commented 4 years ago

@agners using the ASMT/ASM105x USB to SATA dongle with a Crucial BX500 Solid State Drive, 120 GB SSD, 3D NAND, SATA

jperquin commented 4 years ago

Any thoughts?

jumperalex commented 4 years ago

I'm not sure if this helps differentiate at all (I can give more technical details if told how), but I have the SSK enclosure with a Transcend TS128GMTE110S NVME. When loaded with Raspian and plugged into either USB3 port with a USB3 cable it boots every time over many tries.

But when I put HassOS on there it won't boot and just goes to a blank screen. If I use either the USB2 port, or a USB2 cable in the USB3 port it boots every time over many tries.

So the SSK and Transcend combo will boot on an RPi4; but there's something with how HassOS is booting through USB3 that just doesn't work. I tried going back and forth many times. I even copied over the relevant .dat and .elf files from the working Raspian load with no joy.

I tried adding quirks with no effect even though running the benchmark found at https://storage.jamesachambers.com/ showed a solid score (7850) indicting there were no UAS problems. I tried quirks first with a Raspian load so I could validate I knew how to apply it correctly, and then with HassOS but like I mentioned, it didn't help.

PS: I'm running the 15 Jun EEPROM version. I'm willing to try upping to the July version if that's really a possible fix.

xs4free commented 4 years ago

I'm experiencing the same problem as @jumperalex. I have a Raspberry PI4 4GB, with a Kingston A2000 NVME and a NVME-to-USB3 adapter based on the JMicron JMS583 chipset connected to the USB3 (blue) port of my RPI4. I followed the How to Boot Raspberry Pi 4 From a USB SSD or Flash Drive to update my PI to the latest bootloader and tested booting from SSD with Raspbian Lite. After that I used BalenaEtcher to write hassos_rpi4-64-5.1.img to the SSD. During the power-up of the PI, the initial 'bios' from the PI is briefly visible, but after that the screen goes blank and there is no more activity on the SSD. I've tried waiting more then an hour, I've tried using the older hassos_rpi4-64-5.0.img image, but it won't boot. The only thing that does work is moving the adapter to the USB2 ports of my PI4, after that everything works fine.

I would love to help and diagnose this error, but I have no Linux experience and can't seem to find any documentation on how to diagnose this problem.

rah66 commented 4 years ago

The device enumerates with PID/VID 0x152d/0x0580

its USB-SATA bridge JMicron JMS580, your box for M.2 SATA. USB-NVME bridge its JMS583 https://www.jmicron.com/products/list/1

carras commented 4 years ago

The device enumerates with PID/VID 0x152d/0x0580

its USB-SATA bridge JMicron JMS580, your box for M.2 SATA. USB-NVME bridge its JMS583 https://www.jmicron.com/products/list/1

And do you think the SATA version JMS580 actually works? I’m considering order one adaptor with that chip. (Off topic cuz here it’s about NVMe version)

agners commented 4 years ago

@jumperalex do you know what chipset your enclosure is using?

@xs4free HAOS likely hangs in U-Boot, as it did in my case. You can see the U-Boot output when using a UART to USB cable connected to the primary UART on pin 8/10.

I did look a bit more into the JMS583 issue I am seeing here, and it seems to be power related:

When booting after manually re-scanning USB, once in Linux, it lead to quite regular usb 2-1.1.2: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd messages. This was despite using usb-storage.quirks=152d:0580:u to disable UAS. The drive does work fine on my x86 machine, even with UAS disabled.

Then I also realized that enumeration in U-Boot failed when directly connected but did work when connecting over a powered USB 3.0 hub. With the powered USB 3.0 Hub booting worked every time, and also Linux was happy. This is with the HAOS 5.2 release.

I do use an original Raspberry Pi4 power supply! So it really seems that the JMS583 is so power hungry, that this is not enough :-(

But I think with this we can consider JMS583 working, when using enough power and the USB storage quirk.

jumperalex commented 4 years ago

@agners I'm 99% sure mine is using the JMS583 as well.

I'm a bit unsure that it is a power issue. It boots with Raspian with no problem and only HAOS is a problem.

I tried again with 5.2 and had the same issues.

Is this really closed? Needing to use a USB3 power hub seems pointless (yes I know, I can buy a new enclosure) especially given that the enclosure does work with Raspian and does quite well on a speed test.

EDIT: For fun I've just ordered a Realtek RTL9210 based enclosure (Plugable brand). Let's see if that makes a difference.

EDIT2: Also the record, I tried quirks and it doesn't make a difference on 5.1 and I didn't need quirks on Raspian.

netscruff commented 4 years ago

I seeing the same as @jumperalex. Raspberry Pi OS, no issues booting from usb. Tried with 5.2 and black screen. Tried quirks as well. It does boot if plugged into a USB 2.0 port.

I'm using the TDBT enclosure with JMS583 chipset. I think it matters less but I'm using a XPG SX8200 512GB NVMe drive in the enclosure.

jumperalex commented 4 years ago

Okay so I guess I'll agree it's a "power issue" in that using a powered usb3 hub I did get to boot. But I mean, really, its a power + HAOS problem because Raspian can in fact boot w/o the hub. Makes me wonder if it's a timing / race condition on the USB resetting?

I guess once I have the new Realtek enclosure tomorrow I'll at least be able to boot to HAOS without the hub. Hopefully.

jumperalex commented 4 years ago

AAaaan the Pluggable, Realtek based enclosure, does indeed boot my Transcend NVME SSD plugged directly into my RPi4, no powered hub needed.

I'm not happy about it ;-) but I'll take it.

ogarza commented 4 years ago

I have a RPi4, with the pluggable, realtek RTL9210 controller enclosure and an Intaiell 980plus 120gb NVME (SMI2263XT controller). I tried both types of USB connectors, neither works... however, both work booting into raspbian without problems.

I get the same black screen after briefly seeing the bios on both stable 4.15 and development 5.4

jumperalex commented 4 years ago

I have a RPi4, with the pluggable, realtek RTL9210 controller enclosure and an Intaiell 980plus 120gb NVME (SMI2263XT controller). I tried both types of USB connectors, neither works... however, both work booting into raspbian without problems.

I get the same black screen after briefly seeing the bios on both stable 4.15 and development 5.4

Are you using the 64-bit version of 5.4? Pretty sure that's required. It's what I'm using at least.

agners commented 4 years ago

@ogarza 4.x series does not support USB SSD boot. 5.4 unfortunately uses a buggy RPi firmware. But 5.3 should work fine.