skiffos / SkiffOS

Any Linux distribution, anywhere.
https://skiffos.com
MIT License

odroid hc1: SSD not seen after soft reboot #278

Closed: danyer closed this issue 1 year ago

danyer commented 1 year ago

As the title says, on the Odroid HC1 the SSD is not visible after the board is rebooted via the reboot command.

If it is powered off via poweroff and the power brick is physically removed from the socket and reinserted, the board boots again and sees the SSD. Since I want to install this somewhere remote, the manual power workaround doesn't seem feasible.

The SATA bridge firmware is updated to the latest version (the one from 2019), and the issue does not occur with Armbian (kernel 5.4.230), which I have been using for over a year.

Maybe a kernel patch for 6.1 is missing?

During search, a similar issue was found here: https://forum.armbian.com/topic/15611-odroid-hc1-and-kernel-54-soft-reset-makes-the-sata-drive-disappear/ with the same error message "Unit not ready".

I am planning to use a "persist" partition on the SSD instead of the "persist" partition on the SD card. This is pretty easy to do, since the mount-all.ssh script looks for LABEL=persist: I just need to relabel the partition on the SD card and apply the correct label to the (one and only) partition on the SSD.

One workaround would be to power off the SATA bridge after the SSD is unmounted (during reboot); hopefully that would work. I found several ways to achieve this in theory, but I have not tried any of them.
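
The relabeling described above could look roughly like the sketch below. It assumes that both persist partitions are ext2/3/4 (so `e2label` from e2fsprogs applies), and the function name `relabel_persist` and the label `persist-sd` are placeholders invented for this example; none of it has been tested on SkiffOS.

```shell
#!/bin/sh
# Sketch: move the "persist" label from the SD card partition to the SSD.
# Assumptions (not from the thread): ext2/3/4 filesystems, so e2label works;
# "persist-sd" is a placeholder label chosen for this example.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
relabel_persist() {
    sd_part="$1"    # current persist partition on the SD card
    ssd_part="$2"   # the single partition on the SSD
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    else
        run=""
    fi
    # Rename the SD card partition first so two partitions never share a label.
    $run e2label "$sd_part" persist-sd
    $run e2label "$ssd_part" persist
}
```

For example, `relabel_persist /dev/mmcblk1p3 /dev/sda1` prints the two commands for the device names that appear in the `lsblk` output in this thread; check the device names on your own board before running with `DRY_RUN=0`.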

paralin commented 1 year ago

I have seen a similar issue with the odroid hc4 and hc2. I think it's just a general bug with the odroid kernel.

The odroid kernel version used in armbian is very out of date and is based on the upstream Hardkernel kernel.

The one we use is the tobetter fork which is up to date but might have a couple issues like this one.

I'll update the odroid kernel to the latest possible version today and we can check if that resolves the issue (hopefully).

danyer commented 1 year ago

Unfortunately the latest kernel does not fix the issue.

Armbian kernel (or should I say hardkernel kernel) is 5.4, still supported until December 2025. They publish updates, for example latest 5.4 is 5.4.238 and they have 5.4.230. But yes, it's old! With a newer kernel the board runs much cooler.

Initially I thought that 6.2 fixed the issue because after updating the board (running 6.1) with the new image with 6.2 using push_image.bash and soft rebooting, the sda1 partition remained visible. But then, when I soft rebooted 6.2, the sda1 partition was not visible anymore.

Here are some logs showing the partition (sda1) disappearing after a soft reboot and the disk itself changing from a 60 GB disk to a zero-size disk.

dan@elite:/mnt/storage/extra/SkiffOS$ ssh root@skiffos.lan
root@skiffos:~# uname -a
Linux skiffos 6.2.8 #1 SMP PREEMPT Fri Mar 24 23:21:02 CET 2023 armv7l GNU/Linux
root@skiffos:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 40.8M  0 loop /usr/lib/modules
sda           8:0    0 59.6G  0 disk 
`-sda1        8:1    0 59.6G  0 part 
mmcblk1     179:0    0 29.7G  0 disk 
|-mmcblk1p1 179:1    0  308M  0 part 
|-mmcblk1p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk1p3 179:3    0 29.1G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       254:0    0    2G  0 disk [SWAP]
root@skiffos:~# reboot     
Connection to skiffos.lan closed by remote host.
Connection to skiffos.lan closed.
dan@elite:/mnt/storage/extra/SkiffOS$ ssh root@skiffos.lan
root@skiffos:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 40.8M  0 loop /usr/lib/modules
sda           8:0    0    0B  0 disk 
mmcblk1     179:0    0 29.7G  0 disk 
|-mmcblk1p1 179:1    0  308M  0 part 
|-mmcblk1p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk1p3 179:3    0 29.1G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       254:0    0    2G  0 disk [SWAP]
root@skiffos:~# dmesg | grep sda
[   14.742754] sd 0:0:0:0: [sda] Unit Not Ready
[   14.745705] sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
[   14.750937] sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 
[   14.751934] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[   14.770916] sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
[   14.776229] sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 
[   14.782501] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[   14.790648] sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
[   14.795881] sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 
[   14.801476] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[   14.807607] sd 0:0:0:0: [sda] 0-byte physical blocks
[   14.813618] sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
[   14.818968] sd 0:0:0:0: [sda] Asking for cache data failed
[   14.824146] sd 0:0:0:0: [sda] Assuming drive cache: write through
[   14.831030] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
[   14.840642] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
[   14.852965] sd 0:0:0:0: [sda] Attached SCSI disk
root@skiffos:~# poweroff
Connection to skiffos.lan closed by remote host.
Connection to skiffos.lan closed.
dan@elite:/mnt/storage/extra/SkiffOS$ ssh root@skiffos.lan
root@skiffos:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 40.8M  0 loop /usr/lib/modules
sda           8:0    0 59.6G  0 disk 
`-sda1        8:1    0 59.6G  0 part 
mmcblk1     179:0    0 29.7G  0 disk 
|-mmcblk1p1 179:1    0  308M  0 part 
|-mmcblk1p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk1p3 179:3    0 29.1G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       254:0    0    2G  0 disk [SWAP]
root@skiffos:~# reboot
Connection to skiffos.lan closed by remote host.
Connection to skiffos.lan closed.
dan@elite:/mnt/storage/extra/SkiffOS$ ssh root@skiffos.lan
root@skiffos:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 40.8M  0 loop /usr/lib/modules
sda           8:0    0    0B  0 disk 
mmcblk1     179:0    0 29.7G  0 disk 
|-mmcblk1p1 179:1    0  308M  0 part 
|-mmcblk1p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk1p3 179:3    0 29.1G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       254:0    0    2G  0 disk [SWAP]
root@skiffos:~# 
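
The symptom in the session above (sda still listed, but with size 0B and no sda1) can be detected from a script. This is a minimal sketch that assumes util-linux `lsblk` is available; the function name `zero_size_disks` is made up for this example.

```shell
#!/bin/sh
# Sketch: print the names of whole disks that report a size of 0 bytes.
# Input on stdin: rows of "NAME SIZE TYPE", as produced by
#   lsblk -b -d -n -o NAME,SIZE,TYPE
# (-b sizes in bytes, -d no partitions, -n no header line).
zero_size_disks() {
    awk '$3 == "disk" && $2 == 0 { print $1 }'
}
```

On the board this would be used as `lsblk -b -d -n -o NAME,SIZE,TYPE | zero_size_disks`, which should print `sda` in the broken state and nothing when the SSD is detected.
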
paralin commented 1 year ago

@danyer I asked on the forums and issue boards; let's see if anyone has ideas on how to fix it.

paralin commented 1 year ago

@danyer Could you please provide the lsusb and lsusb -t outputs just before and after rebooting (when it's working and then when it's not)?

Thanks!

danyer commented 1 year ago

Hi @paralin , please find below the requested output.

Meanwhile I've switched my armbian setup from the current kernel (5.4) to the edge kernel (6.1), and the issue does not reproduce there. I tried to check whether the 6.1 kernel that armbian uses still carries patches from odroid/tobetter or is a mainline kernel, but I was not able to find out (I am not familiar with Armbian's build infrastructure).

Later edit: changed "does not reproduce anymore" to "does not reproduce there", because I have never seen the issue on Armbian.

root@skiffos-b9cfbdb4:~# uname -a
Linux skiffos-b9cfbdb4 6.2.8 #1 SMP PREEMPT Fri Mar 24 23:21:02 CET 2023 armv7l GNU/Linux
root@skiffos-b9cfbdb4:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 40.8M  0 loop /usr/lib/modules
sda           8:0    0 59.6G  0 disk 
`-sda1        8:1    0 59.6G  0 part 
mmcblk0     179:0    0 29.8G  0 disk 
|-mmcblk0p1 179:1    0  308M  0 part 
|-mmcblk0p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk0p3 179:3    0 29.2G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       254:0    0    2G  0 disk [SWAP]
root@skiffos-b9cfbdb4:~# lsusb
Bus 006 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 002: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS578 SATA 6Gb/s
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 0bda:c820 Realtek Semiconductor Corp. 802.11ac NIC
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
root@skiffos-b9cfbdb4:~# lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=exynos-ehci/3p, 480M
    |__ Port 1: Dev 2, If 0, Class=Wireless, Driver=btusb, 480M
    |__ Port 1: Dev 2, If 1, Class=Wireless, Driver=btusb, 480M
    |__ Port 1: Dev 2, If 2, Class=Vendor Specific Class, Driver=, 480M
root@skiffos-b9cfbdb4:~# reboot
---
root@skiffos-b9cfbdb4:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 40.8M  0 loop /usr/lib/modules
sda           8:0    0    0B  0 disk 
mmcblk0     179:0    0 29.8G  0 disk 
|-mmcblk0p1 179:1    0  308M  0 part 
|-mmcblk0p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk0p3 179:3    0 29.2G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       254:0    0    2G  0 disk [SWAP]
root@skiffos-b9cfbdb4:~# lsusb
Bus 006 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 002: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS578 SATA 6Gb/s
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 0bda:c820 Realtek Semiconductor Corp. 802.11ac NIC
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
root@skiffos-b9cfbdb4:~# lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=exynos-ehci/3p, 480M
    |__ Port 1: Dev 2, If 0, Class=Wireless, Driver=btusb, 480M
    |__ Port 1: Dev 2, If 1, Class=Wireless, Driver=btusb, 480M
    |__ Port 1: Dev 2, If 2, Class=Vendor Specific Class, Driver=, 480M

paralin commented 1 year ago

@danyer Please test this out:

I updated the kernel on master & updated the kernel config to be in-line with what armbian has.

git checkout master
git pull
git submodule update
make br/linux-dirclean br/linux-headers-dirclean br/glibc-dirclean br/runc-dirclean br/host-go-dirclean br/docker-engine-dirclean br/docker-cli-dirclean br/containerd-dirclean
make compile

Hopefully this resolves the issue, if not, I'll keep looking.

paralin commented 1 year ago

@danyer according to odroid support:

The SATA controller JMS578 detected well on the USB interface after reboot.
Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
But the controller seemed not to be able to detect the storage device.

Let's see if the latest updates fixed it, and I'll reply there.
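
The check that odroid support describes (is the JMS578 still bound to a storage driver after reboot?) can be read directly from `lsusb -t`. A small sketch, with the hypothetical function name `mass_storage_drivers`:

```shell
#!/bin/sh
# Sketch: print the driver bound to every USB Mass Storage interface.
# Input on stdin: the output of `lsusb -t`.
mass_storage_drivers() {
    sed -n 's/.*Class=Mass Storage, Driver=\([^,]*\),.*/\1/p'
}
```

Running `lsusb -t | mass_storage_drivers` on the outputs posted above would print `uas` both before and after the reboot, which matches the observation that the bridge itself enumerates fine and only the disk behind it is missing.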

danyer commented 1 year ago

I've tested with master but same behavior :(

I got an email notification (for a message I no longer see here) suggesting I try the odroid-xu-mainline branch, which uses a mainline kernel. With that one (actually odroid-xu-mainline-test, since I could not find odroid-xu-mainline) I was not able to get network connectivity. I used a USB-UART connection and got the root prompt, but no password works. Since I cannot log in via ssh keys (no network) or via the serial port (no password), I cannot debug further.

paralin commented 1 year ago

@danyer yeah I edited my message here and changed the branch name since I realized that mainline didn't work properly from my tests.

As far as I can tell, armbian runs mainline 6.1.x in the edge config. So I'm not sure why it would run any differently. Maybe one of the patches on the skiffos branch breaks something.

paralin commented 1 year ago

@danyer I think the mainline kernel wasn't working due to me accidentally enabling compressed kernel modules. I fixed the test branch if you want to give it another go. I'll try it out today as well.

git checkout odroid-xu-mainline-test
git fetch
git reset --hard origin/odroid-xu-mainline-test
git submodule update
make br/linux-dirclean br/linux-headers-dirclean br/glibc-dirclean
rm -rf ./workspaces/default/extra_images
rm -rf ./workspaces/default/build/rtl*
make configure compile

danyer commented 1 year ago

Sorry, it took longer to test since I've just updated my machine to Ubuntu 23.04 (beta) and in doing so I guess I invalidated ccache so the build took longer. Unfortunately the issue still reproduces.


NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 30.4M  0 loop /usr/lib/modules
sda           8:0    0    0B  0 disk 
mmcblk0     179:0    0 29.8G  0 disk 
|-mmcblk0p1 179:1    0  308M  0 part 
|-mmcblk0p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk0p3 179:3    0 29.2G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       252:0    0    2G  0 disk [SWAP]
root@skiffos-ccc0a58a:~# uname -a
Linux skiffos-ccc0a58a 6.2.9 #1 SMP PREEMPT Fri Mar 31 22:35:56 CEST 2023 armv7l GNU/Linux
root@skiffos-ccc0a58a:~# neofetch
                                           root@skiffos-ccc0a58a 
             ,@@@@@@@@@@@w,_               --------------------- 
  ====~~~,,.A@@@@@@@@@@@@@@@@@W,_          OS: SkiffOS 2023.02-24-gccc0a58a armv7l 
  `||||||||||||||L{"@$@@@@@@@@B"           Host: Hardkernel Odroid XU4 
   `|||||||||||||||||||||L{"$D             Kernel: 6.2.9 
     @@@@@@@@@@@@@@@@@@@@@_||||}==,        Uptime: 1 min 
      *@@@@@@@@@@@@@@@@@@@@@@@@@p||||==,   Shell: bash 5.2.15 
        `'||LLL{{""@$B@@@@@@@@@@@@@@@p||   Terminal: /dev/pts/1 
            `~=|||||||||||L"$@@@@@@@@@@@   CPU: Samsung Exynos (Flattened Device Tree) (8) @ 1.400GHz 
                   ````'"""""""'""""""""   Memory: 107MiB / 1978MiB
paralin commented 1 year ago

@danyer I checked armbian and they are just using the regular mainline 6.1.x kernel. I don't think there's any reason why it would perform any differently on armbian vs. here. So that's quite confusing.

paralin commented 1 year ago

@danyer if you do partprobe /dev/sda what happens?

also, please check dmesg for any relevant logs

danyer commented 1 year ago

root@skiffos-ccc0a58a:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 30.4M  0 loop /usr/lib/modules
sda           8:0    0    0B  0 disk 
mmcblk0     179:0    0 29.8G  0 disk 
|-mmcblk0p1 179:1    0  308M  0 part 
|-mmcblk0p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk0p3 179:3    0 29.2G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       252:0    0    2G  0 disk [SWAP]
root@skiffos-ccc0a58a:~# fdisk /dev/sda

Welcome to fdisk (util-linux 2.38).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

fdisk: cannot open /dev/sda: No such file or directory

root@skiffos-ccc0a58a:~# journalctl -k -p warning

Mar 31 20:57:40 skiffos-ccc0a58a kernel: genirq: irq_chip COMBINER did not update eff. affinity mask of irq 57
Mar 31 20:57:40 skiffos-ccc0a58a kernel: gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
......
Mar 31 20:57:40 skiffos-ccc0a58a kernel: gpio gpiochip35: Static allocation of GPIO base is deprecated, use dynamic allocation.
Mar 31 20:57:40 skiffos-ccc0a58a kernel: ksmbd: The ksmbd server is experimental
Mar 31 20:57:40 skiffos-ccc0a58a kernel: samsung-usb2-phy 12130000.phy: supply vbus not found, using dummy regulator
Mar 31 20:57:40 skiffos-ccc0a58a kernel: exynos5_usb3drd_phy 12100000.phy: supply vbus not found, using dummy regulator
Mar 31 20:57:40 skiffos-ccc0a58a kernel: exynos5_usb3drd_phy 12100000.phy: supply vbus-boost not found, using dummy regulator
Mar 31 20:57:40 skiffos-ccc0a58a kernel: exynos5_usb3drd_phy 12500000.phy: supply vbus not found, using dummy regulator
Mar 31 20:57:40 skiffos-ccc0a58a kernel: exynos5_usb3drd_phy 12500000.phy: supply vbus-boost not found, using dummy regulator
Mar 31 20:57:40 skiffos-ccc0a58a kernel: dma-pl330 3880000.dma-controller: PM domain MAU will not be powered off
Mar 31 20:57:40 skiffos-ccc0a58a kernel: s2mps11-clk s2mps11-clk: DMA mask not set
Mar 31 20:57:42 skiffos-ccc0a58a kernel: exynos5-dmc 10c20000.memory-controller: error -ENXIO: IRQ drex_0 not found
Mar 31 20:57:42 skiffos-ccc0a58a kernel: exynos5-dmc 10c20000.memory-controller: error -ENXIO: IRQ drex_1 not found
Mar 31 20:57:42 skiffos-ccc0a58a kernel: phy phy-12130000.phy.6: phy_power_on was called before phy_init
Mar 31 20:57:42 skiffos-ccc0a58a kernel: phy phy-12130000.phy.6: phy_power_on was called before phy_init
Mar 31 20:57:43 skiffos-ccc0a58a kernel: exynos-adc 12d10000.adc: error -ENXIO: IRQ index 1 not found
Mar 31 20:57:43 skiffos-ccc0a58a kernel: OF: graph: no port node found in /soc/hdmi@14530000
Mar 31 20:57:43 skiffos-ccc0a58a kernel: samsung-i2s 3830000.i2s-sec: DMA channels sourced from device 3830000.i2s
Mar 31 20:57:43 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
Mar 31 20:57:43 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Mar 31 20:57:43 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Mar 31 20:57:43 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
Mar 31 20:57:43 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
Mar 31 20:58:13 skiffos-ccc0a58a kernel: kauditd_printk_skb: 112 callbacks suppressed

I realize how confusing it is. If you want, I can give you ssh access to the odroid-hc1; maybe you can see more. It is as if the USB-SATA bridge knows there is a disk there but cannot see it. This is similar to a card reader, where /dev/sdf appears even when no card is inserted, but you can only do something with /dev/sdf after inserting a card.

paralin commented 1 year ago

@danyer do you see any differences in the dmesg output between when it works and when it doesn't?

I noticed this:

kernel: sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
kernel: sd 0:0:0:0: [sda] Asking for cache data failed
kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
kernel: sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
kernel: sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)

Physical block size -> 0 bytes, that's not going to work.

danyer commented 1 year ago

This is when it's working:

Mar 31 21:23:07 skiffos-ccc0a58a kernel: scsi host0: uas
Mar 31 21:23:07 skiffos-ccc0a58a kernel: usbcore: registered new interface driver r8153_ecm
Mar 31 21:23:07 skiffos-ccc0a58a kernel: usbcore: registered new interface driver uas
Mar 31 21:23:07 skiffos-ccc0a58a kernel: scsi 0:0:0:0: Direct-Access     JMicron  Generic          3102 PQ: 0 ANSI: 6
Mar 31 21:23:07 skiffos-ccc0a58a kernel: usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
Mar 31 21:23:07 skiffos-ccc0a58a kernel: r8152 4-1:1.0: load rtl8153a-3 v2 02/07/20 successfully
Mar 31 21:23:07 skiffos-ccc0a58a kernel: cfg80211: Loading compiled-in X.509 certificates for regulatory database
Mar 31 21:23:07 skiffos-ccc0a58a kernel: r8152 4-1:1.0 eth0: v1.12.13
Mar 31 21:23:07 skiffos-ccc0a58a kernel: cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Mode Sense: 53 00 00 08
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Disabling FUA
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of preferred minimum block size (4096 bytes)
Mar 31 21:23:07 skiffos-ccc0a58a kernel:  sda: sda1
Mar 31 21:23:07 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Attached SCSI disk

and this is when it's not:

Mar 31 21:46:14 skiffos-ccc0a58a kernel: scsi host0: uas
Mar 31 21:46:14 skiffos-ccc0a58a kernel: usbcore: registered new interface driver uas
Mar 31 21:46:14 skiffos-ccc0a58a kernel: scsi 0:0:0:0: Direct-Access     JMicron  Generic          3102 PQ: 0 ANSI: 6
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Unit Not Ready
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] 0-byte physical blocks
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
Mar 31 21:46:14 skiffos-ccc0a58a kernel: usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
Mar 31 21:46:14 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Mar 31 21:46:14 skiffos-ccc0a58a kernel: r8152 4-1:1.0: load rtl8153a-3 v2 02/07/20 successfully
Mar 31 21:46:14 skiffos-ccc0a58a kernel: r8152 4-1:1.0 eth0: v1.12.13

Unit not ready... It might be a timing issue: after a soft reboot the drive almost always disappears until the next hard reboot (poweroff/poweron), but while I was trying to capture the messages it once appeared even after a soft reboot. That happens maybe one time in twenty, no more.
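
For reference, in the standard SCSI tables sense key 0x4 is HARDWARE ERROR and ASC 0x44 is "internal target failure"; ASCQ 0x81 falls in the vendor-specific range, as the `<<vendor>>` tag in the log indicates. To count how often the drive survives a soft reboot, the two log patterns above can be classified automatically. This is only a sketch; the function name `disk_state` and its three result strings are invented for this example.

```shell
#!/bin/sh
# Sketch: classify a kernel log (stdin) for one disk, default sda.
#   not-ready : the "Unit Not Ready" line from the failing boot is present
#   ready     : a non-zero logical block count was reported
#   unknown   : neither pattern was found
disk_state() {
    dev="${1:-sda}"
    log=$(cat)
    if printf '%s\n' "$log" | grep -qF "[$dev] Unit Not Ready"; then
        echo "not-ready"
    elif printf '%s\n' "$log" |
         grep -qE "\[$dev\] [1-9][0-9]* [0-9]+-byte logical blocks"; then
        echo "ready"
    else
        echo "unknown"
    fi
}
```

On the board, `journalctl -k -b | disk_state sda` (or `dmesg | disk_state sda`) after each reboot would give one result per boot, which makes the "one in twenty" estimate easy to verify.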

paralin commented 1 year ago

@danyer I added one more commit on that branch which disables the UAS protocol for the JMicron SATA bridge; maybe it fixes the issue:

https://github.com/skiffos/SkiffOS/commit/ae74e057d81916cbed90237501ee3c899ad6a13b

git pull
make compile

... should be all that's needed to apply the change.

Let's see if that fixes it. Beyond that I don't have any ideas at the moment.

Alternatively you can try applying the quirk by removing and inserting the module:

rmmod usb-storage
modprobe usb-storage quirks=0x152d:0x0578:u
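
To make such a quirk persistent without rebuilding, the same setting can be written as a modprobe.d options line. The sketch below only generates that line; the function name `usb_storage_quirk_line` is made up, the contents of the file added by the commit above are not shown in this thread, and the IDs are written without the `0x` prefix, as in the kernel parameter documentation, where the `u` flag means "ignore UAS".

```shell
#!/bin/sh
# Sketch: print a modprobe.d line that sets a usb-storage quirk.
#   $1 = USB vendor ID, $2 = USB product ID (4 hex digits each)
#   $3 = quirk flags, e.g. "u" to keep the uas driver from binding
usb_storage_quirk_line() {
    printf 'options usb-storage quirks=%s:%s:%s\n' "$1" "$2" "$3"
}
```

For example, `usb_storage_quirk_line 152d 0578 u` prints `options usb-storage quirks=152d:0578:u`, which could be redirected into a file under /etc/modprobe.d/. This only takes effect when usb-storage is built as a loadable module; with a built-in driver the equivalent is `usb-storage.quirks=152d:0578:u` on the kernel command line.
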
danyer commented 1 year ago

I've tried the new build and the disk is still not recognized after a soft reboot. This time sda no longer appears at all, but I guess that is a feature of usb-storage (if there is no disk, don't show a device), whereas uas shows the device even when there is no disk, perhaps in case someone hotplugs one later...

Mar 31 22:56:15 skiffos-ccc0a58a kernel: usb 2-1: UAS is ignored for this device, using usb-storage instead
Mar 31 22:56:15 skiffos-ccc0a58a kernel: usb-storage 2-1:1.0: USB Mass Storage device detected
Mar 31 22:56:15 skiffos-ccc0a58a kernel: usb-storage 2-1:1.0: Quirks match for vid 152d pid 0578: 1800000
Mar 31 22:56:15 skiffos-ccc0a58a kernel: scsi host0: usb-storage 2-1:1.0
Mar 31 22:56:15 skiffos-ccc0a58a kernel: usbcore: registered new interface driver usb-storage
Mar 31 22:56:15 skiffos-ccc0a58a kernel: usbcore: registered new interface driver uas
Mar 31 22:56:15 skiffos-ccc0a58a kernel: usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
Mar 31 22:56:15 skiffos-ccc0a58a kernel: r8152 4-1:1.0: load rtl8153a-3 v2 02/07/20 successfully
Mar 31 22:56:15 skiffos-ccc0a58a kernel: r8152 4-1:1.0 eth0: v1.12.13
Mar 31 22:56:15 skiffos-ccc0a58a kernel: zram: Added device: zram0
Mar 31 22:56:15 skiffos-ccc0a58a kernel: zram0: detected capacity change from 0 to 4194304
Mar 31 22:56:15 skiffos-ccc0a58a kernel: cfg80211: Loading compiled-in X.509 certificates for regulatory database
Mar 31 22:56:15 skiffos-ccc0a58a kernel: cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Mar 31 22:56:15 skiffos-ccc0a58a kernel: Adding 2097148k swap on /dev/zram0.  Priority:-2 extents:1 across:2097148k SS
Mar 31 22:56:15 skiffos-ccc0a58a kernel: Adding 2097148k swap on /mnt/persist/primary.swap.  Priority:-3 extents:5 across:16637952k SS
Mar 31 22:56:16 skiffos-ccc0a58a kernel: scsi 0:0:0:0: Direct-Access     JMicron  Generic          3102 PQ: 0 ANSI: 6
Mar 31 22:56:16 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Unit Not Ready
Mar 31 22:56:16 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] Sense Key : 0x4 [current] 
Mar 31 22:56:16 skiffos-ccc0a58a kernel: sd 0:0:0:0: [sda] ASC=0x44 <<vendor>>ASCQ=0x81 

and

NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 30.4M  0 loop /usr/lib/modules
mmcblk0     179:0    0 29.8G  0 disk 
|-mmcblk0p1 179:1    0  308M  0 part 
|-mmcblk0p2 179:2    0  290M  0 part /mnt/rootfs
`-mmcblk0p3 179:3    0 29.2G  0 part /etc/ssh
                                     /var/log/journal
                                     /mnt/persist
zram0       252:0    0    2G  0 disk [SWAP]

Thank you very much for your help, I don't want to waste your time anymore! Last thing that I'm going to try is to go to the previous commit (the one without the quirk) and to try

rmmod uas
modprobe uas

maybe it will work.
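
Reloading uas makes the kernel re-probe the bridge. A related experiment, which nobody in this thread has tried and which may well not help, is to unbind and rebind the USB device through sysfs. The sketch below assumes the standard /sys/bus/usb layout; both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: locate a USB device by vendor/product ID in sysfs and rebind it.
# Assumption: forcing re-enumeration is worth trying; the thread does not
# confirm that this helps on the HC1.
find_usb_device() {
    vid="$1"; pid="$2"; root="${3:-/sys/bus/usb/devices}"
    for d in "$root"/*; do
        # Skip entries (such as interfaces) that carry no ID files.
        [ -r "$d/idVendor" ] && [ -r "$d/idProduct" ] || continue
        if [ "$(cat "$d/idVendor")" = "$vid" ] &&
           [ "$(cat "$d/idProduct")" = "$pid" ]; then
            basename "$d"   # bus-port name, e.g. 4-1
            return 0
        fi
    done
    return 1
}

rebind_usb_device() {
    dev="$1"   # bus-port name as printed by find_usb_device
    echo "$dev" > /sys/bus/usb/drivers/usb/unbind
    sleep 2
    echo "$dev" > /sys/bus/usb/drivers/usb/bind
}
```

As root, `dev=$(find_usb_device 152d 0578) && rebind_usb_device "$dev"` would re-enumerate the JMS578. Looking the device up by ID avoids hard-coding the bus number, which differs between the builds shown in this thread.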

Thanks again, Dan.

paralin commented 1 year ago

@danyer OK I will drop the commit adding the quirk since it doesn't work.

Note that you will need to delete the file from your workspace as well, since buildroot copies the root_overlay and doesn't know to remove files that are no longer in it:

rm ./workspaces/default/target/etc/modprobe.d/00-usb-quirks.conf

paralin commented 1 year ago

@danyer The odroid developers said they don't know why this is happening 😆 so I guess we are on our own here.

DavyLandman commented 1 year ago

I have an hc1 with an ssd running home-assistant OS. They're also using buildroot and a stock kernel. Could there be something there? https://github.com/home-assistant/operating-system

paralin commented 1 year ago

@danyer @DavyLandman The main difference I see is they use the exynos defconfig, try this commit (latest master):

https://github.com/skiffos/SkiffOS/commit/f5b7c486ea6e76909504bc0815470b5e57247607

git checkout master
git fetch
git reset --hard origin/master
git submodule update

make configure
make cmd/linux-dirclean
make compile

paralin commented 1 year ago

The other change is using U-Boot 2023.x instead of 2017.x. Mainline U-Boot supports the board with the odroid-xu3 config.

I also updated to use kernel 6.3.0 and dropped a lot of patches that probably were not needed.

Tested on my odroid HC2 + xu4, but I don't have an SSD to test the particular issue mentioned here

Latest version: 17f668508f3211fddb6f6d984c780711d0a70d95

DavyLandman commented 1 year ago

I'm still waiting on my next hc1/2 to arrive, so I cannot test this, but thanks for trying to get it aligned to their config. I hope @danyer has time to test it out?

danyer commented 1 year ago

I am on holiday right now so I cannot test...

paralin commented 1 year ago

I'm going to assume this is fixed as of release 2023.02.1, due to the update to U-Boot 2023.02 and the update to the 6.3.x kernel.

Everything is at the latest version now.

Feel free to message here if it's still an issue and I'll reopen!