Open teoteo opened 10 months ago
Is the drive you are using in this compatibility list? https://shop.pimoroni.com/products/nvme-base?variant=41219587178579
It isn’t listed: it’s a Samsung 970 EVO 1TB. It woks on Raspberry Pi OS:
2024/01/05 15:57:40
Copyright (c) 2012 Broadcom
version 30cc5f37 (release) (embedded)
Raspberry Pi reference 2023-12-05
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4
Linux raspberrypi 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
but if it can help, I can try with a listed Crucial P3 4TB.
It isn’t listed: it’s a Samsung 970 EVO 1TB. It woks on Raspberry Pi OS:
On RPiOS can you run update to 6.6 kernel and confirm you see the same failure as you do on Arch?
On RPiOS can you run update to 6.6 kernel and confirm you see the same failure as you do on Arch?
Updated and it works:
matteo@raspberrypi:~ $ vcgencmd version; cat /etc/rpi-issue; uname -a
2024/01/15 19:02:28
Copyright (c) 2012 Broadcom
version f0aa0715 (release) (embedded)
Raspberry Pi reference 2023-12-05
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4
Linux raspberrypi 6.6.12-v8-16k+ #1723 SMP PREEMPT Thu Jan 18 13:00:39 GMT 2024 aarch64 GNU/Linux
matteo@raspberrypi:~ $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 93.3M 1 loop /snap/core/16204
sda 8:0 1 0B 0 disk
mmcblk0 179:0 0 58.9G 0 disk
├─mmcblk0p1 179:1 0 512M 0 part /boot/firmware
└─mmcblk0p2 179:2 0 58.4G 0 part /
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 256M 0 part
└─nvme0n1p2 259:2 0 117.5G 0 part
Do you have suggestions for Arch Linux ARM guys to better investigate the issue?
Could the issue be that other package is not providing something needed? ArchARM ships RPiOS bcm2712_defconfig as our baseline config. We add a few modules and settings but the basis set is there. I do not have NVMe hardware for testing on my RPi5/ArchARM and have no experience using these devices.
linux-rpi-16k
kernel package using the .config
generated by make bcm2712_defconfig
(for reference full .config) without any modifications which should be the same one as RPiOS is using. Download the kernel package from here.
- Does booting into that kernel allow the device to work?
Yes, with that kernel the device works:
~ » uname -a; lsblk astronaut@astroarch
Linux astroarch 6.6.12-2-rpi-16k #1 SMP PREEMPT Sun Jan 21 11:48:27 EST 2024 aarch64 GNU/Linux
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
mmcblk0 179:0 0 29.1G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 28.9G 0 part /
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 256M 0 part
└─nvme0n1p2 259:2 0 117.5G 0 part
@teoteo : So glad you've opened this issue... my NVMe hardware (also fm Pimoroni) is to be delivered this week!
Yes, with that kernel the device works:
OK, so something in the options we add to bcm2712_defconfig is causing the problem. The added options are here. At first glance, I did not see anything obviously to blame. @popcornmix @pelwell - do you have any thoughts?
The modified .config is generated like this:
make bcm2712_defconfig
cat archarm.diffconfig >> .config
make oldconfig
For reference, the following commit illustrates the fully expanded .config highlighting the difference between RPiOS's bcm2712_defconfig and the one that is apparently to blame for the reported issue: https://github.com/graysky2/PKGBUILDs/commit/4846daab5d21f5bfcd90bc54beea23f784a28cf7
Again, I am not seeing any smoking gun. Grateful for any thought you may have.
I've just hit this with rpi-update - compare pull/5882 with 6.1.74 -> bootloader probes NVMe, Linux fails to establish a link. On 6.1.74, Linux probes fine.
Reverting this hunk which changed between 6.1 and 6.6 fixes things for me.
@@ -393,42 +395,34 @@ static u32 brcm_pcie_mdio_form_pkt(int port, int regad, int cmd)
/* negative return value indicates error */
static int brcm_pcie_mdio_read(void __iomem *base, u8 port, u8 regad, u32 *val)
{
- int tries;
u32 data;
+ int err;
writel(brcm_pcie_mdio_form_pkt(port, regad, MDIO_CMD_READ),
base + PCIE_RC_DL_MDIO_ADDR);
readl(base + PCIE_RC_DL_MDIO_ADDR);
-
- data = readl(base + PCIE_RC_DL_MDIO_RD_DATA);
- for (tries = 0; !MDIO_RD_DONE(data) && tries < 10; tries++) {
- udelay(10);
- data = readl(base + PCIE_RC_DL_MDIO_RD_DATA);
- }
-
+ err = readl_poll_timeout_atomic(base + PCIE_RC_DL_MDIO_RD_DATA, data,
+ MDIO_RD_DONE(data), 10, 100);
*val = FIELD_GET(MDIO_DATA_MASK, data);
- return MDIO_RD_DONE(data) ? 0 : -EIO;
+
+ return err;
}
/* negative return value indicates error */
static int brcm_pcie_mdio_write(void __iomem *base, u8 port,
u8 regad, u16 wrdata)
{
- int tries;
u32 data;
+ int err;
writel(brcm_pcie_mdio_form_pkt(port, regad, MDIO_CMD_WRITE),
base + PCIE_RC_DL_MDIO_ADDR);
readl(base + PCIE_RC_DL_MDIO_ADDR);
writel(MDIO_DATA_DONE_MASK | wrdata, base + PCIE_RC_DL_MDIO_WR_DATA);
- data = readl(base + PCIE_RC_DL_MDIO_WR_DATA);
- for (tries = 0; !MDIO_WT_DONE(data) && tries < 10; tries++) {
- udelay(10);
- data = readl(base + PCIE_RC_DL_MDIO_WR_DATA);
- }
-
- return MDIO_WT_DONE(data) ? 0 : -EIO;
+ err = readw_poll_timeout_atomic(base + PCIE_RC_DL_MDIO_WR_DATA, data,
+ MDIO_WT_DONE(data), 10, 100);
+ return err;
}
So, MDIO prodding broke - but why does RP1 still work?
i.e. this commit: ca5dcc76314d1fa6d7307fd3b95039b08d2f2b97
Does the switch from -EIO
to -ETIMEDOUT
matter?
Or is it the change in timeout duration? (I feel the replacement will timeout a fraction earlier)
I think the bug is doing readw into a u32 then testing the top bit for "done", and exiting if it's 0. This means writes done in rapid succession will collide.
Yes, the readw looks like a typo/thinko (w = word). So you just want:
- err = readw_poll_timeout_atomic(base + PCIE_RC_DL_MDIO_WR_DATA, data,
+ err = readl_poll_timeout_atomic(base + PCIE_RC_DL_MDIO_WR_DATA, data,
MDIO_WT_DONE(data), 10, 100);
Yep that fixes it.
Just push your patch.
Include:
Fixes: ca5dcc76314d ("PCI: brcmstb: Replace status loops with read_poll_timeout_atomic()")
Did I misread something? @teoteo - if you booted with the unmodified 6.6.12-1 (built from https://github.com/raspberrypi/linux/commit/7df1d3b44368209517fc7dc0b35ee2621a94722e with ArchARM's config) the bug was triggered yet if you booted with the kernel I prepared using that same commit yet RPiOS's config not ArchARM's config, you did not see the bug. Is this correct?
That would have been on a 6.1 kernel, which hasn't received the offending patch.
But then:
Updated and it works:
matteo@raspberrypi:~ $ vcgencmd version; cat /etc/rpi-issue; uname -a 2024/01/15 19:02:28 Copyright (c) 2012 Broadcom version f0aa0715 (release) (embedded) Raspberry Pi reference 2023-12-05 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4 Linux raspberrypi 6.6.12-v8-16k+ #1723 SMP PREEMPT Thu Jan 18 13:00:39 GMT 2024 aarch64 GNU/Linux
I can't explain that.
@pelwell - if you see here, @teoteo is using the 6.6.12 kernel built from that commit....
In any case, I am glad you guys got this fixed. I updated ArchARM's kernels/should build and hit a mirror soon.
I would like to thank anyone here. Really appreciate this has been resolved thanks to collective efforts!
I did some testing and it works on my system. Thank you all.
~ » uname -a; lsblk
Linux astroarch 6.6.13-2-rpi #1 SMP PREEMPT Tue Jan 23 13:37:27 MST 2024 aarch64 GNU/Linux
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
mmcblk0 179:0 0 116.2G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part
└─mmcblk0p2 179:2 0 115.9G 0 part
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 256M 0 part /boot
└─nvme0n1p2 259:2 0 931.3G 0 part /
Is this fix something that can help with the other NVMe-not-working cases for the normal RPI Debian?
Describe the bug
Connecting an NVMe SSD using the Pimoroni "NVMe Base for Raspberry Pi 5" on Arch Linux ARM with kernel versions prior to 6.6.x not on versions 6.6.9 and higher.
Steps to reproduce the behaviour
Using Arch Linux (in my case AstroArch) with connected a PCIe extension board with NVMe SSD connected.
lsblk
I can see the SSDlsblk
I can’t see the SSDsudo pacman -U https://alaa.ad24.cz/packages/l/linux-rpi/linux-rpi-6.1.70-1-aarch64.pkg.tar.xz
and rebootlsblk
I can see the SSDDevice (s)
Raspberry Pi 5
System
OS: Arch Linux ARM aarch64 Host: Raspberry Pi 5 Model B Rev 1.0 vcgencmd not available Kernel: Linux astroarch 6.6.12-1-rpi #1 SMP PREEMPT Wed Jan 17 19:05:39 MST 2024 aarch64 GNU/Linux
Logs
No response
Additional context
I’ve tried with different kernels and with kernels after 6.6.x (linux-rpi-16k too) can’t see the SSD. Asking on Arch Linux ARM forum don’t seems to be a modules issue.