inindev / nanopc-t6

debian arm64 linux for the nanopc-t6
GNU General Public License v3.0

Crucial NVMe SSD not visible in lsblk/lspci #8

Closed: Infinoid closed this issue 3 months ago

Infinoid commented 5 months ago

As #7 says so eloquently, nvme not works.

I flashed nanopc-t6_bookworm-v12-6.7-rc7.img.xz to an sd card and booted it. It boots and looks great. However, I have an NVMe device plugged in, which doesn't show up in lspci or lsblk. It does show up with the original vendor firmware.

This gist has the dmesg and lspci output from booting this release.

This gist has the dmesg and lspci output from the vendor's kernel, where it does show up. I also included the lspci -vvs output for the NVMe device.

I think it's just the PCIe 3.0 M-key slot that doesn't work. I have also tried plugging a wifi card into the M.2 E-key PCIe 2.0 slot. That seems to work fine. (I haven't tried to connect to anything, but it shows up in lspci and ip link.)
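
For anyone reproducing this, the "does it show up in lspci" check can be scripted. This is only a minimal sketch: the function name `has_nvme_controller` is hypothetical, and it assumes `lspci -nn` output is piped in. It looks for PCI class code 0108 (non-volatile memory controller), which is the class NVMe devices report.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the `lspci -nn` text on stdin lists a device
# with PCI class code 0108 (non-volatile memory controller, i.e. NVMe).
has_nvme_controller() {
    grep -q '\[0108\]:'
}

# Usage on the board (needs pciutils installed):
#   lspci -nn | has_nvme_controller && echo "NVMe visible on PCIe" || echo "NVMe not on the bus"
#   lsblk -d -o NAME,MODEL,SIZE    # block-device view of the same question
```

If the controller is missing here, the problem is at the PCIe level rather than in the NVMe driver or the block layer.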

I also built and booted a vanilla 6.8.8 kernel. It behaves the same way: the NVMe does not show up.

In case it matters, the NVMe device I'm testing with is a Crucial SSD that speaks PCIe 3.0, model number CT4000P3SSD8. The NanoPC T6 board I have is apparently the 2301 version, not the LTS version. Look here to tell the difference; the debug UART header is 3-pin on the non-LTS version, and the maskrom/reset buttons are in different locations.

Does the M.2 M-key slot work for you? If so, which NanoPC T6 board version do you have?

Infinoid commented 5 months ago

Your build of U-Boot also can't see it, so at least it's consistent. At the u-boot prompt, nvme scan, nvme info, nvme detail all show nothing.

I also tried some pci commands but I don't really know what I'm doing. pci enum showed me a couple of Link Fail messages and nothing else.

inindev commented 5 months ago

I am running the 3.2 NanoPC-T6 board and successfully booting from a Samsung 980 SSD NVMe https://www.amazon.com/gp/product/B08V7GT6F3

I stick to Samsung NVMe devices now as I have never had a compatibility issue with one.

A while back, I had a problem with a KingSpec M.2 SATA SSD and needed to increase the timeout to get it to be recognized. https://github.com/inindev/radxa-e25/blob/main/uboot/patches/0003-board-rockchip-Fix-SATA-SSD-Timeout.patch.optional

Not sure how handy you are with compiling the device tree, but an approach like this may be worth a shot. https://github.com/u-boot/u-boot/blob/master/arch/arm/dts/rk3588-nanopc-t6.dts

Infinoid commented 5 months ago

> I am running the 3.2 NanoPC-T6 board

Ok, I'll take this to mean you have the non-LTS board, like me. 3.2 is just a markdown section number, not a board version... I guess the board version is 2301, since that's what's printed on it (beneath the ssd) and that's how other parts of their wiki page refer to it?

> A while back, I had a problem with a KingSpec M.2 SATA SSD and needed to increase the timeout to get it to be recognized. https://github.com/inindev/radxa-e25/blob/main/uboot/patches/0003-board-rockchip-Fix-SATA-SSD-Timeout.patch.optional

Ok, thanks. You've given me a couple of things to try. I'll see if it works with other SSDs, and I'll try adding a delay to the device tree.

For a simple test (not planning to boot from nvme), is it okay to just add it to the linux dtb, and not the u-boot one? The linux one is easier to access/replace, since it's a file in /boot and a line in the extlinux config. I ask because I'm not sure whether linux does the PCI enumeration or if u-boot does it.

Infinoid commented 5 months ago

> A while back, I had a problem with a KingSpec M.2 SATA SSD and needed to increase the timeout to get it to be recognized. https://github.com/inindev/radxa-e25/blob/main/uboot/patches/0003-board-rockchip-Fix-SATA-SSD-Timeout.patch.optional

Hmmm, looking more closely... that's a SATA SSD. So it's a PCIe SATA controller and a SATA SSD packed onto an M.2 board. The U-Boot logs in the patch description are talking about a SATA link timeout. So the SATA controller on the PCIe bus was visible to u-boot even without the change, and the timeout happened on the SATA side.

[Edit: no, it's just using the SATA pins of the M.2 connector and taking power from the vcc3v3_pcie30 power regulator.]

My SSD is an NVMe, and I think the problem is at the PCIe bus level. So I think it's a different situation... but the timeout thing seems harmless to try, I guess.

You got this message, both times (with or without the patch):

pcie_dw_rockchip pcie@fe280000: failed to enable vpcie3v3 (ret=-114)

Was that from the same bus?

inindev commented 5 months ago

I am not sure this is the issue. The SATA device is a PCIe device just as NVMe is a PCIe device. It would be something I would try if I were debugging this issue.

see the schematic p.28: https://wiki.friendlyelec.com/wiki/images/9/97/NanoPC-T6_2301_SCH.PDF

the power enable for the m.2 nvme is: PCIE_M2_0_PWREN -> GPIO2_C5_d

the power supply for gpio2 c5 is vcc3v3_pcie30: https://github.com/u-boot/u-boot/blob/master/arch/arm/dts/rk3588-nanopc-t6.dts#L154

and vcc3v3_pcie30 controls the pcie3x4 device: https://github.com/u-boot/u-boot/blob/master/arch/arm/dts/rk3588-nanopc-t6.dts#L455

inindev commented 5 months ago

also, it may be worth trying to rescan the pci bus while booted in linux:

sudo su
echo "1" > /sys/bus/pci/rescan
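
The same rescan can be wrapped so it works with or without a root shell. A minimal sketch: the function name `pci_rescan` and its optional path argument are hypothetical conveniences (the argument exists only so the function can be tried against a scratch file). `/sys/bus/pci/rescan` itself is the standard sysfs trigger.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel to re-enumerate the PCI bus.
# $1 optionally overrides the trigger file (an assumption, for dry runs only).
pci_rescan() {
    rescan_file="${1:-/sys/bus/pci/rescan}"
    if [ -w "$rescan_file" ]; then
        # Already privileged (or a scratch file): write directly.
        echo 1 > "$rescan_file"
    else
        # Otherwise let tee do the write as root.
        echo 1 | sudo tee "$rescan_file" > /dev/null
    fi
}

# Usage on the board:
#   pci_rescan && lspci && lsblk    # then see whether the device appeared
```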

Infinoid commented 5 months ago

I tested a few things with the Samsung SSD you recommended, and some others. Here's what I found.

| Bus | Manuf. | Product | Vendor FW | inindev FW | +rescan | +delay | 6.8.8 |
|---|---|---|---|---|---|---|---|
| SATA | Intel | SSDSCKKW010X6 | No | No | No | No | No |
| SATA | Toshiba | THNSNJ128G8NU | No | No | No | No | No |
| SATA | Transcend | TS512GMTS800 | No | No | No | No | No |
| NVMe | Crucial | CT4000P3SSD8 | Yes | No | No | No | No |
| NVMe | Samsung | MZ-V7E2T0 | Yes | Yes | - | - | Yes |
| NVMe | Samsung | MZ-V8V500 | Yes | Yes | - | - | Yes |
| NVMe | TeamGroup | TM8FP4004T0C101 | Yes | Yes | - | - | Yes |

Conclusions based on these results:

Infinoid commented 5 months ago

Note that I only patched the DTB that is passed to linux (via the fdt directive in extlinux.conf). I've been using your release image and haven't built u-boot from source, so u-boot's DTB is hard to access/modify.
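
Swapping in a patched DTB this way comes down to changing one line of the extlinux config. A minimal sketch, assuming the config has a lowercase `fdt` line and that the new path contains no `|` or `&` characters. The function name `set_fdt`, the `.bak` backup, and every path in the comments are hypothetical. The `dtc` round trip in the comments is the usual way to get an editable source from a compiled DTB.

```shell
#!/bin/sh
# Hypothetical helper: point the "fdt" line of an extlinux config at another DTB.
# Usage: set_fdt /boot/extlinux/extlinux.conf /boot/patched.dtb   (paths assumed)
set_fdt() {
    conf="$1"
    new_dtb="$2"
    cp "$conf" "$conf.bak" || return 1    # keep a copy to fall back to
    # Replace everything after the "fdt" keyword, preserving indentation.
    # "fdtdir" lines are left alone because whitespace must follow "fdt".
    sed "s|^\([[:space:]]*fdt[[:space:]][[:space:]]*\).*|\1$new_dtb|" "$conf.bak" > "$conf"
}

# Producing the patched DTB (file names are placeholders):
#   dtc -I dtb -O dts -o board.dts board.dtb      # decompile
#   ...edit board.dts...
#   dtc -I dts -O dtb -o patched.dtb board.dts    # recompile
```

This only changes the tree handed to Linux. U-Boot keeps using its own built-in DTB for anything it does before the kernel starts.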

inindev commented 4 months ago

I have updated the debian images to sid kernel 6.8.9: https://github.com/inindev/debian-image/releases/download/v12.5-rc3/nanopc-t6_bookworm-12.5.img.xz

I would be interested to see if this has improved the situation. If not, there is a 6.9.2 kernel worth trying: https://github.com/inindev/linux-rockchip/releases/download/v6.9.2/linux-image-6.9.2-1-arm64_6.9.2-1_arm64.deb

wget https://github.com/inindev/linux-rockchip/releases/download/v6.9.2/linux-image-6.9.2-1-arm64_6.9.2-1_arm64.deb
sudo dpkg -i linux-image-6.9.2-1-arm64_6.9.2-1_arm64.deb

Infinoid commented 4 months ago

I will give them a try, thanks.

Infinoid commented 4 months ago

I got a second nanopc-t6, same hardware revision (2301, with the 3-pin debug UART header), and used that for today's testing.

I stuck the new nanopc-t6_bookworm-12.5.img.xz image on an SD card, stuck it in, and it worked. It sees the Crucial NVMe SSD that was previously not working. It still doesn't like SATA M.2 SSDs, but that's a separate problem.

| Bus | Manuf. | Product | 6.8.9 | 6.9.2 |
|---|---|---|---|---|
| SATA | Intel | SSDSCKKW010X6 | No | No |
| NVMe | Crucial | CT4000P3SSD8 | Yes | Yes |
| NVMe | Samsung | MZ-V7E2T0 | Yes | Yes |
| NVMe | TeamGroup | TM8FP4004T0C101 | Yes | Yes |

So I'm a bit perplexed. The device shows up fine with both of the kernels you linked, and also with my self-built kernel that didn't see it before. Cold boot, warm boot, it doesn't matter, it works in every case.

Wondering if the first nanopc-t6 board was somehow bad, I stuck this SD card and the Crucial SSD into that one, and it works there too.

So it's either a heisenbug, or else the kernel isn't important and the difference is in the U-Boot builds. I'll do some more A/B testing to try to figure that out.
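
One way to keep this kind of A/B testing honest is to log, on every boot, which configuration was in play and whether an NVMe controller enumerated. A minimal sketch: the function name `log_nvme_state`, the log format, and the `SYSFS_ROOT` override are hypothetical. `/sys/class/nvme` is where the Linux NVMe driver lists its controllers.

```shell
#!/bin/sh
# Hypothetical helper: print one line per boot for A/B comparison.
# $1 is a free-form label for the configuration under test (e.g. which SD card).
# SYSFS_ROOT is an assumed override for dry runs; it defaults to /sys.
log_nvme_state() {
    label="$1"
    count=0
    for ctrl in "${SYSFS_ROOT:-/sys}"/class/nvme/nvme*; do
        # An unmatched glob stays literal, so check that the entry exists.
        if [ -e "$ctrl" ]; then
            count=$((count + 1))
        fi
    done
    echo "$label kernel=$(uname -r) nvme_controllers=$count"
}

# Usage (log path is a placeholder):
#   log_nvme_state old-sd-card >> /root/nvme-ab.log
```

Collecting a line per cold and warm boot for each SD card makes it easier to tell an intermittent failure from a consistent one.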

inindev commented 4 months ago

That is great news; thanks for testing.

I also pushed a new kernel today for the nanopc-t6 with working HDMI and USB3 support, based on the latest upstream kernel: https://github.com/inindev/linux-rockchip/releases/tag/v6.9.3

I am working on getting the USB-C port working next.

Infinoid commented 4 months ago

I've set things up to have /boot on the SD card, and / on the NVMe SSD. (Having /boot on the SD card is nice, because if I need to move a bad kernel out of the way or fix a bad boot param, popping it out is easier than plugging in a serial cable.)

I've set up both SD cards to do this, and two NVMe SSDs with the same root filesystem image (a Samsung and this Crucial SSD). So now my /boot and / images are the same; the only real difference is u-boot. With this setup, I see that the older one still can't see the NVMe SSD, and the newer one seems to be working okay so far.

So I'm pretty sure it's a difference in u-boot. Were there any specific changes in u-boot between nanopc-t6_bookworm-v12-6.7-rc7.img.xz and nanopc-t6_bookworm-12.5.img.xz related to nvme? Or some changes to pci init timing or something?

If it works, it works, and I should be happy, but I'm still curious about what happened, and what's different about this particular SSD.

inindev commented 3 months ago

The latest kernel has the usb-c port and sound working on the nanopc-t6. Everything should now work the best it can at the moment (as far as I know).

debian bookworm v12.5 arm64 - rc4 https://github.com/inindev/debian-image/releases

linux kernel v6.9.3-2 arm64 inindev https://github.com/inindev/linux-rockchip/releases

u-boot arm64 inindev v2024.07-rc4 https://github.com/inindev/uboot-rockchip/releases

Infinoid commented 3 months ago

It's been working completely reliably for me for the past couple of weeks. I still don't know what the root cause was, but I think this can be closed.

Thanks!