Blub opened this issue 1 year ago (status: Open)
Just tested with the updated rc4 upstream branch - same issue.
Still happening with 6.4rc1. Again, this does not happen with 5.15, so I'm pretty sure it's a driver issue, and not a power supply issue.
I can confirm I've reproduced the same issue, but this time on the 5.15 Debian kernel. There shouldn't be any power issues here, but I am using a very inexpensive NVMe device.
The same for me, using the latest (as of 2023-07-16) SD image, with:

```
0001:01:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03)
```
Some (random) excerpts from dmesg:

```
pcie_plda 2b000000.pcie: Failed to get power-gpio, but maybe it's always on.
[    3.245160] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    3.252518] pci 0000:00:00.0: BAR 0: no space for [mem size 0x100000000 64bit pref]
[    3.394655] pcie_plda 2c000000.pcie: Failed to get power-gpio, but maybe it's always on.
[    3.884976] pci 0001:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0001:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[    3.912991] pci 0001:00:00.0: BAR 0: no space for [mem size 0x100000000 64bit pref]
[    3.925816] pci 0001:00:00.0: BAR 0: failed to assign [mem size 0x100000000 64bit pref]
[    4.768043] nvme nvme0: 4/0/0 default/read/poll queues
[    4.773220] starfive_raxda_10inch 2-0020: dsi command return -61, mode 0
```
Any way to help debug this? (I'm a developer, but I haven't touched the kernel in a few years...)
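One way to start is just collecting the state people usually ask for in these reports. A sketch (the output file name is my own choice, and the `nvme error-log` step assumes nvme-cli is installed):

```shell
#!/bin/sh
# Debug-collection sketch: gather PCIe/NVMe state into one file.
# Run as root so dmesg and the controller log are readable.
OUT=nvme-debug.txt   # hypothetical output file name
{
    echo '=== kernel ==='
    uname -a
    echo '=== NVMe controllers (PCI class 0108) ==='
    lspci -vv -d ::0108
    echo '=== nvme/pcie kernel messages ==='
    dmesg | grep -iE 'nvme|pcie'
    echo '=== controller error log ==='
    nvme error-log /dev/nvme0 2>/dev/null || echo 'nvme-cli not available'
} > "$OUT" 2>&1
echo "wrote $OUT"
```

The `lspci -vv` output is the interesting part: it shows the negotiated link speed and width (`LnkSta`), which matters given the "limited by 5.0 GT/s PCIe x1 link" line above.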
Also, the following appears in dmesg output occasionally:

```
[21078.151284] nvme nvme0: Abort status: 0x0
```
Related thread: https://forum.rvspace.org/t/nvme-i-o-timeouts/1545
I can confirm I'm running into this issue too. It makes NVMe impossible to use reliably in the long run, because the drives get disconnected under high load / heavy writes, e.g. while compiling and installing a fresh OS.
As a workaround I have run:

```
echo 3000 > /sys/devices/platform/soc/9c0000000.pcie/pci0001:00/0001:00:00.0/0001:01:00.0/nvme/nvme0/nvme0n1/queue/io_timeout
```

to lower the timeout to a more reasonable value (the default is 30000 ms). I'm not sure what the lowest usable value is; it helps a bit, but it's not a real solution.
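For what it's worth, the same workaround can be applied through `/sys/block` without hard-coding the PCIe path. A sketch (3000 ms is just the empirical value from above, not a recommendation, and `SYSFS_ROOT` is a variable I added so the loop can be exercised against a fake tree):

```shell
#!/bin/sh
# Workaround sketch: lower the block-layer I/O timeout on every NVMe
# namespace. The kernel default is 30000 ms. Run as root.
TIMEOUT_MS=${TIMEOUT_MS:-3000}
SYSFS_ROOT=${SYSFS_ROOT:-/sys/block}   # overridable for testing

for q in "$SYSFS_ROOT"/nvme*n*/queue/io_timeout; do
    [ -e "$q" ] || continue            # glob didn't match: no NVMe namespaces
    echo "$TIMEOUT_MS" > "$q"
    echo "io_timeout set to ${TIMEOUT_MS} ms in $q"
done
```

Note this resets on reboot; it would need a boot script or udev rule to persist.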
I've been testing the 6.3rc-based upstream branch on a VisionFive 2 (1.2a), with root on NVMe. Disk access regularly stalls for a bit, then I see messages like this one:

```
[  101.417700] nvme nvme0: I/O 897 QID 2 timeout, completion polled
```

then things continue normally for a while. Same with different NVMe drives (tried an Intel Optane and a WD Red). This does not happen with the 5.15 kernel included in the Debian SD card image.
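For anyone trying to reproduce this on demand, sustained writes seem to be the trigger. A sketch, under assumptions: `/mnt/nvme` is a hypothetical mount point for the NVMe filesystem, and the grep pattern simply matches the timeout lines quoted in this thread:

```shell
#!/bin/sh
# Reproduction sketch: sustained direct-I/O writes while watching dmesg
# for "nvme ... timeout" messages. Adjust /mnt/nvme to your setup and
# make sure it has ~4 GiB free. Run as root.
reproduce() {
    dmesg --follow | grep --line-buffered 'nvme.*timeout' &
    watcher=$!
    # ~4 GiB of sequential writes, bypassing the page cache; roughly the
    # kind of load (OS install, compile jobs) that triggered disconnects.
    dd if=/dev/zero of=/mnt/nvme/stress.bin bs=1M count=4096 oflag=direct
    kill "$watcher" 2>/dev/null
    rm -f /mnt/nvme/stress.bin
}

[ -d /mnt/nvme ] && reproduce
```

`oflag=direct` keeps the writes from being absorbed by RAM, so the device actually sees sustained load.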