Closed daiaji closed 2 years ago
Does the primary controller require SR-IOV to be enabled prior to onlining its virtual controllers?
@keithbusch You are right. After setting the number of VFs, the secondary controller can be brought online, but it seems that no matter which VF I pass through, the block device cannot be found in the VM.
I deleted the default namespace, then created two new namespaces and attached them to the 0x41 controller. (The Samsung PM1733 marks the 0x0041 controller as the active primary controller.)
nvme create-ns /dev/nvme0 -s 5358197520 -c 5358197520 -f 0 -d 0 -m 0
nvme create-ns /dev/nvme0 -s 2143279008 -c 2143279008 -f 0 -d 0 -m 0
nvme attach-ns /dev/nvme0 -n 1 -c 0x41
nvme attach-ns /dev/nvme0 -n 2 -c 0x41
nvme reset /dev/nvme0
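For what it's worth, the -s/-c arguments to create-ns count logical blocks, not bytes; assuming LBA format 0 is the 512-byte format (which matches the "512 B + 0 B" shown by nvme list later in this thread), the two namespaces above work out to roughly 2.74 TB and 1.10 TB:

```shell
# create-ns sizes are in logical blocks; convert to bytes as a sanity check
# (512-byte LBA format assumed, matching -f 0 above)
echo $(( 5358197520 * 512 ))   # 2743397130240 bytes, ~2.74 TB
echo $(( 2143279008 * 512 ))   # 1097358852096 bytes, ~1.10 TB
```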
After setting up the VQ/VI resources for the 0x0002 controller, I passed the 01:00.2 PCI device through to the VM, but I did not find the block device in the output of the guest OS's lsblk command. Then I tried setting the VQ/VI for the 0x0001 controller and passing the 01:00.2 device through to the VM; the block device was still not found in the guest's lsblk output.
echo 4 > /sys/class/nvme/nvme0/device/sriov_numvfs
nvme virt-mgmt /dev/nvme0n2 -c 0x0002 -r0 -n2 -a8
nvme virt-mgmt /dev/nvme0n2 -c 0x0002 -r1 -n2 -a8
nvme virt-mgmt /dev/nvme0n2 -c 0x0002 -a9
nvme list-secondary /dev/nvme0
SCID : Secondary Controller Identifier : 0x0002
PCID : Primary Controller Identifier : 0x0041
SCS : Secondary Controller State : 0x0001 (Online)
VFN : Virtual Function Number : 0x0002
NVQ : Num VQ Flex Resources Assigned : 0x0002
NVI : Num VI Flex Resources Assigned : 0x0002
SCEntry[2 ]:
lspci | grep PM173X
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.3 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.4 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.5 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
Did I do something wrong? Does each SSD that supports SR-IOV have a different method of enabling it? Do I need to use proprietary software outside the NVMe specification?
Your platform probably did not provide enough PCI buses through the root port. Can you see the PCI functions in lspci? If not, you will need to ask the kernel to re-enumerate the PCI bus, but you can set those parameters at boot time. It should be the kernel parameters pci=realloc,assign-busses,nocrs, if I recall correctly. In some cases, though, the kernel may not be able to successfully renumber the bus, so even that might fail.
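For reference, on a stock Debian/Ubuntu GRUB layout (file path and variable name are the usual defaults; adjust for other distros) the boot-time route would be a fragment like:

```shell
# /etc/default/grub -- append the PCI re-enumeration parameters mentioned above
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=realloc,assign-busses,nocrs"
# then regenerate the config and reboot:
#   sudo update-grub && sudo reboot
```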
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device a801
Flags: bus master, fast devsel, latency 0, IRQ 43, NUMA node 0, IOMMU group 15
Memory at fcd10000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at fcc00000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Device Serial Number 93-08-50-11-94-38-25-00
Capabilities: [168] Alternative Routing-ID Interpretation (ARI)
Capabilities: [178] Secondary PCI Express
Capabilities: [198] Physical Layer 16.0 GT/s <?>
Capabilities: [1c0] Lane Margining at the Receiver <?>
Capabilities: [1e8] Single Root I/O Virtualization (SR-IOV)
Capabilities: [3a4] Data Link Feature <?>
Kernel driver in use: nvme
01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device a801
Flags: fast devsel, NUMA node 0, IOMMU group 34
Memory at fcc10000 (64-bit, non-prefetchable) [virtual] [size=32K]
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable- Count=580 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
01:00.3 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device a801
Flags: fast devsel, NUMA node 0, IOMMU group 35
Memory at fcc18000 (64-bit, non-prefetchable) [virtual] [size=32K]
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable- Count=580 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
It seems they are there.
Oh, gotcha. I had a platform that enumerates the VFs on a different bus number from the PF, so I didn't put together that those BDFs were your secondary controllers.
In your commands, it says you are attaching two namespaces to CNTID x41, but you are assigning CNTID x1 to the guest. Is that correct? If so, it doesn't sound like the guest would be able to see those namespaces.
Do you mean the 0x0001 secondary controller shown by list-secondary? Then yes, I think so. Sorry, I don't really know how to map namespaces to VFs.
Yeah, I think you wanted to do something like nvme attach-ns /dev/nvme0 -n 1 -c 0x1 instead. If you can do multi-controller namespaces, you can attach to multiple controllers at the same time, like:
nvme attach-ns /dev/nvme0 -n 1 -c 1,2,3,4
nvme attach-ns /dev/nvme0 -n 2 -c 1,2,3,4
NVMe status: NS_IS_PRIVATE: The namespace is private and is already attached to one controller (0x2119)
It does not seem possible to attach to multiple controllers.
If you want to do multiple controllers, and if the controller supports it (check the primary's id-ctrl cmic value), then you should be able to enable that with the --nmic=1 option on the create-ns command.
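A sketch of what that could look like, assuming the cmic check passes (DRYRUN=echo just prints each command instead of executing it; drop it to run for real, and note the bit-1 reading of cmic is per the NVMe base spec's multi-controller flag):

```shell
DRYRUN=echo   # set DRYRUN= to actually execute against the drive
# multi-controller support: the cmic field in id-ctrl should have bit 1 set
$DRYRUN nvme id-ctrl /dev/nvme0
# create a shareable namespace (--nmic=1), then attach it to the primary
# controller 0x41 and secondary controller 0x1 together
$DRYRUN nvme create-ns /dev/nvme0 -s 2143279008 -c 2143279008 -f 0 --nmic=1
$DRYRUN nvme attach-ns /dev/nvme0 -n 2 -c 0x41,0x1
```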
When I attach namespace 2 to main controller 1, the output of the id-ns command seems a little strange.
nvme id-ns /dev/nvme0 -n2
NVME Identify Namespace 2:
nsze : 0
ncap : 0
nuse : 0
nsfeat : 0
nlbaf : 0
flbas : 0
mc : 0
dpc : 0
dps : 0
nmic : 0
rescap : 0
fpi : 0
dlfeat : 0
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 0
mssrl : 0
mcl : 0
msrc : 0
anagrpid: 0
nsattr : 0
nvmsetid: 0
endgid : 0
nguid : 00000000000000000000000000000000
eui64 : 0000000000000000
lbaf 0 : ms:0 lbads:0 rp:0 (in use)
This namespace doesn't seem to be working?
Try adding the -f parameter to id-ns.
It seems that there are 66 main controllers. Do I have to attach the namespace to these controllers one by one and pass them through to the VM for testing? Is there no command to report the mapping relationship between namespace, controller, and VF? 😭
I'm a little confused by your terminology. The spec uses "primary" and "secondary" controller terms. What is a "main" controller?
If you want to see which controller IDs of a subsystem are attached to a particular namespace ID, you can run, for example, nvme list-ctrl -n 1 for namespace ID 1.
Is the mapping relationship between the controller and the VF only known to the equipment manufacturer?
Correct, there is no spec guidance on how controller ID's are assigned to any particular controller within a NVM subsystem, which includes VFs.
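So the drive's own list-secondary report is the only authoritative place to read the SCID-to-VF mapping from. As a small sketch, this awk one-liner condenses it (field positions assume the nvme-cli output format shown earlier in this thread):

```shell
# Print "controller <SCID> -> VF <VFN>" for each secondary controller entry
nvme list-secondary /dev/nvme0 \
  | awk '/SCID/ {scid=$NF} /VFN/ {print "controller " scid " -> VF " $NF}'
```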
I attached the namespace to all the controllers one by one and passed them through to the VM, but I didn't seem to find any block devices in the VM.
can you see the controllers and block device if you let the VF's bind to the host driver instead of a guest instance?
It seems that only when I attach the namespace to the 0x41 controller can I find the block device on the host; if I attach the namespace to other controllers, I cannot find the block device on the host. This seems very strange. When I use a NIC's SR-IOV, the host can also use the VF devices.
While the VF is bound to the host driver, could you run nvme list -v?
lspci
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.3 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.4 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
01:00.5 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
nvme list -v
NVM Express Subsystems
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys0 nqn.1994-11.com.samsung:nvme:PM1733:2.5-inch:S4YPNG0R400619 nvme0
nvme-subsys1 nqn.2014.08.org.nvmexpress:80868086PHM274900219280AGN INTEL SSDPE21D280GA nvme1
NVM Express Controllers
Device SN MN FR TxPort Address Subsystem Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
nvme0 S4YPNG0R400619 SAMSUNG MZWLJ3T8HBLS-00007 EPK9AB5Q pcie 0000:01:00.0 nvme-subsys0 nvme0c0n1, nvme0c0n10, nvme0c0n11, nvme0c0n12, nvme0c0n13, nvme0c0n14, nvme0c0n15, nvme0c0n16, nvme0c0n17, nvme0c0n18, nvme0c0n19, nvme0c0n2, nvme0c0n20, nvme0c0n21, nvme0c0n22, nvme0c0n23, nvme0c0n24, nvme0c0n25, nvme0c0n26, nvme0c0n27, nvme0c0n28, nvme0c0n29, nvme0c0n3, nvme0c0n30, nvme0c0n31, nvme0c0n32, nvme0c0n4, nvme0c0n5, nvme0c0n6, nvme0c0n7, nvme0c0n8, nvme0c0n9
nvme1 PHM274900219280AGN INTEL SSDPE21D280GA E2010325 pcie 0000:23:00.0 nvme-subsys1 nvme1n1
NVM Express Namespaces
Device NSID Usage Format Controllers
------------ -------- -------------------------- ---------------- ----------------
nvme0n1 1 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n10 10 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n11 11 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n12 12 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n13 13 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n14 14 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n15 15 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n16 16 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n17 17 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n18 18 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n19 19 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n2 2 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n20 20 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n21 21 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n22 22 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n23 23 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n24 24 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n25 25 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n26 26 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n27 27 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n28 28 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n29 29 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n3 3 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n30 30 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n31 31 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n32 32 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n4 4 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n5 5 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n6 6 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n7 7 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n8 8 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme0n9 9 54.87 GB / 54.87 GB 512 B + 0 B nvme0
nvme1n1 1 280.07 GB / 280.07 GB 512 B + 0 B nvme1
It doesn't look like any VF controllers are bound to the host driver in this output. Are there any nvme errors indicated in the 'dmesg' for those functions?
[ 59.170930] pci 0000:01:00.2: [144d:a824] type 00 class 0x010802
[ 59.171148] pci 0000:01:00.2: Adding to iommu group 34
[ 59.171290] nvme nvme2: pci function 0000:01:00.2
[ 59.171318] nvme 0000:01:00.2: enabling device (0000 -> 0002)
[ 59.171324] pci 0000:01:00.3: [144d:a824] type 00 class 0x010802
[ 59.171615] pci 0000:01:00.3: Adding to iommu group 35
[ 59.171690] nvme nvme3: pci function 0000:01:00.3
[ 59.171715] pci 0000:01:00.4: [144d:a824] type 00 class 0x010802
[ 59.171740] nvme 0000:01:00.3: enabling device (0000 -> 0002)
[ 59.171922] pci 0000:01:00.4: Adding to iommu group 36
[ 59.171986] nvme nvme4: pci function 0000:01:00.4
[ 59.172009] pci 0000:01:00.5: [144d:a824] type 00 class 0x010802
[ 59.172034] nvme 0000:01:00.4: enabling device (0000 -> 0002)
[ 59.172237] pci 0000:01:00.5: Adding to iommu group 37
[ 59.172301] nvme nvme5: pci function 0000:01:00.5
[ 59.172315] nvme 0000:01:00.5: enabling device (0000 -> 0002)
[ 89.674264] nvme nvme5: Device not ready; aborting initialisation, CSTS=0x2
[ 89.674270] nvme nvme5: Removing after probe failure status: -19
[ 89.675266] nvme nvme2: Device not ready; aborting initialisation, CSTS=0x2
[ 89.675272] nvme nvme2: Removing after probe failure status: -19
[ 89.676219] nvme nvme4: Device not ready; aborting initialisation, CSTS=0x2
[ 89.676219] nvme nvme3: Device not ready; aborting initialisation, CSTS=0x2
[ 89.676223] nvme nvme3: Removing after probe failure status: -19
[ 89.676223] nvme nvme4: Removing after probe failure status: -19
Okay, looks broken. I think you have to take this to the vendor at this point.
https://stackoverflow.com/questions/65350988/how-to-setup-sr-iov-with-samsung-pm1733-1735-nvme-ssd This doesn't seem to be an isolated case; it may be a firmware problem.
I understand this has been 'resolved'. Closing the issue.
It's actually really painful. Even now, it's unknown whether the SSD's firmware or driver is faulty. Since this is a second-hand SSD I bought, I can't find the corresponding customer support. But thank you for your answers; without them I might have wasted more time.
@daiaji FWIW I did get a PM1735 to work with SR-IOV. I found that you need to enable all 32 VFs (basically, cat sriov_totalvfs > sriov_numvfs). If you enable fewer, then the nvme virt-mgmt ... -a 9 command always fails to bring the secondary controller online. In the process of futzing around, I also got the controller into an unhappy state that was only resolved after a whole-system reboot, so maybe try that too if you haven't already.
Thanks for your reply, I will try it.
@0xabu
nvme create-ns /dev/nvme0 -s 5358197520 -c 5358197520 -f 0 -d 0 -m 0
nvme create-ns /dev/nvme0 -s 2143279008 -c 2143279008 -f 0 -d 0 -m 0
nvme attach-ns /dev/nvme0 -n 1 -c 0x41
nvme attach-ns /dev/nvme0 -n 2 -c 0x41
nvme reset /dev/nvme0
cat /sys/class/nvme/nvme0/device/sriov_totalvfs > /sys/class/nvme/nvme0/device/sriov_numvfs
nvme virt-mgmt /dev/nvme0n2 -c 0x0001 -r0 -n2 -a8
nvme virt-mgmt /dev/nvme0n2 -c 0x0001 -r1 -n2 -a8
nvme virt-mgmt /dev/nvme0n2 -c 0x0001 -a9
Then I passed all the PCI devices from 01:00.2 to 01:00.7 through to QEMU, but I didn't find a block device in the guest.
All, thanks for the updates; I am fighting with a similar problem.
My NVMe drive is a SAMSUNG MZWLJ7T6HALA-00007, the host is Ubuntu 18.04 (kernel 5.4.0) with nvme-cli 2.0 (compiled from source), and the guest is Ubuntu 20.04.
I tried to follow [1]
https://lore.kernel.org/all/20211027164930.GC3331@lmaniak-dev.igk.intel.com/
so I had some extra steps for the primary controller 0x41:
nvme virt-mgmt /dev/nvme0 -c 0x41 -r 1 -a 1 -n 0
nvme virt-mgmt /dev/nvme0 -c 0x41 -r 0 -a 1 -n 0
nvme reset /dev/nvme0
echo 1 > /sys/bus/pci/rescan
then, following @0xabu's suggestion (previously I had 4):
echo 32 > /sys/class/nvme/nvme0/device/sriov_numvfs
which results in lspci reporting new devices:
1a:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824
1a:00.1 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824
...
1a:04.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824
Note that, following [1], I am not trying to assign VQ/VI to a namespace but to the nvmeX device, and this step works:
nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
nvme list-secondary /dev/nvme0 | head
Identify Secondary Controller List:
NUMID : Number of Identifiers : 32
SCEntry[0 ]:
................
SCID : Secondary Controller Identifier : 0x0001
PCID : Primary Controller Identifier : 0x0041
SCS : Secondary Controller State : 0x0001 (Online)
VFN : Virtual Function Number : 0x0001
NVQ : Num VQ Flex Resources Assigned : 0x0002
NVI : Num VI Flex Resources Assigned : 0x0001
So far so good. Then I add the PCI device (1a:00.1) to the VM, and on the guest I see
lspci
00:08.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
but no /dev/nvmeX in lsblk on the guest, and
[ 687.514627] pci 0000:00:08.0: [144d:a824] type 00 class 0x010802
[ 687.515493] pci 0000:00:08.0: reg 0x10: [mem 0x00000000-0x00007fff 64bit]
[ 687.516197] pci 0000:00:08.0: enabling Extended Tags
[ 687.517180] pci 0000:00:08.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown speed x0 link at 0000:00:08.0 (capable of 63.012 Gb/s with 16 GT/s x4 link)
[ 687.519176] pci 0000:00:08.0: BAR 0: assigned [mem 0x440000000-0x440007fff 64bit]
[ 687.716227] nvme nvme0: pci function 0000:00:08.0
[ 687.716340] nvme 0000:00:08.0: enabling device (0000 -> 0002)
[ 718.290794] nvme nvme0: Device not ready; aborting initialisation
[ 718.294800] nvme nvme0: Removing after probe failure status: -19
in the meantime, on the host:
May 13 16:47:48 pc-comp07 kernel: [ 5262.894466] vfio-pci 0000:1a:00.1: enabling device (0000 -> 0002)
May 13 16:48:13 pc-comp07 kernel: [ 5287.751826] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
May 13 16:48:13 pc-comp07 kernel: [ 5288.036439] nvme nvme0: Shutdown timeout set to 10 seconds
May 13 16:48:13 pc-comp07 kernel: [ 5288.048857] nvme nvme0: 63/0/0 default/read/poll queues
which is confirmed by the secondary controller going offline:
Identify Secondary Controller List:
NUMID : Number of Identifiers : 32
SCEntry[0 ]:
................
SCID : Secondary Controller Identifier : 0x0001
PCID : Primary Controller Identifier : 0x0041
SCS : Secondary Controller State : 0x0000 (Offline)
VFN : Virtual Function Number : 0x0001
NVQ : Num VQ Flex Resources Assigned : 0x0000
NVI : Num VI Flex Resources Assigned : 0x0000
For the record, on the host
1a:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a824] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device [144d:a801]
Physical Slot: 0-6
Flags: bus master, fast devsel, latency 0, IRQ 41, NUMA node 0
Memory at aae10000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at aad00000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Device Serial Number 58-72-01-01-96-38-25-00
Capabilities: [168] Alternative Routing-ID Interpretation (ARI)
Capabilities: [178] #19
Capabilities: [198] #26
Capabilities: [1c0] #27
Capabilities: [1e8] Single Root I/O Virtualization (SR-IOV)
Capabilities: [3a4] #25
Kernel driver in use: nvme
Kernel modules: nvme
1a:00.1 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a824] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device [144d:a801]
Physical Slot: 0-6
Flags: fast devsel, NUMA node 0
[virtual] Memory at aad10000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable- Count=580 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: vfio-pci
Kernel modules: nvme
I have a different OS/kernel/QEMU version than in [1]; this week I hope to move the NVMe drive to a system where I can install Ubuntu 20.04/22.04.
Regards
nvme create-ns /dev/nvme0 -s 5358197520 -c 5358197520 -f 0 -d 0 -m 0
nvme create-ns /dev/nvme0 -s 2143279008 -c 2143279008 -f 0 -d 0 -m 0
nvme attach-ns /dev/nvme0 -n 1 -c 0x41
nvme attach-ns /dev/nvme0 -n 2 -c 0x41
@daiaji here you attached the new namespaces to the primary controller (0x41)
nvme virt-mgmt /dev/nvme0n2 -c 0x0001 -r0 -n2 -a8
nvme virt-mgmt /dev/nvme0n2 -c 0x0001 -r1 -n2 -a8
nvme virt-mgmt /dev/nvme0n2 -c 0x0001 -a9
... and here you enabled secondary controller (virtual function) #1. You need to detach the namespaces from controller 0x41 and attach them to controller 1.
lspci
00:08.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
but no /dev/nvmeX in lsblk on the guest, and
[ 687.514627] pci 0000:00:08.0: [144d:a824] type 00 class 0x010802
[ 687.515493] pci 0000:00:08.0: reg 0x10: [mem 0x00000000-0x00007fff 64bit]
[ 687.516197] pci 0000:00:08.0: enabling Extended Tags
[ 687.517180] pci 0000:00:08.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown speed x0 link at 0000:00:08.0 (capable of 63.012 Gb/s with 16 GT/s x4 link)
[ 687.519176] pci 0000:00:08.0: BAR 0: assigned [mem 0x440000000-0x440007fff 64bit]
[ 687.716227] nvme nvme0: pci function 0000:00:08.0
[ 687.716340] nvme 0000:00:08.0: enabling device (0000 -> 0002)
[ 718.290794] nvme nvme0: Device not ready; aborting initialisation
[ 718.294800] nvme nvme0: Removing after probe failure status: -19
My first thought was the primary controller must be in some bad state.
in the meantime, on the host
May 13 16:47:48 pc-comp07 kernel: [ 5262.894466] vfio-pci 0000:1a:00.1: enabling device (0000 -> 0002)
May 13 16:48:13 pc-comp07 kernel: [ 5287.751826] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
May 13 16:48:13 pc-comp07 kernel: [ 5288.036439] nvme nvme0: Shutdown timeout set to 10 seconds
May 13 16:48:13 pc-comp07 kernel: [ 5288.048857] nvme nvme0: 63/0/0 default/read/poll queues
And that appears to confirm it! If your controller reports CSTS.CFS as 1 (0x3 in your above output), a reset is the required operation to proceed. Spec says "If the primary controller associated with a secondary controller is disabled or undergoes a Controller Level Reset, then the secondary controller shall implicitly transition to the Offline state."
So, it looks like you'd need to manually re-online each secondary controller. That seems a bit fragile if the guest requires host assistance when it hasn't done anything wrong...
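That host-side recovery could be scripted as a rough sketch like the following (untested here; DRYRUN=echo only prints the commands, and the 1..32 range assumes all 32 VFs are enabled as above):

```shell
# After a primary controller reset, every secondary controller drops to
# Offline; walk them all and set each back online (action 9 = online).
DRYRUN=echo   # set DRYRUN= to actually execute
for cntlid in $(seq 1 32); do
  $DRYRUN nvme virt-mgmt /dev/nvme0 -c "$cntlid" -a 9
done
```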
@0xabu
nvme virt-mgmt /dev/nvme0 -c 0x41 -r 1 -a 1 -n 0
nvme virt-mgmt /dev/nvme0 -c 0x41 -r 0 -a 1 -n 0
nvme reset /dev/nvme0
echo 1 > /sys/bus/pci/rescan
echo 32 > /sys/class/nvme/nvme0/device/sriov_numvfs
nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
nvme list-secondary /dev/nvme0 | head
Identify Secondary Controller List:
NUMID : Number of Identifiers : 32
SCEntry[0 ]:
................
SCID : Secondary Controller Identifier : 0x0001
PCID : Primary Controller Identifier : 0x0041
SCS : Secondary Controller State : 0x0001 (Online)
VFN : Virtual Function Number : 0x0001
NVQ : Num VQ Flex Resources Assigned : 0x0002
NVI : Num VI Flex Resources Assigned : 0x0001
SCEntry[1 ]:
nvme attach-ns /dev/nvme0 -n 2 -c 0x0001
nvme list -v
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys2 nqn.2014.08.org.nvmexpress:144d144dS4GCNE0R404170 SAMSUNG MZVLB2T0HALB-000L7 nvme2
nvme-subsys1 nqn.2014.08.org.nvmexpress:1e491cc1ZTA22T0KA220440DW3 ZHITAI TiPlus5000 2TB nvme1
nvme-subsys0 nqn.1994-11.com.samsung:nvme:PM1733:2.5-inch:S4YPNG0R400619 nvme0
Device SN MN FR TxPort Address Subsystem Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
nvme2 S4GCNE0R404170 SAMSUNG MZVLB2T0HALB-000L7 4M2QEXG7 pcie 0000:23:00.0 nvme-subsys2 nvme2n1
nvme1 ZTA22T0KA220440DW3 ZHITAI TiPlus5000 2TB ZTA08322 pcie 0000:22:00.0 nvme-subsys1 nvme1n1
nvme0 S4YPNG0R400619 SAMSUNG MZWLJ3T8HBLS-00007 EPK9AB5Q pcie 0000:01:00.0 nvme-subsys0 nvme0n1
Device Generic NSID Usage Format Controllers
------------ ------------ -------- -------------------------- ---------------- ----------------
/dev/nvme2n1 /dev/ng2n1 1 736.85 GB / 2.05 TB 512 B + 0 B nvme2
/dev/nvme1n1 /dev/ng1n1 1 2.05 TB / 2.05 TB 512 B + 0 B nvme1
/dev/nvme0n1 /dev/ng0n1 1 2.74 TB / 2.74 TB 512 B + 0 B nvme0
lspci.log https://gist.github.com/daiaji/3cb114264a536b9aeb8ccf91c4ada887
After attaching namespace 2 to controller 1, I don't see /dev/nvme0n2 on the host. 😭
After attaching namespace 2 to controller 1, I don't see /dev/nvme0n2 in the host.
I think that's expected. You can't attach the namespace to both host and guest controllers at the same time. I also noticed that 'nvme id-ns' shows all zeros unless the namespace is attached to the primary, but I can access it just fine in the guest.
@0xabu @piotrekz79 @keithbusch I also noticed that my lspci output doesn't seem to have 01:00.1; that function appears to be skipped. I actually passed the rest of the VFs through to the guest, but it doesn't seem to work.
sudo nvme id-ctrl /dev/nvme0 | grep fr
fr : EPK9AB5Q
frmw : 0x17
Is my device firmware out of date? dmesg.log
@daiaji I have fr EPK9CB5Q, but that's just what came on the card. I don't know of a public source for firmware updates.
@daiaji I have fr EPK9CB5Q, but that's just what came on the card. I don't know of a public source for firmware updates.
It looks like that's just because the PM1733 and PM1735 use different firmware; I checked some web pages and it seems I'm already on the newest firmware. So, what motherboard and CPU do you use? I guess it shouldn't have much to do with the Linux kernel version.
All, an update, sadly without any real success
The firmware is the same as @daiaji reported:
root@power01:/home/ubuntu# sudo nvme id-ctrl /dev/$NVME | grep fr
fr : EPK98B5Q
frmw : 0x17
I installed the drive in a Supermicro AS-1114CS-TNR server (H12SSW-AN6 motherboard) because I was able to install Ubuntu 22.04 there. That - compared to the Intel system I tested previously - led to problems with the IOMMU. Namely, after creating the VFs, all of them landed in IOMMU group 0 (despite other devices having proper separation). I ended up installing a kernel that carries the ACS override patch:
uname -a
Linux power01 5.17.0-9.1-liquorix-amd64 #1 ZEN SMP PREEMPT liquorix 5.17-13ubuntu1~jammy (2022-05-18) x86_64 x86_64 x86_64 GNU/Linux
root@power01:/home/ubuntu# cat /proc/cmdline
audit=0 intel_pstate=disable hpet=disable BOOT_IMAGE=/boot/vmlinuz-5.17.0-9.1-liquorix-amd64 root=UUID=72aa7786-199c-476e-a5fd-3cf6149cac62 ro amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction,id:144d:a824
As a result, I got proper separation and the ability to pass a device to a VM (see below for the IOMMU group).
I repeated the previous steps, running the guest VM on Ubuntu 22.04 as well, trying both (as @keithbusch suggested):
NVME=nvme3
nvme virt-mgmt /dev/$NVME -c 1 -r 1 -a 8 -n 1
nvme list-secondary /dev/$NVME | head
nvme virt-mgmt /dev/$NVME -c 1 -r 0 -a 8 -n 2
nvme list-secondary /dev/$NVME | head
nvme virt-mgmt /dev/$NVME -c 1 -r 0 -a 9 -n 0
nvme list-secondary /dev/$NVME | head
In both cases the result was the same: I can see the device on the guest in lspci but not in lsblk, and I get the nvme nvme0: Device not ready; aborting initialisation, CSTS=0x2 error on the guest.
host
May 20 11:12:12 power01 kernel: vfio-pci 0000:c4:00.1: enabling device (0000 -> 0002)
c4:00.1 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM173X [144d:a824] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller PM173X [144d:a801]
Physical Slot: 7
Flags: fast devsel, NUMA node 0, IOMMU group 97
Memory at b7210000 (64-bit, non-prefetchable) [virtual] [size=32K]
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable- Count=580 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: vfio-pci
guest
[Fri May 20 11:12:12 2022] pci 0000:07:00.0: [144d:a824] type 00 class 0x010802
[Fri May 20 11:12:12 2022] pci 0000:07:00.0: reg 0x10: [mem 0x00000000-0x00007fff 64bit]
[Fri May 20 11:12:12 2022] pci 0000:07:00.0: Max Payload Size set to 128 (was 512, max 512)
[Fri May 20 11:12:12 2022] pci 0000:07:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:02.6 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
dc07fff 64bit]
[Fri May 20 11:12:12 2022] nvme nvme0: pci function 0000:07:00.0
[Fri May 20 11:12:12 2022] nvme 0000:07:00.0: enabling device (0000 -> 0002)
[Fri May 20 11:12:42 2022] nvme nvme0: Device not ready; aborting initialisation, CSTS=0x2
[Fri May 20 11:12:42 2022] nvme nvme0: Removing after probe failure status: -19
As a quick test, I also removed all VFs and added the whole 7.2TB drive to a guest - it showed up immediately, without rebooting the guest, etc.
ubuntu@test01:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 61.9M 1 loop /snap/core20/1434
loop1 7:1 0 44.7M 1 loop /snap/snapd/15534
loop2 7:2 0 79.9M 1 loop /snap/lxd/22923
sr0 11:0 1 368K 0 rom
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 9.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
nvme0n1 259:1 0 7T 0 disk
I am running out of ideas - I can try to ask my supplier to contact Samsung.
@0xabu I also tried attaching the namespace to the secondary controller only (which I first brought online):
nvme delete-ns /dev/nvme3n1
nvme create-ns /dev/nvme3 -b 4096 -s 1073741824 -c 1073741824
then - as you mentioned - we do not see it on the host
root@power01:/home/ubuntu# nvme attach-ns -n1 -c0x0001 /dev/nvme3
attach-ns: Success, nsid:1
root@power01:/home/ubuntu# nvme list -v
Subsystem Subsystem-NQN Controllers
---------------- ------------------------------------------------------------------------------------------------ ----------------
nvme-subsys3 nqn.1994-11.com.samsung:nvme:PM1733:2.5-inch:S546NE0N602916 nvme3
nvme-subsys2 nqn.2016-08.com.micron:nvme:nvm-subsystem-sn-21162E8B4C8B nvme2
nvme-subsys1 nqn.2016-08.com.micron:nvme:nvm-subsystem-sn-2135312AD01B nvme1
nvme-subsys0 nqn.2016-08.com.micron:nvme:nvm-subsystem-sn-2135312ACF69 nvme0
Device SN MN FR TxPort Address Subsystem Namespaces
-------- -------------------- ---------------------------------------- -------- ------ -------------- ------------ ----------------
nvme3 S546NE0N602916 SAMSUNG MZWLJ7T6HALA-00007 EPK98B5Q pcie 0000:c4:00.0 nvme-subsys3
nvme2 21162E8B4C8B Micron_9300_MTFDHAL3T2TDR 11300DU0 pcie 0000:c3:00.0 nvme-subsys2 nvme2n1
nvme1 2135312AD01B Micron_9300_MTFDHAL3T2TDR 11300DU0 pcie 0000:c2:00.0 nvme-subsys1 nvme1n1
nvme0 2135312ACF69 Micron_9300_MTFDHAL3T2TDR 11300DU0 pcie 0000:c1:00.0 nvme-subsys0 nvme0n1
but when adding the PCI device to the guest I get the same device not ready error:
May 20 13:29:43 test01 kernel: [ 7824.098540] pci 0000:07:00.0: [144d:a824] type 00 class 0x010802
May 20 13:29:43 test01 kernel: [ 7824.098650] pci 0000:07:00.0: reg 0x10: [mem 0x00000000-0x00007fff 64bit]
May 20 13:29:43 test01 kernel: [ 7824.098905] pci 0000:07:00.0: Max Payload Size set to 128 (was 512, max 512)
May 20 13:29:43 test01 kernel: [ 7824.100214] pci 0000:07:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:02.6 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
May 20 13:29:43 test01 kernel: [ 7824.104907] pci 0000:07:00.0: BAR 0: assigned [mem 0xfdc00000-0xfdc07fff 64bit]
May 20 13:29:43 test01 kernel: [ 7824.106971] nvme nvme0: pci function 0000:07:00.0
May 20 13:29:43 test01 kernel: [ 7824.107000] nvme 0000:07:00.0: enabling device (0000 -> 0002)
May 20 13:30:14 test01 kernel: [ 7854.744864] nvme nvme0: Device not ready; aborting initialisation, CSTS=0x2
May 20 13:30:14 test01 kernel: [ 7854.746425] nvme nvme0: Removing after probe failure status: -19
regards
@piotrekz79 It seems that the server motherboard has the same fault, and I can only hope that Samsung will reply. 😭
Hi, I also came across the same fault. Has Samsung replied?
@piotrekz79 It seems that the server motherboard has the same fault, and I can only hope that Samsung will reply. 😭
@iaGuoZhi No 😭
@0xabu Hi, recently I got a PM1735 to play with. It seems I have brought a secondary controller online, but I failed to expose it to a VM. Could you please share how you pass the VF through to the VM?
First, I tried to enable a secondary controller (controller 0x1), and the commands seemed to work fine.
cd /sys/class/nvme/nvme0/device
sudo bash -c "sudo echo 32 > sriov_numvfs"
sudo nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -n 2 -a 8
sudo nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -n 2 -a 8
sudo nvme virt-mgmt /dev/nvme0 -c 1 -a 9
lspci | grep Non
5e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
5e:00.1 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
...
5e:04.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
60:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane]
sudo nvme list-secondary /dev/nvme0 | head
Identify Secondary Controller List:
NUMID : Number of Identifiers : 32
SCEntry[0 ]:
................
SCID : Secondary Controller Identifier : 0x0001
PCID : Primary Controller Identifier : 0x0041
SCS : Secondary Controller State : 0x0001 (Online)
VFN : Virtual Function Number : 0x0001
NVQ : Num VQ Flex Resources Assigned : 0x0002
NVI : Num VI Flex Resources Assigned : 0x0002
Then I tried to use VFIO to pass the VF to the VM:
sudo modprobe vfio-pci
sudo bash -c 'echo 144d a824 > /sys/bus/pci/drivers/vfio-pci/new_id'
And I did get some vfio-pci devices:
dyy@r742:/sys/bus/pci/drivers/vfio-pci$ ls
0000:5e:00.1 0000:5e:00.6 0000:5e:01.3 0000:5e:02.0 0000:5e:02.5 0000:5e:03.2 0000:5e:03.7 new_id
0000:5e:00.2 0000:5e:00.7 0000:5e:01.4 0000:5e:02.1 0000:5e:02.6 0000:5e:03.3 0000:5e:04.0 remove_id
0000:5e:00.3 0000:5e:01.0 0000:5e:01.5 0000:5e:02.2 0000:5e:02.7 0000:5e:03.4 0000:60:00.0 uevent
0000:5e:00.4 0000:5e:01.1 0000:5e:01.6 0000:5e:02.3 0000:5e:03.0 0000:5e:03.5 bind unbind
0000:5e:00.5 0000:5e:01.2 0000:5e:01.7 0000:5e:02.4 0000:5e:03.1 0000:5e:03.6 module
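As an aside, `new_id` makes vfio-pci claim every device with the 144d:a824 ID at once; a narrower alternative is the per-device `driver_override` mechanism, which a script later in this thread uses. Here is a sketch that only prints the two sysfs writes for one VF (the address is taken from the listing above), rather than performing them:

```shell
# Sketch: per-device binding via driver_override instead of a global new_id match.
# print_bind_cmds only prints the sysfs writes; run them (as root) to actually bind.
print_bind_cmds() {
    printf 'echo vfio-pci > /sys/bus/pci/devices/%s/driver_override\n' "$1"
    printf 'echo %s > /sys/bus/pci/drivers_probe\n' "$1"
}
print_bind_cmds 0000:5e:00.1
```

With `driver_override` set, only the named function is eligible for vfio-pci; the PF and other VFs keep their current drivers.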
Then I passed the VFIO device to QEMU, using a command like:
qemu/build/qemu-system-x86_64 \
-kernel ../guest/linux-5.15/arch/x86_64/boot/bzImage \
-cpu qemu64 -smp 2 \
-m 9G \
-initrd ../files/initramfs.cpio.gz \
-nographic \
-append "console=ttyS0 root=/dev/vda, nokaslr" \
-enable-kvm \
-netdev user,id=net0 -device virtio-net-pci,netdev=net0 \
-device vfio-pci,host=0000:5e:00.1
But after I execute the command above, the host crashes during VM boot and I have to cold-reboot the server to make the PM1735 available again. I think maybe a secondary controller is different from the primary controller and needs a different method to expose it, but I have no idea what I should do.
I have also tried to expose the VF to the host, i.e. bind the secondary controller as a normal PCI NVMe device:
sudo bash -c "echo -n 0000:5e:00.1 > /sys/bus/pci/drivers/nvme/bind"
But again the host server crashes and I have to power-cycle it. Did I miss anything in the way SR-IOV should be used?
@Yiyuan-Dong I no longer have access to the hardware, but note that I had a PM1735 (not 1733). If it helps, here are some excerpts from scripts I had written:
To create/populate a namespace:
NVME_DEV=/dev/nvme0
SIZE_GB=512
BLOCK_SIZE=4096
VFNID=0
# Get the ID of the primary (host) controller
HOST_CNTLID=$(nvme id-ctrl $NVME_DEV -o json | jq .cntlid)
# Get the ID of the secondary (virtual function) controller
VIRT_CNTLID=$(nvme list-secondary $NVME_DEV -o json | jq '."secondary-controllers"[]|select(."virtual-function-number"=='$((VFNID + 1))')."secondary-controller-identifier"')
SIZE_BLOCKS=$((SIZE_GB * 1000000000 / BLOCK_SIZE))
echo "Primary controller ID: $HOST_CNTLID"
echo "Secondary controller ID: $VIRT_CNTLID"
echo "Creating namespace with $SIZE_BLOCKS $BLOCK_SIZE-byte blocks..."
# Create the new namespace, and capture the output, which should be "create-ns: Success, created nsid:32"
out=$(nvme create-ns $NVME_DEV --nsze=$SIZE_BLOCKS --ncap=$SIZE_BLOCKS --block-size $BLOCK_SIZE)
NSID=${out#*created nsid:}
echo "Created namespace $NSID"
# Attach the namespace to the host
nvme attach-ns $NVME_DEV -n $NSID -c $HOST_CNTLID
# Wait for the namespace to populate
while true; do
NSDEV=$(nvme list -o json | jq -r ".Devices[]|select(.NameSpace==$NSID).DevicePath")
if [ -n "$NSDEV" ]; then
break
fi
sleep 1
nvme ns-rescan $NVME_DEV
done
# Partition/format/image the new namespace
(...)
# Detach the new namespace from the primary controller
nvme detach-ns $NVME_DEV -c $HOST_CNTLID -n $NSID
# Attach it to the secondary controller
nvme attach-ns $NVME_DEV -c $VIRT_CNTLID -n $NSID
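As a sanity check on the sizing arithmetic in the script above (it uses decimal gigabytes, not GiB), the hypothetical 512 GB / 4096-byte example works out to:

```shell
# Same computation as SIZE_BLOCKS=$((SIZE_GB * 1000000000 / BLOCK_SIZE)) above,
# with the example values from the script (decimal GB, 4 KiB blocks).
SIZE_GB=512
BLOCK_SIZE=4096
SIZE_BLOCKS=$((SIZE_GB * 1000000000 / BLOCK_SIZE))
echo "$SIZE_BLOCKS"   # 512 * 10^9 / 4096 = 125000000 blocks
```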
Booting a guest VM:
local sysfsdir=/sys/bus/pci/devices/$NVME_PF
local numvfs
read numvfs < $sysfsdir/sriov_numvfs
if [ $numvfs -eq 0 ]; then
# XXX: assign VQ & VI resources for all the controllers we might need, before enabling any
# (if we assign these resources later, then the command to online the secondary fails)
local vq_max=$(nvme primary-ctrl-caps $nvme_dev -o json | jq .vqfrsm)
local vi_max=$(nvme primary-ctrl-caps $nvme_dev -o json | jq .vifrsm)
local cid
for cid in $(nvme list-secondary $nvme_dev -o json | jq '."secondary-controllers"[]|select(."virtual-function-number" <= 4)."secondary-controller-identifier"'); do
nvme virt-mgmt $nvme_dev -c $cid -r 0 -n $vq_max -a 8 > /dev/null
nvme virt-mgmt $nvme_dev -c $cid -r 1 -n $vi_max -a 8 > /dev/null
done
# prevent probing of virtual function drivers, then create all the VFs
echo -n 0 > $sysfsdir/sriov_drivers_autoprobe
cat $sysfsdir/sriov_totalvfs > $sysfsdir/sriov_numvfs
fi
# Bring the secondary controller online
local cid=$(nvme list-secondary $nvme_dev -o json | jq '."secondary-controllers"[]|select(."virtual-function-number"=='$(($2 + 1))')."secondary-controller-identifier"')
nvme virt-mgmt $nvme_dev -c $cid -a 9 > /dev/null
# find the PCI ID of the VF
local vfnid=$(basename $(readlink $sysfsdir/virtfn$VFNID))
# bind to vfio-pci
echo vfio-pci > /sys/bus/pci/devices/$vfnid/driver_override
echo $vfnid >/sys/bus/pci/drivers_probe
# Now invoke QEMU, passing "-device vfio-pci,host=$vfnid"
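Following the final comment of that excerpt, a minimal QEMU command line might look like the following sketch (the VF address and memory size are placeholders, and the guest image options are omitted; only the argument string is built here, so nothing is launched):

```shell
# Hypothetical VF address; the excerpt above derives it via readlink on virtfn$VFNID.
vfnid=0000:5e:00.1
QEMU_ARGS="-enable-kvm -m 4G -nographic -device vfio-pci,host=$vfnid"
# In a real run: qemu-system-x86_64 $QEMU_ARGS plus -drive/-kernel options as needed
echo "qemu-system-x86_64 $QEMU_ARGS"
```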
These instructions look pretty awesome! I wonder if it would be a good idea to get these (maybe partially) added to blktests? Would this work against the soft target implementation of Linux? We certainly lack such complex tests...
@0xabu Thanks a lot for your help! Your scripts look awesome, though I still hit the same problem after using them...
I'm wondering if it's a matter of kernel configuration or BIOS configuration. I have enabled SR-IOV and the IOMMU in both the BIOS and the kernel configuration. Are there any other settings that must be taken care of? Would you please share the kernel version with which you succeeded in bringing up the drive?
@Yiyuan-Dong the guest kernel was Ubuntu 22.04 5.17.0-8. I believe the host was the same or similar, but don't recall for sure.
@0xabu Thank you so much for the speedy reply.
Update: After I moved the drive from a rackmount server to my PC, I found that the PC does not crash when trying to bind the drive, and the kernel log shows some strange behavior in the nvme driver. I'd like to put the related kernel log here, hoping someone knows what happened, or at least to let anybody hitting the same problem know they are not alone.
Now I use the following script to try to bind the first secondary controller to the host, since VFs should be treated as hot-plugged PCI devices by the kernel.
#!/bin/bash
nvme virt-mgmt /dev/nvme2 -c 65 -r 1 -a 1 -n 0
nvme virt-mgmt /dev/nvme2 -c 65 -r 0 -a 1 -n 0
sudo nvme reset /dev/nvme2
sudo nvme virt-mgmt /dev/nvme2 -c 1 -r 0 -n 9 -a 8
sudo nvme virt-mgmt /dev/nvme2 -c 1 -r 1 -n 9 -a 8
sudo bash -c "sudo echo 0 > /sys/bus/pci/devices/0000:07:00.0/sriov_drivers_autoprobe" # no autoprobe
sudo bash -c "sudo echo 32 > /sys/class/nvme/nvme2/device/sriov_numvfs" # enable VF
sudo nvme virt-mgmt /dev/nvme2 -c 1 -a 9
sudo nvme list-secondary /dev/nvme2 | head
vfnid=0000:07:00.1
echo nvme > /sys/bus/pci/devices/$vfnid/driver_override
echo $vfnid > /sys/bus/pci/drivers_probe
After I execute the script, the log shows:
[ 188.713739] nvme nvme4: pci function 0000:07:00.1
[ 188.714064] nvme 0000:07:00.1: enabling device (0000 -> 0002)
[ 216.443974] watchdog: BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u40:7:570]
[ 216.443977] Modules linked in: snd_seq_dummy snd_hrtimer mlx4_ib ib_uverbs ib_core nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr sunrpc vfat fat intel_rapl_msr iTCO_wdt pmt_telemetry intel_pmc_bxt ee1004 pmt_class mei_hdcp iTCO_vendor_support intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate intel_uncore eeepc_wmi asus_wmi sparse_keymap platform_profile pcspkr rfkill wmi_bmof snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_hdac_hda snd_hda_codec_hdmi snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek soundwire_bus snd_soc_core snd_hda_codec_generic
[ 216.443997] ledtrig_audio snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer i2c_i801 snd i2c_smbus soundcore mei_me mei idma64 mlx4_core joydev intel_pmt acpi_tad acpi_pad zram ip_tables i915 i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm r8169 nvme nvme_core ghash_clmulni_intel vmd wmi video pinctrl_alderlake fuse
[ 216.444009] CPU: 4 PID: 570 Comm: kworker/u40:7 Kdump: loaded Not tainted 5.16.12+ #9
[ 216.444011] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0407 09/13/2021
[ 216.444011] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 216.444015] RIP: 0010:pci_mmcfg_read+0xac/0xd0
[ 216.444018] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 4c 01 e0 66 8b 00 0f b7 c0 89 45 00 eb e0 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb d3 4c 01 e0 8b 00 <89> 45 00 eb c9 e8 2a 4c 55 ff 5b c7 45 00 ff ff ff ff b8 ea ff ff
[ 216.444019] RSP: 0018:ffffb3e701157c88 EFLAGS: 00000286
[ 216.444020] RAX: 00000000ffffffff RBX: 0000000000701000 RCX: 0000000000000ffc
[ 216.444020] RDX: 00000000000000ff RSI: 0000000000000007 RDI: 0000000000000000
[ 216.444021] RBP: ffffb3e701157cc4 R08: 0000000000000004 R09: ffffb3e701157cc4
[ 216.444021] R10: ffffb3e701157b18 R11: 0000000000000007 R12: 0000000000000ffc
[ 216.444022] R13: 0000000000001000 R14: 0000000000000004 R15: 0000000000000000
[ 216.444022] FS: 0000000000000000(0000) GS:ffff93092f300000(0000) knlGS:0000000000000000
[ 216.444023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 216.444023] CR2: 00007f09bdb3e4e0 CR3: 000000044e810002 CR4: 0000000000770ee0
[ 216.444024] PKRU: 55555554
[ 216.444024] Call Trace:
[ 216.444025] <TASK>
[ 216.444026] pci_bus_read_config_dword+0x36/0x50
[ 216.444029] pci_find_next_ext_capability.part.0.cold+0x87/0x93
[ 216.444031] pci_save_vc_state+0x25/0x90
[ 216.444032] pci_save_state+0x106/0x280
[ 216.444034] nvme_reset_work+0x313/0x12a0 [nvme]
[ 216.444036] ? resched_curr+0x20/0xb0
[ 216.444038] ? check_preempt_curr+0x2f/0x70
[ 216.444039] ? ttwu_do_wakeup+0x17/0x160
[ 216.444040] ? _raw_spin_unlock_irqrestore+0x25/0x40
[ 216.444042] ? try_to_wake_up+0x84/0x570
[ 216.444043] process_one_work+0x1e5/0x3c0
[ 216.444045] worker_thread+0x50/0x3b0
[ 216.444046] ? rescuer_thread+0x370/0x370
[ 216.444047] kthread+0x169/0x190
[ 216.444048] ? set_kthread_struct+0x40/0x40
[ 216.444048] ret_from_fork+0x1f/0x30
[ 216.444051] </TASK>
[ 244.443980] watchdog: BUG: soft lockup - CPU#4 stuck for 52s! [kworker/u40:7:570]
[ 244.443981] Modules linked in: snd_seq_dummy snd_hrtimer mlx4_ib ib_uverbs ib_core nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr sunrpc vfat fat intel_rapl_msr iTCO_wdt pmt_telemetry intel_pmc_bxt ee1004 pmt_class mei_hdcp iTCO_vendor_support intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate intel_uncore eeepc_wmi asus_wmi sparse_keymap platform_profile pcspkr rfkill wmi_bmof snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_hdac_hda snd_hda_codec_hdmi snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek soundwire_bus snd_soc_core snd_hda_codec_generic
[ 244.444003] ledtrig_audio snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer i2c_i801 snd i2c_smbus soundcore mei_me mei idma64 mlx4_core joydev intel_pmt acpi_tad acpi_pad zram ip_tables i915 i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm r8169 nvme nvme_core ghash_clmulni_intel vmd wmi video pinctrl_alderlake fuse
[ 244.444016] CPU: 4 PID: 570 Comm: kworker/u40:7 Kdump: loaded Tainted: G L 5.16.12+ #9
[ 244.444017] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0407 09/13/2021
[ 244.444017] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 244.444019] RIP: 0010:pci_mmcfg_read+0xac/0xd0
[ 244.444021] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 4c 01 e0 66 8b 00 0f b7 c0 89 45 00 eb e0 4c 01 e0 8a 00 0f b6 c0 89 45 00 eb d3 4c 01 e0 8b 00 <89> 45 00 eb c9 e8 2a 4c 55 ff 5b c7 45 00 ff ff ff ff b8 ea ff ff
[ 244.444021] RSP: 0018:ffffb3e701157c88 EFLAGS: 00000286
[ 244.444022] RAX: 00000000ffffffff RBX: 0000000000701000 RCX: 0000000000000ffc
[ 244.444023] RDX: 00000000000000ff RSI: 0000000000000007 RDI: 0000000000000000
[ 244.444023] RBP: ffffb3e701157cc4 R08: 0000000000000004 R09: ffffb3e701157cc4
[ 244.444024] R10: ffffb3e701157b18 R11: 0000000000000007 R12: 0000000000000ffc
[ 244.444024] R13: 0000000000001000 R14: 0000000000000004 R15: 0000000000000000
[ 244.444024] FS: 0000000000000000(0000) GS:ffff93092f300000(0000) knlGS:0000000000000000
[ 244.444025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 244.444025] CR2: 00007f09bdb3e4e0 CR3: 000000044e810002 CR4: 0000000000770ee0
[ 244.444026] PKRU: 55555554
[ 244.444026] Call Trace:
[ 244.444027] <TASK>
[ 244.444027] pci_bus_read_config_dword+0x36/0x50
[ 244.444029] pci_find_next_ext_capability.part.0.cold+0x87/0x93
[ 244.444030] pci_save_vc_state+0x25/0x90
[ 244.444031] pci_save_state+0x106/0x280
[ 244.444033] nvme_reset_work+0x313/0x12a0 [nvme]
[ 244.444036] ? resched_curr+0x20/0xb0
[ 244.444037] ? check_preempt_curr+0x2f/0x70
[ 244.444038] ? ttwu_do_wakeup+0x17/0x160
[ 244.444039] ? _raw_spin_unlock_irqrestore+0x25/0x40
[ 244.444040] ? try_to_wake_up+0x84/0x570
[ 244.444042] process_one_work+0x1e5/0x3c0
[ 244.444043] worker_thread+0x50/0x3b0
[ 244.444044] ? rescuer_thread+0x370/0x370
[ 244.444045] kthread+0x169/0x190
[ 244.444045] ? set_kthread_struct+0x40/0x40
[ 244.444046] ret_from_fork+0x1f/0x30
[ 244.444048] </TASK>
[ 245.203988] nvme nvme4: Removing after probe failure status: -19
[ 245.303991] pcieport 0000:00:1c.4: AER: Uncorrected (Non-Fatal) error received: 0000:07:00.1
[ 245.463974] pcieport 0000:00:1c.4: AER: Uncorrected (Non-Fatal) error received: 0000:07:00.1
And immediately after I run the script, a new device nvme4 appears under /dev. After the log says "Removing after probe failure status: -19", nvme4 disappears.
The Linux kernel I use is 5.16.12
OK, it seems the problem was all about the firmware. After I reported the problem to after-sales support, they provided me with the latest PM173X firmware, EPK9GB5Q. Now everything works fine with the latest firmware.
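For anyone checking whether their drive already runs a fixed firmware: the revision is the `fr` field of `nvme id-ctrl` (it is also the FR column of `nvme list`). A sketch that parses the field from a sample output line (the sample line here is hypothetical, standing in for real `nvme id-ctrl /dev/nvme0` output):

```shell
# Parse the firmware revision from an `nvme id-ctrl`-style "fr" line.
# `sample` stands in for: nvme id-ctrl /dev/nvme0 | grep '^fr '
sample='fr        : EPK9GB5Q'
fw=$(printf '%s\n' "$sample" | awk -F: '{gsub(/[[:space:]]/,"",$2); print $2}')
echo "$fw"   # EPK9GB5Q
```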