Open eiclpy opened 1 year ago
Hi, I have the same problem. Have you solved it?
@fyu1 @davejiang It looks like this is caused by the IOMMU not being enabled. Is there any BIOS or other configuration that needs to be changed from the default? The CPU and kernel are as follows. CPU: Intel(R) Xeon(R) Gold 6448H Kernel: 6.2.0-34-generic
PRI is not enabled and thus SVA cannot be enabled: Capabilities: [240 v1] Page Request Interface (PRI) PRICtl: Enable- Reset-
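A minimal sketch for checking this flag from a script, assuming the `lspci -vvv` output for the device has already been captured as text. The function name `pri_enabled` is hypothetical, and the regular expression assumes only the `PRICtl: Enable+/-` format shown in the line above.

```python
import re


def pri_enabled(lspci_text):
    """Return True/False for the PRICtl Enable flag, or None if no PRICtl field is present.

    Hypothetical helper: it only looks for the 'PRICtl: Enable+' / 'PRICtl: Enable-'
    pattern as it appears in the lspci output quoted in this thread.
    """
    match = re.search(r"PRICtl:\s*Enable([+-])", lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"


# Capability line quoted in this thread.
sample = "Capabilities: [240 v1] Page Request Interface (PRI) PRICtl: Enable- Reset-"
print(pri_enabled(sample))
```

A result of `None` means the PRI capability was not listed at all, which is a different situation from PRI being listed but disabled.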
Could you remove "iommu=pt" in the kernel option?
Has anybody solved this issue? I am using only intel_iommu=on,sm_on (without iommu=pt) as kernel options, but I still get the warning, and PRI/PASID remain disabled.
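To rule out a mismatch between the intended and the effective kernel options, the command line can be parsed with a short script. This is a sketch only: the function name `iommu_options` is hypothetical, and the sample string is an abbreviated command line using the options mentioned in this thread.

```python
def iommu_options(cmdline):
    """Collect intel_iommu= sub-options and report whether iommu=pt is present.

    Hypothetical helper: 'cmdline' is the text of the kernel command line,
    for example the contents of /proc/cmdline.
    """
    intel = []
    passthrough = False
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            # "intel_iommu=on,sm_on" -> ["on", "sm_on"]
            intel.extend(token.split("=", 1)[1].split(","))
        elif token == "iommu=pt":
            passthrough = True
    return {"intel_iommu": intel, "iommu_pt": passthrough}


# Abbreviated example command line (illustrative, not a full boot line).
example = "ro intel_iommu=on,sm_on"
print(iommu_options(example))
```

On a running system, the same function could be given the text read from `/proc/cmdline`.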
The IAA User Guide specifies that the DSA/IAX option should be enabled under IOAT in the BIOS. As far as I can tell, that relates to the I/O Acceleration Technology DMA engine, so it may be relevant here. However, I am using a Dell motherboard/BIOS, which does not seem to have an equivalent option.
@Jonas-Heinrich can you give more details about the CPU/platform, what you are trying to do, and what errors you are getting?
@fyu1 do you know what is going on?
I have the same problem as @Jonas-Heinrich.
Here is the output of inxi -F
@ramesh-thomas
System:
Host: xxx Kernel: 6.2.0-37-generic x86_64 bits: 64
Console: pty pts/1 Distro: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Machine:
Type: Unknown System: Supermicro product: Super Server v: 0123456789
serial: 0123456789
Mobo: Supermicro model: X13DEI v: 1.01 serial: xxxxxxxxx
UEFI: American Megatrends LLC. v: 1.4 date: 08/09/2023
CPU:
Info: 32-core model: Intel Xeon Gold 6448H bits: 64 type: MT MCP cache:
L2: 64 MiB
Speed (MHz): avg: 1247 min/max: 800/4100 cores: 1: 800 2: 800 3: 800
4: 804 5: 800 6: 3808 7: 3100 8: 800 9: 800 10: 800 11: 800 12: 800 13: 800
14: 2600 15: 2406 16: 2600 17: 800 18: 2600 19: 2400 20: 800 21: 800
22: 800 23: 800 24: 800 25: 800 26: 800 27: 800 28: 800 29: 800 30: 800
31: 800 32: 800 33: 800 34: 800 35: 2600 36: 800 37: 2500 38: 800 39: 800
40: 1639 41: 800 42: 800 43: 2000 44: 800 45: 800 46: 2400 47: 800
48: 2400 49: 1700 50: 800 51: 800 52: 800 53: 800 54: 800 55: 800
56: 4100 57: 800 58: 800 59: 800 60: 800 61: 800 62: 800 63: 2600 64: 800
Graphics:
Device-1: ASPEED Graphics Family driver: ast v: kernel
Display: server: No display server data found. Headless machine?
Message: GL data unavailable for root.
Audio:
Message: No device data found.
Network:
Device-1: Mellanox MT43244 BlueField-3 integrated ConnectX-7 network
driver: mlx5_core
IF: ibs2f0 state: down
mac: 00:00:06:36:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:80:ee:76
Device-2: Mellanox MT43244 BlueField-3 integrated ConnectX-7 network
driver: mlx5_core
IF: ibs2f1 state: down
mac: 00:00:05:95:fe:80:00:00:00:00:00:00:a0:88:c2:03:00:80:ee:77
Device-3: Broadcom NetXtreme BCM5720 Gigabit Ethernet PCIe driver: tg3
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: 7c:c2:55:80:d4:ac
Device-4: Broadcom NetXtreme BCM5720 Gigabit Ethernet PCIe driver: tg3
IF: eno2 state: down mac: 7c:c2:55:80:d4:ad
IF-ID-1: docker0 state: down mac: 02:42:f6:b2:88:a9
IF-ID-2: enxbe3af2b6059f state: down mac: be:3a:f2:b6:05:9f
IF-ID-3: tmfifo_net0 state: unknown speed: 10000 Mbps duplex: full
mac: 00:1a:ca:ff:ff:02
Bluetooth:
Device-1: Insyde RNDIS/Ethernet Gadget type: USB driver: rndis_host
Report: This feature requires one of these tools: hciconfig/bt-adapter
Drives:
Local Storage: total: 1.82 TiB used: 789.39 GiB (42.4%)
ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 990 PRO 2TB size: 1.82 TiB
Partition:
ID-1: / size: 1.79 TiB used: 788.75 GiB (43.1%) fs: ext4 dev: /dev/dm-0
ID-2: /boot size: 1.9 GiB used: 641.5 MiB (33.0%) fs: ext4
dev: /dev/nvme0n1p2
ID-3: /boot/efi size: 1.05 GiB used: 6.1 MiB (0.6%) fs: vfat
dev: /dev/nvme0n1p1
Swap:
ID-1: swap-1 type: file size: 8 GiB used: 0 KiB (0.0%) file: /swap.img
Sensors:
Message: No ipmi sensor data found.
System Temperatures: lm-sensors cpu: 23.0 0.0 mobo: N/A sodimm: DIMM 0.0
Fan Speeds (RPM): lm-sensors N/A
Info:
Processes: 946 Uptime: 12m Memory: 125.48 GiB used: 2.49 GiB (2.0%)
Shell: Sudo inxi: 3.3.13
@ramesh-thomas @fyu1 I sent you an email from my TU Munich (TUM) address with more details; there are some parts that I cannot share here.
I got it working on another machine by setting "EDKII Menu -> Socket Configuration -> IIO Configuration -> Opt-Out Illegal MSI Mitigation: Enable" in its BIOS. As I found out, this option is not available on, e.g., Dell motherboards (to the best of my knowledge).
@eiclpy @Jonas-Heinrich please work with @fyu1 to get this issue resolved and update this issue with your findings. Thanks.
Hi, I have the same problem. I ran the commands from the DSA user guide, but something is not right. Here is my information. Thanks! @ramesh-thomas @fyu1
CPU: Intel(R) Xeon(R) Platinum 8475BL System: Ubuntu 22.04 Kernel: https://gitee.com/anolis/anck-next/tree/devel-6.1
$ accel-config -v
4.1.3.git71676025
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.1.27+ root=UUID=424deffb-478e-4611-834c-609e9ba58e75 ro vga=792 console=tty0 console=ttyS0,115200n8 net.ifnames=0 noibrs crashkernel=0M-1G:0M,1G-4G:192M,4G-128G:384M,128G-:512M nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 intel_iommu=on,sm_on
$ sudo dmesg | grep idxd
[ 5.275363] idxd 0000:00:06.0: Unable to turn on user SVA feature.
[ 5.279597] idxd 0000:00:06.0: Failed to initialize perfmon. No PMU support: -19
[ 5.288915] idxd 0000:00:06.0: Intel(R) Accelerator Device (v0)
[ 5.289087] idxd 0000:00:07.0: Unable to turn on user SVA feature.
[ 5.290949] idxd 0000:00:07.0: Failed to initialize perfmon. No PMU support: -19
[ 5.292351] idxd 0000:00:07.0: Intel(R) Accelerator Device (v0)
$ sudo dmesg | grep iommu
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.1.27+ root=UUID=424deffb-478e-4611-834c-609e9ba58e75 ro vga=792 console=tty0 console=ttyS0,115200n8 net.ifnames=0 noibrs crashkernel=0M-1G:0M,1G-4G:192M,4G-128G:384M,128G-:512M nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 intel_iommu=on,sm_on
[ 0.022942] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.1.27+ root=UUID=424deffb-478e-4611-834c-609e9ba58e75 ro vga=792 console=tty0 console=ttyS0,115200n8 net.ifnames=0 noibrs crashkernel=0M-1G:0M,1G-4G:192M,4G-128G:384M,128G-:512M nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 intel_iommu=on,sm_on
[ 0.629394] iommu: Default domain type: Passthrough
$ sudo lspci -vvv -s 00:06.0
00:06.0 System peripheral: Intel Corporation Device 0b25
Subsystem: Intel Corporation Device 2010
Physical Slot: 6
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at fa248000 (64-bit, prefetchable) [size=8K]
Region 2: Memory at fa24a000 (64-bit, prefetchable) [size=8K]
Expansion ROM at fea12000 [disabled] [size=2K]
Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
Vector table: BAR=0 offset=00000600
PBA: BAR=0 offset=00000000
Capabilities: [50] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE- FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
Kernel driver in use: idxd
Kernel modules: idxd
(Many capabilities are missing compared with the output others have posted.)
$ cat /sys/bus/dsa/devices/dsa0/pasid_enabled
0
$ accel-config load-config -c contrib/configs/app_profile.conf -e
dsa0 is active. Skipping...
(Why can't I configure my DSA device?)
"dsa0 is active. Skipping..."
The device was already enabled. You can give the -f option to disable enabled devices.
I get an error instead of a warning when I configure the device. @fyu1 @ramesh-thomas
$ sudo dmesg | grep iommu
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.2.0-37-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro intel_iommu=on,sm_on no5lvl
[ 0.171459] Kernel command line: BOOT_IMAGE=/vmlinuz-6.2.0-37-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro intel_iommu=on,sm_on no5lvl
[ 2.779978] iommu: Default domain type: Translated
[ 2.779978] iommu: DMA domain TLB invalidation policy: lazy mode
$ accel-config -v
4.1.3+
$ sudo accel-config load-config -c contrib/configs/app_profile.conf -e
Enabling device dsa0
Error enabling device
Error[0x800c0000] dsa0: wq error - no shared wq support (platform configuration error)
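The "no shared wq support" error appears consistent with the `pasid_enabled` attribute reading 0, as shown earlier in this thread. A minimal sketch for checking that attribute before running load-config follows. The helper name `pasid_enabled` and its `sysfs_root` parameter are hypothetical, and the path layout is assumed from the `/sys/bus/dsa/devices/dsa0/pasid_enabled` path quoted above.

```python
from pathlib import Path


def pasid_enabled(device="dsa0", sysfs_root="/sys/bus/dsa/devices"):
    """Return True if <sysfs_root>/<device>/pasid_enabled reads '1', False for any
    other content, and None if the attribute cannot be read.

    Hypothetical helper; the default path follows the one quoted in this thread.
    """
    path = Path(sysfs_root) / device / "pasid_enabled"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Attribute missing or unreadable (for example, no such device).
        return None


print(pasid_enabled())
```

If this prints `False`, the shared work queues in the profile are unlikely to enable until the SVA/PASID problem discussed in this thread is resolved.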
Thank you! I have disabled dsa0 and run the command again, but I get a new error message. @ramesh-thomas (I am root.)
$ accel-config load-config -c contrib/configs/app_profile.conf -e
libaccfg: accfg_device_set_read_buffer_limit: dsa0: write failed: Operation not permitted
device set read_buffer_limit value failed
Parse json and set device fail: -1
$ ls /sys/bus/dsa/devices/dsa0
cdev_major engine0.1 group0.0 max_tokens op_cap subsystem
clients engine0.2 max_batch_size max_transfer_size pasid_enabled token_limit
cmd_status engine0.3 max_engines max_work_queues power uevent
configurable errors max_groups max_work_queues_size read_buffer_limit version
engine0.0 gen_cap max_read_buffers numa_node state wq0.0
I had the exact same problem as you. Did you solve it? @eiclpy
Does anyone have an update on fixing this? I also have the very same issue: PRICtl is disabled, and it shows "Unable to turn on SVA feature" when I boot up.
For anybody who has seen 'SVM disabled, incompatible paging mode' in the kernel messages: try disabling 5-level page tables and rebooting (add 'no5lvl' to your kernel command line).
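The suggestion above can be sketched as a small check, assuming the kernel log and the kernel command line have already been captured as text. The function name `should_add_no5lvl` is hypothetical, and the match relies only on the message text quoted above.

```python
def should_add_no5lvl(dmesg_text, cmdline):
    """Return True when the 'incompatible paging mode' message is present in the
    kernel log and 'no5lvl' is not yet on the kernel command line.

    Hypothetical helper: both arguments are plain text captured beforehand.
    """
    has_message = "SVM disabled, incompatible paging mode" in dmesg_text
    already_set = "no5lvl" in cmdline.split()
    return has_message and not already_set


# Illustrative inputs: the message text quoted above and abbreviated command lines.
log_text = "SVM disabled, incompatible paging mode"
print(should_add_no5lvl(log_text, "ro intel_iommu=on,sm_on"))
print(should_add_no5lvl(log_text, "ro intel_iommu=on,sm_on no5lvl"))
```

The first call reports that the option is worth trying; the second reports that it is already set, so the cause would have to be looked for elsewhere.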
I rebooted the system with "intel_iommu=on,sm_on", but the "Unable to turn on user SVA feature" message is still there.
CPU: Intel(R) Xeon(R) Gold 6448H System: Ubuntu 22.04 Kernel: 6.2.0-34-generic