Closed by fevenor 2 weeks ago.
Put it in `args` (the Proxmox VM config field for extra QEMU arguments).
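Proxmox keeps each VM's settings in `/etc/pve/qemu-server/<vmid>.conf`. Its `args:` line passes arbitrary extra options straight to QEMU, so options that `qm` has no dedicated parameter for can still be supplied there (the CLI equivalent is `qm set <vmid> --args '...'`). Below is a minimal sketch that adds such an option to a config file's text. The helper name is hypothetical. The `-device vfio-platform,host=example.npu` value is also an assumption, with a made-up device name, and the thread does not show it working on this board.

```python
def set_qemu_args(conf_text: str, extra: str) -> str:
    """Return conf_text with `extra` added to the main section's `args:` line."""
    lines = conf_text.splitlines()
    # Proxmox stores snapshots as "[name]" sections after the main config;
    # only lines before the first such section belong to the running VM.
    end = next((i for i, line in enumerate(lines) if line.startswith("[")), len(lines))
    for i in range(end):
        if lines[i].startswith("args:"):
            current = lines[i][len("args:"):].strip()
            # Keep any existing extra arguments and append the new ones.
            lines[i] = f"args: {current} {extra}" if current else f"args: {extra}"
            break
    else:
        # No args: line yet; add one after the last non-blank main-section line.
        pos = end
        while pos > 0 and not lines[pos - 1].strip():
            pos -= 1
        lines.insert(pos, f"args: {extra}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device name; the real one is listed under /sys/bus/platform/devices.
    print(set_qemu_args("cores: 4\nmemory: 4096\n", "-device vfio-platform,host=example.npu"))
```

Editing the text rather than the file keeps the sketch side-effect free. Snapshot sections are left untouched so that only the live VM definition changes.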
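It seems that passing the NPU through to the VM crashes the kernel, so I have given up on this plan. The crash occurred while the device was being unbound from its driver (`unbind_store` in the call trace):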
[ 740.669811] Internal error: synchronous external abort: 0000000096000010 [#1] SMP
[ 740.678053] Modules linked in: nf_conntrack_netlink xt_nat xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables overlay veth rpcsec_gss_krb5 ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bridge stp llc bonding tls zstd zram zsmalloc nfnetlink_log nfnetlink binfmt_misc rk805_pwrkey pwm_fan nvmem_rockchip_otp panfrost drm_shmem_helper rockchip_cpuinfo gpu_sched uio_pdrv_genirq uio vhost_net tun vhost vhost_iotlb tap fuse dm_mod ip_tables ipv6
[ 740.735539] CPU: 3 PID: 2482 Comm: bash Not tainted 6.1.75-vendor-rk35xx #1
[ 740.743147] Hardware name: Turing Machines RK1 (DT)
[ 740.748481] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 740.756091] pc : readl+0x4/0x20
[ 740.759550] lr : rk_iommu_is_stall_active+0x60/0x68
[ 740.764895] sp : ffff80000cbc3890
[ 740.768528] x29: ffff80000cbc3890 x28: ffff000105e2bd00 x27: 0000000000000000
[ 740.776335] x26: 00000000fffffff0 x25: ffff8000080d45b0 x24: 0000000000000000
[ 740.784142] x23: ffff0001001260f4 x22: ffff000104552840 x21: 0000000000000001
[ 740.791948] x20: ffff000103c81e80 x19: 0000000000000001 x18: 0000000000000000
[ 740.799754] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaaddde7f30
[ 740.807544] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 740.815300] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000087d8554
[ 740.823055] x8 : 0000000000000000 x7 : ffff80000a4e1520 x6 : 000000000002d0c4
[ 740.830811] x5 : 0000000423882d79 x4 : 0000000000000000 x3 : ffff000105e2bd00
[ 740.838566] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff80000b1f5004
[ 740.846324] Call trace:
[ 740.848990] readl+0x4/0x20
[ 740.852038] rk_iommu_enable_stall+0x11c/0x138
[ 740.856873] rk_iommu_enable+0x48/0x230
[ 740.861048] rk_iommu_resume+0x50/0x64
[ 740.865125] pm_generic_runtime_resume+0x30/0x44
[ 740.870149] __rpm_callback+0x4c/0x12c
[ 740.874227] rpm_callback+0x78/0x7c
[ 740.878022] rpm_resume+0x3b0/0x44c
[ 740.881818] __pm_runtime_resume+0x74/0x9c
[ 740.886272] rpm_get_suppliers+0x50/0xc0
[ 740.890542] __rpm_callback+0xa4/0x12c
[ 740.894619] rpm_callback+0x78/0x7c
[ 740.898413] rpm_resume+0x3b0/0x44c
[ 740.902208] __pm_runtime_resume+0x74/0x9c
[ 740.906662] pm_runtime_get_sync.isra.0+0x14/0x20
[ 740.911783] device_release_driver_internal+0x4c/0x150
[ 740.917379] device_driver_detach+0x20/0x2c
[ 740.921933] unbind_store+0x60/0x90
[ 740.925729] drv_attr_store+0x30/0x44
[ 740.929719] sysfs_kf_write+0x44/0x58
[ 740.933714] kernfs_fop_write_iter+0xc0/0x178
[ 740.938453] vfs_write+0x154/0x1b8
[ 740.942161] ksys_write+0x78/0xe4
[ 740.945772] __arm64_sys_write+0x20/0x2c
[ 740.950043] invoke_syscall+0x8c/0x128
[ 740.954124] el0_svc_common.constprop.0+0xd8/0x128
[ 740.959338] do_el0_svc+0xac/0xbc
[ 740.962947] el0_svc+0x2c/0x54
[ 740.966279] el0t_64_sync_handler+0xac/0x13c
[ 740.970931] el0t_64_sync+0x19c/0x1a0
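The trace shows that the crash started in `unbind_store`, when `bash` wrote to a driver's sysfs `unbind` attribute. Detaching the device then runtime-resumed its supplier IOMMU (`rk_iommu_resume`, then `rk_iommu_enable_stall`), and a register read there ended in a synchronous external abort. For context, here is a minimal sketch of the usual sysfs sequence for handing a platform device to `vfio-platform`: set `driver_override`, unbind, then re-probe. The function names and device name are hypothetical. On this board the unbind step is exactly what crashed, so this illustrates the procedure and is not a fix.

```python
import os


def write_attr(path: str, value: str) -> None:
    # sysfs attributes are written as plain text, one value per write.
    with open(path, "w") as f:
        f.write(value)


def rebind_to_vfio_platform(dev: str, sysfs: str = "/sys") -> None:
    """Move platform device `dev` from its current driver to vfio-platform.

    `dev` is the entry name under /sys/bus/platform/devices. `sysfs` is a
    parameter only so the sketch can be exercised against a fake directory tree.
    """
    bus = os.path.join(sysfs, "bus", "platform")
    dev_dir = os.path.join(bus, "devices", dev)
    # 1. Tell the driver core which driver this device may bind to next.
    write_attr(os.path.join(dev_dir, "driver_override"), "vfio-platform")
    # 2. Detach from the current driver, if any (the step that ran unbind_store()).
    unbind = os.path.join(dev_dir, "driver", "unbind")
    if os.path.exists(unbind):
        write_attr(unbind, dev)
    # 3. Re-probe; because of driver_override, only vfio-platform can claim it.
    write_attr(os.path.join(bus, "drivers_probe"), dev)
```

On a real host this must run as root with the `vfio-platform` driver available. Given the crash above, try it only on a machine where a kernel oops is acceptable.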
A workable alternative: use an LXC container.
In the container's config file, add the NPU device renderD129 and its associated card1:
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
lxc.cgroup2.devices.allow: c 226:1 rwm
lxc.cgroup2.devices.allow: c 226:129 rwm
lxc.mount.entry: /dev/dri/card1 dev/dri/card1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD129 dev/dri/renderD129 none bind,optional,create=file
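The config above allows character devices with major number 226 (the DRM subsystem) and minors 1 and 129, which correspond to `card1` and `renderD129`. It also bind-mounts each node into the container. The helper sketch below is hypothetical and not part of LXC. It reads a node's device numbers on the host and emits the same two kinds of lines, so other device nodes can be added the same way.

```python
import os
import stat


def lxc_device_lines(path: str, major: int, minor: int, kind: str = "c") -> list[str]:
    """Build the cgroup2 allow rule and the bind-mount entry for one device node."""
    # lxc.mount.entry targets are relative to the container's rootfs.
    target = path.lstrip("/")
    return [
        f"lxc.cgroup2.devices.allow: {kind} {major}:{minor} rwm",
        f"lxc.mount.entry: {path} {target} none bind,optional,create=file",
    ]


def lxc_lines_for(path: str) -> list[str]:
    """Read the major/minor numbers of an existing device node on the host."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError(f"{path} is not a device node")
    return lxc_device_lines(path, os.major(st.st_rdev), os.minor(st.st_rdev), kind)


if __name__ == "__main__":
    for node in ("/dev/dri/card1", "/dev/dri/renderD129"):
        if os.path.exists(node):
            print("\n".join(lxc_lines_for(node)))
```

Reading the numbers with `os.stat` avoids hard-coding minors. DRM minors can differ between boards and kernel versions.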
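I have read Resource_PassThrough and learned that PCIe passthrough is currently not possible. Reading further in VFIO details, I found that QEMU on ARM platforms can use the following option for passthrough: `VFIO_PLATFORM`. Proxmox's `qm` command, however, has no such parameter. Is it still possible to pass the Rockchip platform's NPU device directly through to a VM in Proxmox?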
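Whatever QEMU options end up in `args`, VFIO passthrough of a platform device depends on a few host-side preconditions. `VFIO_PLATFORM` is the kernel config option (`CONFIG_VFIO_PLATFORM`) that builds the `vfio-platform` driver, and that driver must be available. The device must also sit behind an IOMMU, which means it has an `iommu_group`. The minimal sketch below checks both through sysfs. The function and device names are hypothetical. The check cannot tell whether QEMU's ARM `virt` machine will present the NPU correctly to the guest.

```python
import os
from typing import Optional


def vfio_platform_prereqs(dev: str, sysfs: str = "/sys") -> dict:
    """Report host-side VFIO preconditions for platform device `dev`."""
    bus = os.path.join(sysfs, "bus", "platform")
    dev_dir = os.path.join(bus, "devices", dev)
    # Devices behind an IOMMU get an iommu_group symlink pointing at
    # /sys/kernel/iommu_groups/<N>; VFIO assigns devices to guests per group.
    link = os.path.join(dev_dir, "iommu_group")
    group: Optional[str] = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
    return {
        "device_exists": os.path.isdir(dev_dir),
        # This directory exists once the vfio-platform driver is built in or loaded.
        "vfio_platform_driver": os.path.isdir(os.path.join(bus, "drivers", "vfio-platform")),
        "iommu_group": group,
    }
```

A `None` group or a missing driver means VFIO passthrough cannot start at all. All checks passing is necessary but, as the crash in this thread shows, not sufficient.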