snu-csl / nvmevirt

NVMeVirt: A Versatile Software-defined Virtual NVMe Device

Virtual device is removed after probe failure. #15

Closed juimdpp closed 1 year ago

juimdpp commented 1 year ago

Hello, I can't use nvmevirt because the virtual NVMe device is removed right after nvmevirt creates it. Here is the kernel log (sudo dmesg):

[  159.230037] nvmev: loading out-of-tree module taints kernel.
[  159.230088] nvmev: module verification failed: signature and/or required key missing - tainting kernel
[  159.230441] NVMeVirt: Storage : 100100000 + 3ff00000
[  159.231287] NVMeVirt: [NVMEV_NAMESPACE_INIT] ns=0 ns_addr=00000000507c09b5 ns_size=1023(MiB) 
[  159.231329] PCI host bridge to bus 0001:10
[  159.231331] pci_bus 0001:10: root bus resource [io  0x0000-0xffff]
[  159.231334] pci_bus 0001:10: root bus resource [mem 0x00000000-0xffffffffffff]
[  159.231335] pci_bus 0001:10: root bus resource [bus 00-ff]
[  159.231343] pci 0001:10:00.0: [0c51:0101] type 00 class 0x010802
[  159.231347] pci 0001:10:00.0: reg 0x10: [mem 0x100000000-0x100003fff 64bit]
[  159.231351] pci 0001:10:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
[  159.231353] pci 0001:10:00.0: enabling Extended Tags
[  159.231409] pci 0001:10:00.0: Adding to iommu group 22
[  159.231751] NVMeVirt: Successfully created virtual PCI bus (node 0)
[  159.231993] NVMeVirt: nvmev_io_worker_0 started on cpu 4 (node 0)
[  159.232017] NVMeVirt: nvmev_dispatcher started on cpu 3 (node 0)
[  159.232119] nvme nvme1: pci function 0001:10:00.0
[  159.232142] NVMeVirt: Successfully created Virtual NVMe deivce
[  159.234347] BUG: unable to handle page fault for address: ffff8d2cffffe000
[  159.234352] #PF: supervisor read access in kernel mode
[  159.234355] #PF: error_code(0x0000) - not-present page
[  159.234358] PGD 80ce01067 P4D 80ce01067 PUD 0 
[  159.234363] Oops: 0000 [#1] SMP NOPTI
[  159.234367] CPU: 3 PID: 3521 Comm: nvmev_dispatche Tainted: G           OE     5.15.0-73-generic #80~20.04.1-Ubuntu
[  159.234372] Hardware name: Gigabyte Technology Co., Ltd. B550M DS3H/B550M DS3H, BIOS F16 11/09/2022
[  159.234374] RIP: 0010:nvmev_proc_admin_sq+0x7c/0xf20 [nvmev]
[  159.234384] Code: 89 f8 45 89 ce 48 8b 53 10 4d 63 da 45 89 d7 44 8b 63 0c 49 c1 eb 06 41 83 e7 3f 45 8d 42 01 4e 8b 2c da 49 c1 e7 06 4d 01 fd <41> 0f b6 55 00 80 fa 06 0f 84 75 06 00 00 0f 87 74 04 00 00 80 fa
[  159.234387] RSP: 0018:ffffb688c12ebe78 EFLAGS: 00010286
[  159.234391] RAX: ffff8d2e4c75e000 RBX: ffff8d2e4f0c8740 RCX: 0000000000000001
[  159.234393] RDX: ffff8d2e418be670 RSI: 0000000000000000 RDI: 0000000000000001
[  159.234396] RBP: ffffb688c12ebed0 R08: 0000000000000001 R09: 0000000000000000
[  159.234398] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  159.234400] R13: ffff8d2cffffe000 R14: 0000000000000000 R15: 0000000000000000
[  159.234402] FS:  0000000000000000(0000) GS:ffff8d351fac0000(0000) knlGS:0000000000000000
[  159.234405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  159.234408] CR2: ffff8d2cffffe000 CR3: 000000015447e000 CR4: 0000000000750ee0
[  159.234410] PKRU: 55555554
[  159.234412] Call Trace:
[  159.234415]  <TASK>
[  159.234419]  nvmev_dispatcher+0x90/0x250 [nvmev]
[  159.234428]  ? __proc_file_read+0x240/0x240 [nvmev]
[  159.234434]  kthread+0x12a/0x150
[  159.234440]  ? set_kthread_struct+0x50/0x50
[  159.234444]  ret_from_fork+0x22/0x30
[  159.234451]  </TASK>
[  159.234452] Modules linked in: nvmev(OE) nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi edac_mce_amd nouveau snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_amd snd_hda_codec snd_hda_core snd_hwdep kvm snd_pcm mxm_wmi drm_ttm_helper crct10dif_pclmul ghash_clmulni_intel snd_seq_midi ttm aesni_intel snd_seq_midi_event crypto_simd cryptd joydev input_leds snd_rawmidi drm_kms_helper binfmt_misc wmi_bmof rapl cec snd_seq rc_core i2c_algo_bit snd_seq_device gigabyte_wmi fb_sys_fops snd_timer syscopyarea sysfillrect k10temp snd sysimgblt ccp video soundcore mac_hid sch_fq_codel msr parport_pc ppdev drm lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid r8169 nvme xhci_pci gpio_amdpt ahci crc32_pclmul i2c_piix4 nvme_core realtek libahci xhci_pci_renesas wmi gpio_generic
[  159.234537] CR2: ffff8d2cffffe000
[  159.234539] ---[ end trace ad6c13d79058f9ea ]---
[  159.234542] RIP: 0010:nvmev_proc_admin_sq+0x7c/0xf20 [nvmev]
[  159.234549] Code: 89 f8 45 89 ce 48 8b 53 10 4d 63 da 45 89 d7 44 8b 63 0c 49 c1 eb 06 41 83 e7 3f 45 8d 42 01 4e 8b 2c da 49 c1 e7 06 4d 01 fd <41> 0f b6 55 00 80 fa 06 0f 84 75 06 00 00 0f 87 74 04 00 00 80 fa
[  159.234552] RSP: 0018:ffffb688c12ebe78 EFLAGS: 00010286
[  159.234555] RAX: ffff8d2e4c75e000 RBX: ffff8d2e4f0c8740 RCX: 0000000000000001
[  159.234558] RDX: ffff8d2e418be670 RSI: 0000000000000000 RDI: 0000000000000001
[  159.234560] RBP: ffffb688c12ebed0 R08: 0000000000000001 R09: 0000000000000000
[  159.234562] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  159.234564] R13: ffff8d2cffffe000 R14: 0000000000000000 R15: 0000000000000000
[  159.234567] FS:  0000000000000000(0000) GS:ffff8d351fac0000(0000) knlGS:0000000000000000
[  159.234570] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  159.234572] CR2: ffff8d2cffffe000 CR3: 000000015447e000 CR4: 0000000000750ee0
[  159.234575] PKRU: 55555554
[  220.657584] nvme nvme1: I/O 8 QID 0 timeout, disable controller
[  220.765576] nvme nvme1: Device shutdown incomplete; abort shutdown
[  220.793639] nvme nvme1: Identify Controller failed (-4)
[  220.793642] nvme nvme1: Removing after probe failure status: -5

I followed the README instructions without changing the Makefile configuration. I am using Ubuntu 20.04.1 with Linux kernel 5.15.0-73-generic, and I reserved 1G of memory starting at the 4G offset (GRUB_CMDLINE_LINUX="memmap=1G\\\$4G"). As the log shows, the virtual NVMe device is created successfully but is then removed after the probe fails with status -5. insmod takes a long time (a little over a minute, I think), and I found that nvmev_dispatcher does not terminate properly. I tried several different memory mappings, but none of them worked. I also tried Ubuntu 22.04 LTS (Linux kernel 5.19), but that didn't work either.
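As a sanity check on the reservation, the memmap option can be compared against the "Storage :" line in the log above. The sketch below is only an illustration: the helper names `parse_size` and `parse_memmap` are hypothetical, and the two storage values are copied from the log rather than read from the module. It suggests that the storage area starts 1 MiB into the reserved region and ends exactly at its end, which matches the ns_size=1023(MiB) reported by the module, so the memory mapping itself looks consistent.

```python
GIB = 1 << 30
MIB = 1 << 20

def parse_size(text):
    """Convert a size such as '1G' or '512M' to bytes (hypothetical helper)."""
    units = {"K": 1 << 10, "M": MIB, "G": GIB}
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1], 0) * units[suffix]
    return int(text, 0)

def parse_memmap(option):
    """Split 'memmap=SIZE$START' into (start, size) in bytes (hypothetical helper)."""
    # Drop any GRUB escaping backslashes in front of the '$'.
    value = option.split("=", 1)[1].replace("\\", "")
    size, start = value.split("$")
    return parse_size(start), parse_size(size)

start, size = parse_memmap("memmap=1G$4G")

# Values copied from the log line "NVMeVirt: Storage : 100100000 + 3ff00000".
storage_start, storage_size = 0x100100000, 0x3FF00000

print((storage_start - start) // MIB)                 # offset into the reservation, in MiB
print(storage_size // MIB)                            # storage size, in MiB
print(storage_start + storage_size == start + size)   # does storage end where the reservation ends?
```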

I would really appreciate it if anyone could help me. Thank you!

juimdpp commented 1 year ago

For anyone who encounters this problem: make sure your IOMMU is disabled. The module works well once it is. You can check this via the methods mentioned in this website. In my case, adding intremap=off to GRUB_CMDLINE_LINUX wasn't enough to turn the IOMMU off. I also tried disabling it in the BIOS, but that didn't work (the setting was reset every time the computer was turned on). I was finally able to turn it off by adding amd_iommu=off to GRUB_CMDLINE_LINUX, since I'm using AMD (add intel_iommu=off instead if you're using Intel).
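A rough way to check whether the IOMMU is still active is sketched below. It is not necessarily the method from the website mentioned above, and all three function names are hypothetical. It relies on two indicators: the "Adding to iommu group" message that appears in the kernel log in the original report, and the assumption that /sys/kernel/iommu_groups is empty or missing when the IOMMU is off.

```python
import os
import re

def iommu_groups_in_log(log_text):
    """Return the IOMMU group numbers mentioned in kernel log text."""
    return [int(n) for n in re.findall(r"Adding to iommu group (\d+)", log_text)]

def iommu_group_count(sysfs_dir="/sys/kernel/iommu_groups"):
    """Count IOMMU groups exposed in sysfs; 0 if the directory cannot be read."""
    try:
        return len(os.listdir(sysfs_dir))
    except OSError:
        return 0

def iommu_off_requested(cmdline):
    """Check a kernel command line for amd_iommu=off or intel_iommu=off."""
    return any(tok in ("amd_iommu=off", "intel_iommu=off") for tok in cmdline.split())

# Line taken from the kernel log in the original report.
sample = "[  159.231409] pci 0001:10:00.0: Adding to iommu group 22"
print(iommu_groups_in_log(sample))   # a non-empty list means the IOMMU was active

# On the affected machine, the live state could be inspected like this:
# print(iommu_group_count())
# print(iommu_off_requested(open("/proc/cmdline").read()))
```

If `iommu_group_count()` still returns a non-zero value after rebooting with the new GRUB_CMDLINE_LINUX setting, the IOMMU is probably still enabled.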