snu-csl / nvmevirt

NVMeVirt: A Versatile Software-defined Virtual NVMe Device

Error inserting ZNS SSD kernel module #6

Closed zy1024cs closed 1 year ago

zy1024cs commented 1 year ago

Hello, when I enabled CONFIG_NVMEVIRT_ZNS := y in the Makefile and compiled, the following error occurred when inserting the compiled kernel module: [image] My /etc/default/grub settings are as follows: [image] What might be the cause? Thank you.

arter97 commented 1 year ago

sudo dmesg (kernel log) will show more information.

Please post its output as well.

zy1024cs commented 1 year ago

If memmap_size is set to a multiple of 1024, such as 8192M, an error may occur in zns_ftl.c. The details are as follows: [image] This is because capacity is reduced by 1M in main.c (details below), resulting in a value of 8191M. [image] When I set memmap_size to 8193M (8192+1) or 4097M (4096+1), this error disappears, but another error occurs during insmod, like this: [image] Sometimes the system crashes outright, which is quite mysterious (😂, it feels like it runs on luck). Can you suggest parameters that make ZNS run? (I have limited hardware: only 10GB of allocatable memory and 8 CPU cores 😅) Thank you.
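To make the arithmetic above concrete, here is a minimal sketch. The 1M reduction comes from this comment; everything else is an assumption for illustration: the failing check in zns_ftl.c is assumed to require the capacity to be a whole number of zones, and the 32 MiB zone size is a hypothetical value, not NVMeVirt's actual configuration.

```python
# Sketch only: models "capacity = memmap_size - 1M" as described in the thread.
# ZONE_SIZE_MIB and the alignment rule are assumptions, not NVMeVirt's real code.

RESERVED_MIB = 1    # from the thread: main.c reduces capacity by 1M
ZONE_SIZE_MIB = 32  # hypothetical zone size, chosen only for illustration


def usable_capacity_mib(memmap_size_mib: int) -> int:
    """Capacity left for the namespace after the 1M reduction."""
    return memmap_size_mib - RESERVED_MIB


def is_zone_aligned(memmap_size_mib: int, zone_size_mib: int = ZONE_SIZE_MIB) -> bool:
    """True if the usable capacity is a whole number of (assumed) zones."""
    return usable_capacity_mib(memmap_size_mib) % zone_size_mib == 0


if __name__ == "__main__":
    for size in (8192, 8193, 4097):
        print(size, usable_capacity_mib(size), is_zone_aligned(size))
```

Under these assumptions, 8192M leaves 8191M, which is not zone aligned, while 8193M and 4097M leave 8192M and 4096M, which are.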

euidong-lee commented 1 year ago

Your dmesg shows "ns_size=16384MiB", which means that you set memmap_size to 16385M (16384 + 1). Since your machine has only 10GB of memory, this may cause issues. Can you check memmap_size?

zy1024cs commented 1 year ago

Thank you for your reply. I previously ran the experiments on a server (with 32GB of available memory), but because the system may crash during each experiment and restarting the server in the machine room is a hassle, I am currently using a virtual machine on my laptop (with only 10GB of memory allocated to it). The image above is from the server, so it was run with 32GB of available memory. I can now run on my own virtual machine, but it still feels like luck when it works, because the system often crashes after insmod. The server is still unable to run normally.

Below are some possible bugs that I encountered after insmod in the virtual machine (please forgive me, I cannot understand some of the content 😅): [image] [image] [image] [image] Because of these errors, running fio tests or even some simple shell commands can crash the ZNS device that was built and take the whole system down with it. (A small question: you also use fio for testing ZNS, right? Can you share the fio command you used, or any other help? Thank you. 😋)

euidong-lee commented 1 year ago

Thanks to your help, we found memory corruption when accessing struct zone_report. Could you modify the code as follows and try it? [image: fixed_code]

We just use the 'zonemode=zbd' option for fio. The other options are the same as for testing conventional SSDs. Please note that the write unit size for ZNS_PROTOTYPE is set to 128KB due to the constraints of the prototype SSD that we used, so each write request should be a multiple of 128KB. Also, write buffering is not supported in this prototype SSD, which means that write latency exposes the NAND program time.
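As a rough illustration of the constraints above, the sketch below assembles a fio command line and rejects block sizes that are not a multiple of 128KB. Only zonemode=zbd and the 128KB write unit come from this comment. The device path, job name, direct I/O, and the psync engine are assumptions picked for the example, and the helper name is hypothetical.

```python
import shlex

WRITE_UNIT = 128 * 1024  # from the thread: ZNS_PROTOTYPE write unit is 128KB


def fio_zns_command(device: str, block_size: int, rw: str = "write") -> str:
    """Build a fio command line for a zoned device (illustrative helper)."""
    if block_size <= 0 or block_size % WRITE_UNIT != 0:
        raise ValueError("block size must be a positive multiple of 128KB")
    args = [
        "fio",
        "--name=zns-test",        # assumed job name
        f"--filename={device}",   # assumed device path, e.g. /dev/nvme0n1
        "--zonemode=zbd",         # the option named in the thread
        "--direct=1",             # assumption: bypass the page cache
        "--ioengine=psync",       # assumption: any synchronous engine would do
        f"--rw={rw}",
        f"--bs={block_size}",     # block size in bytes
    ]
    return shlex.join(args)


if __name__ == "__main__":
    print(fio_zns_command("/dev/nvme0n1", 128 * 1024))
```

The remaining options would be chosen the same way as for a conventional SSD test, as noted above.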

arter97 commented 1 year ago

@zy1024cs, please try out #7.

zy1024cs commented 1 year ago

Thank you very much. With the new code, I can successfully load ZNS mode on my server (😀). But it only supports read mode; is write mode still being debugged? [image] [image] Looking forward to write support (\ (•◡•) /). Thank you again.

arter97 commented 1 year ago

The Linux kernel doesn't support ZNS without Zone Append (ZA) capability.

Read this thread for context: https://lore.kernel.org/all/20200818052936.10995-1-joshi.k@samsung.com/T/#u

If you don't use ZA, you can probably just have NVMeVirt falsely report that it's supported, which makes the device R/W. Try uncommenting this line: https://github.com/snu-csl/nvmevirt/pull/7/commits/96f21b9936f88800c2ab85e6eec2f824ccd5779a#diff-c714e0d9e04cd3a91b0e2de1c353296a3550544d3c5c63ff18f129f974beaa25R224

Ideally, it'd be nice if NVMeVirt adds support for ZA 🙃

arter97 commented 1 year ago

I'm guessing that the authors inadvertently reported ZA capability to the host by masking the nvme_effects_log's page with 0xFFFFFFFF.

If that's not the case, an elaboration on how to run fio on ZNS would be appreciated.

From the start, the Linux kernel's NVMe code has not supported ZNS devices that lack ZA capability: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/zns.c?id=240e6ee272c07a2636dfc7d65f5bbb18377c49e5#n47

euidong-lee commented 1 year ago

@arter97, thank you for fixing the get_log_page function. As you mentioned, we implemented the get_log_page function in a simplistic manner by masking the "nvme_effects_log" page with 0xFFFFFFFF to enable running ZNS in the kernel. We are planning to add support for ZA soon. Until it is officially added, please use a version of NVMeVirt that fakes support for ZA.
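The effect of that masking can be pictured with a simplified model. This is a sketch, not NVMeVirt's or the kernel's code: it assumes the effects log is two 256-entry tables of 32-bit values, that bit 0 of an entry means "command supported", and that Zone Append is I/O opcode 0x7D, which matches my reading of the NVMe definitions. The function names are hypothetical.

```python
# Simplified model of the Commands Supported and Effects log (assumed layout).
NVME_CMD_ZONE_APPEND = 0x7D  # Zone Append I/O opcode
CSUPP = 1 << 0               # "command supported" bit in each entry


def make_effects_log(fill: int = 0) -> dict:
    """Return a log whose admin and I/O tables are all set to `fill`."""
    return {"acs": [fill] * 256, "iocs": [fill] * 256}


def host_sees_zone_append(log: dict) -> bool:
    """Mimic the host-side check on the Zone Append entry."""
    return bool(log["iocs"][NVME_CMD_ZONE_APPEND] & CSUPP)


if __name__ == "__main__":
    honest = make_effects_log(0)           # nothing advertised
    masked = make_effects_log(0xFFFFFFFF)  # every entry reports every bit
    print(host_sees_zone_append(honest), host_sees_zone_append(masked))
```

With every entry set to 0xFFFFFFFF the model reports Zone Append as supported even though the device does not implement it, which is why the namespace becomes writable only in the version that fakes ZA.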