rust-vmm / vmm-reference

A VMM implementation based on rust-vmm components

Investigate long boot wait time in block device test #46

Closed alexandruag closed 3 years ago

alexandruag commented 3 years ago

Figure out why the (bzimage-focal, rootfs.ext4) combination takes too long to boot for test_reference_vmm_with_disk in tests/test_run_reference_vmm.py (see the comment above the KERNELS_DISK global).

gabhijit commented 3 years ago

I have been able to look at this (or actually ended up looking at this in the context of something else!). Here is what happens:

  1. When we use the bzimage-focal image, the root filesystem is mounted as read-only. See the dmesg output below -
    ....
    [    0.622651] Loaded X.509 cert 'Build time autogenerated kernel key: e885ede42
    fd054dd93a39894f24e3d3fcc02822c'
    [    0.623279] zswap: default zpool zbud not available
    [    0.623617] zswap: pool creation failed
    [    0.623928] Key type ._fscrypt registered
    [    0.624196] Key type .fscrypt registered
    [    0.624485] Key type encrypted registered
    [    0.625917] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
    [    0.626366] VFS: Mounted root (ext4 filesystem) readonly on device 254:0.
    ....

Consequently, a number of processes that rely on being able to write to the filesystem fail: systemd-logind, systemd-timesyncd, and systemd-resolved, to name a few. Oddly enough, a login shell is still provided; I don't know why.

  2. When we use the vmlinux-focal image, the root FS is mounted read-write, as seen in the dmesg output below -
    ....
    [    0.647635] Loading compiled-in X.509 certificates
    [    0.648463] Loaded X.509 cert 'Build time autogenerated kernel key: e885ede42
    fd054dd93a39894f24e3d3fcc02822c'
    [    0.649155] zswap: default zpool zbud not available
    [    0.649485] zswap: pool creation failed
    [    0.649797] Key type ._fscrypt registered
    [    0.650084] Key type .fscrypt registered
    [    0.650414] Key type encrypted registered
    [    0.788979] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
    [    0.792337] VFS: Mounted root (ext4 filesystem) on device 254:0.
    ....

All the services are started properly and hence we quickly get a login shell.

The root cause appears to be the read-only mount of the root filesystem. Why that happens with bzimage, I don't know! :-(

gabhijit commented 3 years ago

I was able to figure out why the root file system is mounted read-only. It has to do with a flag in the header of the bzImage file. If you run the file bzimage-focal command, you'll see RO-rootFS in the output.

$ file bzimage-focal
bzimage-focal: Linux kernel x86 boot executable bzImage, version 5.4.81 (root@gabhijit-Lenovo-IdeaPad-S340-14IWL) #1 SMP Thu Feb 4 14:27:35 IST 2021, RO-rootFS, swap_dev 0x4, Normal VGA

The flag comes from the root_flags field of the boot_header documented here. It is also documented there that this flag is deprecated, so there is no kernel config option (or at least I couldn't find one) that could control it. This means one has to explicitly pass the ro/rw flag on the kernel command line.

FWIW, I tried editing this flag using ghex bzimage-focal (set it to 0 and thus RW); the root is then mounted rw by default and the OS boots quite fast, just like vmlinux.
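For reference, here is a minimal Python sketch of the same edit done programmatically rather than by hand in ghex. The offsets come from the x86 Linux boot protocol documentation (root_flags is the 16-bit little-endian field at offset 0x1F2, and a non-zero value means the root defaults to read-only); the bzimage-focal path is just the example image from above and the helper names are only illustrative:

    import struct

    ROOT_FLAGS_OFFSET = 0x1F2   # u16, little-endian; non-zero => RO-rootFS
    HDRS_MAGIC_OFFSET = 0x202   # "HdrS" signature of the setup header

    def read_root_flags(path):
        """Return the root_flags value of a bzImage, after a basic sanity check."""
        with open(path, "rb") as f:
            f.seek(HDRS_MAGIC_OFFSET)
            if f.read(4) != b"HdrS":
                raise ValueError("{}: missing HdrS signature, not a bzImage?".format(path))
            f.seek(ROOT_FLAGS_OFFSET)
            (flags,) = struct.unpack("<H", f.read(2))
        return flags

    def clear_root_flags(path):
        """Patch root_flags to 0 so the kernel defaults to a read-write root mount."""
        with open(path, "r+b") as f:
            f.seek(ROOT_FLAGS_OFFSET)
            f.write(struct.pack("<H", 0))

    print(hex(read_root_flags("bzimage-focal")))   # non-zero on the image above
    # clear_root_flags("bzimage-focal")            # uncomment to make it RW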

I also tried editing the default kernel command line by adding the rw flag for testing, using the following kernel command line: i8042.nokbd reboot=t panic=1 pci=off rw. This works just as well.

This means that whenever we specify a bzImage kernel, we have to explicitly add this flag to the kernel command line in addition to the current default flags. I believe this can be fixed for this specific test case in the integration test itself, but whether or not to add rw to the default command line needs a broader discussion.

I can fix the test case by adding the rw flag if disk_path is specified (which is okay in the context of the test case, I believe). I can do that as part of #63 in a separate commit. Will that be fine?
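A minimal sketch of that test-side idea is below; the variable and function names are illustrative only (not necessarily what the actual test file uses), but the default command line string is the one quoted above:

    DEFAULT_CMDLINE = "i8042.nokbd reboot=t panic=1 pci=off"

    def build_kernel_cmdline(disk_path=None):
        """Append 'rw' only when a root disk is attached (illustrative helper)."""
        cmdline = DEFAULT_CMDLINE
        if disk_path is not None:
            # bzImage kernels carry the RO-rootFS header flag, so force a
            # read-write root mount whenever a rootfs block device is passed.
            cmdline += " rw"
        return cmdline

    assert build_kernel_cmdline("/tmp/rootfs.ext4").split()[-1] == "rw"
    assert "rw" not in build_kernel_cmdline().split()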

lauralt commented 3 years ago

@gabhijit Yep, it would be great. Thanks for the thorough investigation! As for adding rw to the default cmd line or not, I would vote for adding it and letting the device decide how the root will be mounted (for now, this thing is explicitly set to rw here, but we should make this configurable sooner rather than later). We can also open an issue to get input from more people. What are your thoughts on this?

gabhijit commented 3 years ago

> @gabhijit Yep, it would be great. Thanks for the thorough investigation! As for adding rw to the default cmd line or not, I would vote for adding it and letting the device decide how the root will be mounted (for now, this thing is explicitly set to rw here, but we should make this configurable sooner rather than later). We can also open an issue to get input from more people. What are your thoughts on this?

For vmlinux-*, where there is no RO-rootFS flag, the kernel defaults to an rw mount (even when we don't specify it on the command line), so I believe specifying rw as part of the default command-line arguments is not semantically wrong (i.e. it doesn't break something that assumes a default without ro/rw). I guess it's still a good idea to get input from more people. Also, the implementation should take care that if a user has specified the ro flag, it overrides the default rw, if we decide to go that path.

@lauralt Related to the code you referred to above, I believe --block should take an explicit root param instead of just assuming that the first block device passed is the root device (which is likely to be the case often, still). Thus, e.g., the block param may look like:

--block root=/path/to/rootfs/blockdev [ro|rw],path=/path/to/another/blockdev

The default would be rw if not specified by the user. Does that make sense?
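Just to make the proposed semantics concrete, here is a tiny illustrative parser (Python for brevity; the real implementation would live in the Rust CLI code, and only the root/path/ro/rw keys proposed above are assumed):

    def parse_block_spec(spec):
        """Parse one proposed --block value, e.g. 'root=/path/to/rootfs,ro'."""
        device = {"read_only": False}            # rw is the default if nothing is given
        for token in spec.split(","):
            if token in ("ro", "rw"):
                device["read_only"] = (token == "ro")   # an explicit ro/rw wins
            elif "=" in token:
                key, value = token.split("=", 1)
                device[key] = value                      # root=... or path=...
            else:
                raise ValueError("unknown block option: {}".format(token))
        return device

    print(parse_block_spec("root=/path/to/rootfs/blockdev,ro"))
    # -> {'read_only': True, 'root': '/path/to/rootfs/blockdev'}
    print(parse_block_spec("path=/path/to/another/blockdev"))
    # -> {'read_only': False, 'path': '/path/to/another/blockdev'}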

lauralt commented 3 years ago

@gabhijit Makes sense, I'll open some issues for the blk cmd line configuration and other block-related things. The API thing, at this point, needs a broader discussion; we were hoping, for example, to make things more modular at this level and have a clearer separation between the API and the VMM.

gabhijit commented 3 years ago

This is fixed in #64. Can this be closed?

lauralt commented 3 years ago

Yeah it can, I left it open as a reminder for the issues above, but we can close it as well. I'll open those issues this week.

tumbleshack commented 3 years ago

@seanmichwalsh and I encountered this problem when running the VMM for the first time.

We fixed this by granting read, write, and execute permissions to all files in the directories containing both, e.g. by executing sudo chmod -R 777 on the host.

gabhijit commented 3 years ago

@tumbleshack: The host file corresponding to the root disk needs to be user-writable. Look at the following code from tests/test_run_vmm_reference.py; you could do something like this as a user rather than chmod-ing the file to 777, i.e. keep the file owned by root as it is and create a 'new' file owned by the user.

    if disk_path is not None:
        # Terrible hack to have a rootfs owned by the user.
        with tempfile.NamedTemporaryFile(dir='/tmp', delete=True) as tmpfile:
            tmp_file_path = tmpfile.name
        cp_cmd = "cp {} {}".format(disk_path, tmp_file_path)
        subprocess.run(cp_cmd, shell=True, check=True)
        vmm_cmd.append("--block")
        vmm_cmd.append("path={}".format(tmp_file_path))

tumbleshack commented 3 years ago

Haha, of course, it does make sense that a file system would need to be writable. Thanks for the workaround.

shpark commented 3 years ago

I had a similar problem (I didn't configure the block device to be read-only, but the file system was mounted read-only). Here's the workaround I made, which does not require changing file permissions on the host side :smile:

I simply changed

https://github.com/rust-vmm/vmm-reference/blob/1226002a93a4dd857864ec4bc2eed3688e9870e3/src/devices/src/virtio/block/mod.rs#L92-L93

to

https://github.com/shpark/vmm-reference/blob/97afcc3e19535268fbe2dc2d0f7bade8e351043c/src/devices/src/virtio/block/mod.rs#L92-L96