Closed deajan closed 1 year ago
Hi, I've never used it, but was looking for something like that some time ago (back then only 9p was available).
Just a guess, but maybe --announce-submounts will help?
My reasoning is that from Linux's POV, /backup is a FS, /backup/dataset is another FS mounted within the parent FS, and so on...
Maybe try the simplest case first: zfs set mountpoint=/backup_test_virtio backup/dataset/mydataset and then use /backup_test_virtio as the virtio-fs source.
Also, check whether mount lists the dataset in question after zfs mount -a.
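A quick way to run that check is a device-number comparison, the same test `mountpoint -q` performs. A minimal sketch (the `is_mountpoint` helper is my own, and /backup_test_virtio is the hypothetical path from above — the demo uses /proc, which is always a separate mount on Linux):

```shell
#!/usr/bin/env sh
# A directory is a real mountpoint when its device number differs from
# its parent's -- the same test `mountpoint -q` performs. On a real host
# you would substitute the dataset mountpoint, e.g. /backup_test_virtio.
is_mountpoint() {
    [ "$(stat -c %d "$1")" != "$(stat -c %d "$1/..")" ]
}

is_mountpoint /proc && echo "/proc is mounted"
```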
I'm running 2.1.11-r0-funtoo and have no problems handing ZFS-backed directories into qemu VMs, both Linux and Windows.
Remember that you have to mount the shared folder in the guest! Else you'll only see an empty mountpoint when looking in the guest, with the content being there when looking on the host.
What is your actual issue: the content vanishing on the host or not showing up in the guest?
So, first, thanks for the answers, and sorry for the delay in response. Here's what I did so far:
There's no --announce-submounts option in libvirt, so I did an ugly patch:
#!/usr/bin/env bash
# Forward libvirt's arguments to the real virtiofsd, appending --announce-submounts.
exec /usr/libexec/virtiofsd_dist "$@" --announce-submounts
This way, I get to add `--announce-submounts` to virtiofsd:
root 12079 0.0 0.0 6400 2328 pts/19 S+ 18:36 0:00 grep --color=auto virtiof
root 63816 0.0 0.0 5776 3892 ? S 18:23 0:00 /usr/libexec/virtiofsd --fd=98 -o source=/backup/restic_stash/,cache=always,xattr --announce-submounts
root 63818 0.0 0.0 2238128 5332 ? Sl 18:23 0:00 /usr/libexec/virtiofsd --fd=98 -o source=/backup/restic_stash/,cache=always,xattr --announce-submounts
On the host machine, I created a fresh machine with the following libvirt config:
<filesystem type='mount' accessmode='passthrough'>
<driver type='virtiofs' queue='1024'/>
<binary path='/usr/libexec/virtiofsd' xattr='on'>
<cache mode='always'/>
</binary>
<source dir='/backup/restic_stash/'/>
<target dir='restic_stash'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1a' function='0x0'/>
</filesystem>
Host config:
NAME                        USED  AVAIL  REFER  MOUNTPOINT
backup                      871G  34.4T   283G  /backup
backup/restic_stash        73.6G  34.4T   186K  /backup/restic_stash
backup/restic_stash/userA   140K  34.4T   140K  /backup/restic_stash/userA
backup/restic_stash/userB  73.6G  34.4T  73.6G  /backup/restic_stash/userB
backup on /backup type zfs (rw,noatime,seclabel,xattr,noacl)
backup/restic_stash on /backup/restic_stash type zfs (rw,noatime,seclabel,noxattr,noacl)
backup/restic_stash/userB on /backup/restic_stash/userB type zfs (rw,noatime,seclabel,noxattr,noacl)
backup/restic_stash/userA on /backup/restic_stash/userA type zfs (rw,noatime,seclabel,noxattr,noacl)
On the VM side:
restic_stash /restic_stash virtiofs defaults,noatime,nodiratime,nodev,noexec,nosuid,nofail 0 2
restic_stash on /restic_stash type virtiofs (rw,nosuid,nodev,noexec,noatime,nodiratime,seclabel)
total 512
drwx------.  2 restic root   2 Jul  7 18:18 .
dr-xr-xr-x. 20 root   root 270 May 31 23:42 ..
total 0
drwxr-xr-x.  2 root   root   6 May 16 13:31 .
dr-xr-xr-x. 20 root   root 270 May 31 23:42 ..
dmesg | egrep -i "virtiofs|restic_stash"
[    9.119442] systemd-fstab-generator[474]: Checking was requested for "restic_stash", but it is not a device.
[   10.535127] virtiofs virtio0: virtio_fs_setup_dax: No cache capability
As you can see, the FS is mounted (since once I unmount it, the ls output changes slightly).
Anyway, no files in there.
Strangely enough, since zfs 2.1.12-1 update, host zfs filesystem doesn't get unmounted anymore when I start the VM.
I've also redone the above tests after disabling SELinux in host and guest systems.
Lastly, I redid the test without the `--announce-submounts` hack I tried.
For completeness, here's the full cmdline of my VM:
/usr/libexec/qemu-kvm -name guest=mymachine.local,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-115-mymachine.lo/master-key.aes"} -machine pc-q35-rhel9.2.0,usb=off,dump-guest-core=off,memory-backend=pc.ram -accel kvm -cpu Icelake-Server,ds=on,ss=on,dtes64=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off -m 2048 -object {"qom-type":"memory-backend-memfd","id":"pc.ram","share":true,"x-use-canonical-path-for-ramblock-id":false,"size":2147483648} -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -object {"qom-type":"iothread","id":"iothread1"} -object {"qom-type":"iothread","id":"iothread2"} -uuid 41c28c02-482e-4325-becc-23db4642395d -smbios type=0,vendor=npf -smbios type=1,manufacturer=NetPerfect,product=vmv3tls -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=97,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"} -device {"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"} -device {"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"} -device {"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"} -device {"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"} -device 
{"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"} -device {"driver":"pcie-root-port","port":14,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x1.0x6"} -device {"driver":"pcie-root-port","port":15,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x1.0x7"} -device {"driver":"pcie-root-port","port":16,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"pcie-root-port","port":20,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x2.0x4"} -device {"driver":"pcie-root-port","port":21,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x2.0x5"} -device {"driver":"pcie-root-port","port":22,"chassis":15,"id":"pci.15","bus":"pcie.0","addr":"0x2.0x6"} -device {"driver":"pcie-pci-bridge","id":"pci.16","bus":"pci.1","addr":"0x0"} -device {"driver":"pcie-root-port","port":23,"chassis":17,"id":"pci.17","bus":"pcie.0","addr":"0x2.0x7"} -device {"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.3","addr":"0x0"} -device {"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.4","addr":"0x0"} -blockdev {"driver":"file","filename":"/data/private_vm/mymachine.local-disk0.qcow2","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":null} -device {"driver":"virtio-blk-pci","iothread":"iothread2","num-queues":2,"bus":"pci.5","addr":"0x0","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"} -device 
{"driver":"ide-cd","bus":"ide.0","id":"sata0-0-0"} -chardev socket,id=chr-vu-fs0,path=/var/lib/libvirt/qemu/domain-115-mymachine.lo/fs0-fs.sock -device {"driver":"vhost-user-fs-pci","id":"fs0","chardev":"chr-vu-fs0","queue-size":1024,"tag":"restic_stash","bus":"pcie.0","addr":"0x1a"} -netdev {"type":"tap","fd":"98","vhost":true,"vhostfd":"106","id":"hostnet0"} -device {"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:4b:59:52","bus":"pci.2","addr":"0x0"} -chardev pty,id=charserial0 -device {"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0} -chardev socket,id=charchannel0,fd=70,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -audiodev {"id":"audio1","driver":"none"} -device {"driver":"i6300esb","id":"watchdog0","bus":"pci.16","addr":"0x1"} -watchdog-action reset -device {"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.6","addr":"0x0"} -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device {"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.7","addr":"0x0"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
I'm honestly puzzled.
Oh, and for what it's worth, using an xfs directory instead of a zfs one works.
I tested this on a Pop!_OS VM:
ZFS layout
# zfs list -r pool-storage/app_storage
NAME USED AVAIL REFER MOUNTPOINT
pool-storage/app_storage 430K 2.18T 151K /pool-storage/app_storage
pool-storage/app_storage/subfs1 140K 2.18T 140K /pool-storage/app_storage/subfs1
pool-storage/app_storage/subfs2 140K 2.18T 140K /pool-storage/app_storage/subfs2
KVM host config:
<filesystem type="mount" accessmode="passthrough">
<driver type="virtiofs"/>
<binary path="/usr/local/bin/virtiofsd-wrapper.sh" xattr="on"/>
<source dir="/pool-storage/app_storage"/>
<target dir="app_storage"/>
<address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</filesystem>
VM guest:
mount -t virtiofs app_storage /app_storage
Initially /pool-storage/app_storage was just a subdir on a ZFS filesystem - no problem accessing files from the KVM guest.
Later, I created the ZFS layout described above; files on pool-storage/app_storage could still be accessed, but subfs1 and subfs2 have problems (at least with find):
# find /app_storage/
/app_storage/
find: File system loop detected; ‘/app_storage/subfs2’ is part of the same file system loop as ‘/app_storage/’.
find: File system loop detected; ‘/app_storage/subfs1’ is part of the same file system loop as ‘/app_storage/’.
Despite this, I can still access files located in these subfs* directories.
However, despite using --announce-submounts (via a wrapper script, similar to your solution), inode numbers seem to collide.
I have yet to test this with other filesystems, but I suspect the same inode-number problems will occur.
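For what it's worth, find's loop detection is based on (st_dev, st_ino) pairs, so the collision can be observed directly with stat. A hedged sketch (throwaway temp dirs, not the thread's datasets):

```shell
#!/usr/bin/env sh
# find detects loops by tracking (st_dev, st_ino) pairs: a directory whose
# pair repeats one already seen on the current path is reported as a loop.
# On a single local filesystem, device numbers match but inode numbers
# differ, so find walks normally. The guest-side failure is submounts
# sharing the parent's device number while inode numbers collide.
tmp=$(mktemp -d)
mkdir -p "$tmp/parent/child"

for d in "$tmp/parent" "$tmp/parent/child"; do
    printf '%s dev=%s ino=%s\n' "$d" "$(stat -c %d "$d")" "$(stat -c %i "$d")"
done
rm -rf "$tmp"
```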
...after checking the mount output on the host, I noticed that I do not have the seclabel option:
# mount | grep app_storage
pool-storage/app_storage on /pool-storage/app_storage type zfs (rw,xattr,noacl)
pool-storage/app_storage/subfs1 on /pool-storage/app_storage/subfs1 type zfs (rw,xattr,noacl)
pool-storage/app_storage/subfs2 on /pool-storage/app_storage/subfs2 type zfs (rw,xattr,noacl)
Perhaps seclabel limits visibility of your mountpoints? Sadly, I know nothing about SELinux.
@filip-paczynski Would you mind giving me the full cmdline of virtiofsd host side so I can compare?
Sure:
# ps ax | grep virtiofs
131048 ? S 0:00 /bin/sh /usr/local/bin/virtiofsd-wrapper.sh --fd=37 -o source=/pool-storage/app_storage,xattr
131050 ? S 0:00 \_ /usr/lib/virtiofsd --announce-submounts --fd=37 -o source=/pool-storage/app_storage,xattr
131054 ? Sl 0:00 \_ /usr/lib/virtiofsd --announce-submounts --fd=37 -o source=/pool-storage/app_storage,xattr
The problem with inode numbers being duplicated also occurs with XFS. XFS layout on host side:
# mount | grep xfs_
/home/filip.paczynski/tmp/xfs-root on /xfs_test type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/home/filip.paczynski/tmp/xfs-subfs1 on /xfs_test/subfs1 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/home/filip.paczynski/tmp/xfs-subfs2 on /xfs_test/subfs2 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
Testing find on the VM:
# mount -t virtiofs xfs_test /xfs_test/
# find /xfs_test/
/xfs_test/
find: File system loop detected; ‘/xfs_test/subfs1’ is part of the same file system loop as ‘/xfs_test/’.
find: File system loop detected; ‘/xfs_test/subfs2’ is part of the same file system loop as ‘/xfs_test/’.
Well, I have no idea why, but after updating and restarting both host and guests, I now get the same results as you, without the special behavior I had before.
# find /restic_stash
/restic_stash/
/restic_stash/restic_stash_file
find: File system loop detected; ‘/restic_stash/userA’ is part of the same file system loop as ‘/restic_stash/’.
find: File system loop detected; ‘/restic_stash/userB’ is part of the same file system loop as ‘/restic_stash/’.
Since you were able to reproduce the same problem at XFS level, I guess there's no need to keep this issue open. Sorry for the noise.
Any idea where to open a new issue?
I guess the initial issue with ZFS filesystems not being visible on the VM side was related to SELinux, the seclabel option, or a similar security-related feature.
This is not a ZFS issue.
The problem with duplicated inode numbers, which makes find consider subfs* a hardlink to the parent directory, is another matter. Theoretically, such a configuration should be handled by virtiofsd --announce-submounts.
However, I do not fully understand what they mean by "device number", or how this should be handled on the VM side.
From the virtiofsd docs:
--announce-submounts solves that problem because it reports a different device number for every submount it encounters.
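As I read that doc snippet, the "device number" is the st_dev each directory reports inside the guest: with --announce-submounts, each submount root should show its own value, so find no longer sees a loop. A small sketch to print device numbers (demonstrated on / and /proc, since the thread's guest paths aren't reproducible here — in a guest one would compare e.g. the share root and its submount dirs):

```shell
#!/usr/bin/env sh
# Distinct device numbers mean the kernel treats the paths as distinct
# filesystems. / and /proc are always separate mounts on Linux, so they
# serve as a stand-in for a virtiofs share root and an announced submount.
for p in / /proc; do
    printf '%s dev=%s\n' "$p" "$(stat -c %d "$p")"
done
```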
I have no experience with virtiofsd, and I only run KVM on my "local" machine (Xen on servers).
I guess one could ask on the virtiofsd forums/discussions about --announce-submounts and why inode numbers get duplicated on the VM side.
Glad I could help.
@filip-paczynski From what I understand, --announce-submounts only allows sync operations to be sent to each FS, in order to avoid inconsistencies when unmounting?
Strangely enough, I didn't change anything about seclabel. But a zfs + kernel upgrade seemed to have resolved the issue, at least the zfs related one.
I've also tried running virtiofsd with the --inode-file-handles=mandatory option, but the find output stays the same.
Anyway, thanks for your help.
Side notes: using xattr=on divides IOPS by 3 in my tests, and --inode-file-handles=never does not resolve any of the above loop problems.
Not usable for me right now. Thank you for your time, and sorry for having made that noise in the ZFS issues.
May be related... I have noticed that virtiofs with caching enabled leaves open file handles, eventually resulting in "too many files open". I'm not sure if this is due to an interaction with ZFS.
Disabling caching has a significant performance hit, which could perhaps be remedied by the upcoming direct_io options?
For what it's worth, I found the issue. It's not virtiofs related, but zfs related.
In my setup, when I use zfs mount mydataset, it mounts for the current user only.
If I happen to use systemctl restart zfs-mount, it mounts for all users, including the virtiofs bridge, which then works fine.
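One plausible explanation (an assumption on my part, not confirmed in this thread) is mount namespaces: if `zfs mount` lands in a namespace that virtiofsd doesn't share, virtiofsd only ever sees the empty mountpoint. Comparing /proc/<pid>/ns/mnt links shows whether two processes share a mount table:

```shell
#!/usr/bin/env sh
# Each process's mount namespace is identified by the link target of
# /proc/<pid>/ns/mnt (something like "mnt:[4026531841]"). Two processes
# with different IDs see different mount tables, which would explain a
# mount visible to one user/service but not to virtiofsd.
self_ns=$(readlink "/proc/$$/ns/mnt")
echo "this shell: $self_ns"
# On a real host, compare with the virtiofsd process (as root):
#   readlink /proc/<virtiofsd-pid>/ns/mnt
```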
Not sure whether this is a bug or a feature yet, but I've opened a discussion here
I'm trying to share some ZFS datasets with a qemu guest using virtiofs. So far, every time I try to launch my guest, it shows an empty folder (it actually shares the mountpoint directory instead of the ZFS filesystem).
I had various times where it seemed like zfs would "unmount" itself once I launched my virtual guest.
After having tried the following (on a test system of course)
I can list my zfs dataset with:
Content is also visible.
But I cannot use that dataset with virtiofs.
I'm really sorry for the noise since this is probably a virtiofs bug and not a zfs one, but I do think zfs behaves strangely, since it shouldn't just "unmount" my datasets when accessed via virtiofs. Do you guys have any experience with virtiofs? Do FUSE daemons get to use zfs mountpoints properly? Anything to configure, perhaps?
Best regards
System information
Virtiofs relevant config in virtual guest xml: