Support proxying using vsock

dorjoy03 commented 1 month ago

Summary of the PR

Hi. I have been working on adding AWS Nitro Enclave emulation support in QEMU. Nitro enclave VMs are spawned from EC2 VMs and the enclave VMs always see the parent VM as CID 3 and the applications in the enclave VM talk to the parent VM using AF_VSOCK to CID 3. More details about this are in the cover letter for the patch posted in qemu-devel.

From the QEMU side, it would be cumbersome to run a separate VM with CID 3 with the necessary applications while running a nitro-enclave VM. So there was some discussion in the virtualization mailing list about how we could make the UX simpler for users without requiring to run a separate VM with CID 3 and instead run the applications of the parent EC2 VM in the host machine. At the end, it was suggested by Stefano that we could use vhost-user-vsock for vsock emulation in QEMU for nitro-enclave and then in vhost-device-vsock we could add support for forwarding the vsock packets to the host machine. The discussion can be found in this thread.

This PR adds 2 new options to vhost-device-vsock: forward-cid and vsock-listen.

forward-cid: When this option is given (usually with the value 1), all connections from the guest VM will be forwarded to this CID to the same ports regardless of their destination CID.

forward-listen: This is a string of port numbers separated by '+'. The vhost-device-vsock will be listening to these ports in the host machine and the connections will be forwarded to the guest VM.

For testing, first run the vhost-device-vsock from this branch:

./target/release/vhost-device-vsock --vm guest-cid=4,forward-cid=1,forward-listen=9001,socket=/tmp/vhost4.socket

Then run a VM:

./qemu-system-x86_64 -machine q35,memory-backend=mem0 -enable-kvm -m 8G -nic user,model=virtio -drive file=fedora40.qcow2,media=disk,if=virtio --display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device vhost-user-vsock-pci,chardev=char0

Now you can run a AF_VSOCK server in the host machine and a client inside the VM that connects to the same port but any CID (e.g., CID 3) and verify that the server and client communication works. You can run a server inside the VM listening to port 9001 and run a client in host that connects to CID 1 port 9001 and verify that the communication works.

Requirements

Before submitting your PR, please make sure you addressed the following requirements:

[x] All commits in this PR have Signed-Off-By trailers (with git commit -s), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
[x] All added/changed functionality has a corresponding unit/integration test.
[x] All added/changed public-facing functionality has entries in the "Upcoming Release" section of CHANGELOG.md (if no such section exists, please create one).
[ ] Any newly added unsafe code is properly documented.

dorjoy03 commented 1 month ago

I see a few checks have failed with the same 4 tests that I added which require connecting/listening using AF_VSOCK with error "Operation not permitted" etc. The tests pass when I run them in a debian docker container. So I am guessing something needs to be updated about the VMs that run these tests.

stefano-garzarella commented 1 week ago

I see a few checks have failed with the same 4 tests that I added which require connecting/listening using AF_VSOCK with error "Operation not permitted" etc. The tests pass when I run them in a debian docker container. So I am guessing something needs to be updated about the VMs that run these tests.

@andreeaflorescu do you know if our CI runners has vsock_loopback kernel module available?

It should be automatically loaded when opening an AF_VSOCK socket, and it provides the CID 1 handling, so it may be the issue if it's not available, since we can't bind CID 1 IIUC.

stefano-garzarella commented 1 week ago

I see a few checks have failed with the same 4 tests that I added which require connecting/listening using AF_VSOCK with error "Operation not permitted" etc. The tests pass when I run them in a debian docker container. So I am guessing something needs to be updated about the VMs that run these tests.

@andreeaflorescu do you know if our CI runners has vsock_loopback kernel module available?

It should be automatically loaded when opening an AF_VSOCK socket, and it provides the CID 1 handling, so it may be the issue if it's not available, since we can't bind CID 1 IIUC.

Indeed on my laptop tests are working and I have this behaviour:

$ lsmod | grep vsock_loopback

$ cargo test -p vhost-device-vsock
...
test result: ok. 38 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

$ lsmod | grep vsock_loopback
vsock_loopback         12288  0
vmw_vsock_virtio_transport_common    57344  1 vsock_loopback
vsock                  61440  3 vmw_vsock_virtio_transport_common,vsock_loopback,vmw_vsock_vmci_transport

If we can't enabled that kernel module, maybe we should just skip the test if we can't bind CID 1.

dorjoy03 commented 1 week ago

@stefano-garzarella I have now addressed some of the comments but I needed some input or had some questions about the others. I have not yet updated README and changelog but will do so after I get through all the reviews at the end. Can you please do another round of review and help me with the other reviews? Thanks!

dorjoy03 commented 1 week ago

I have now gone through the reviews and readme and changelog is updated as well. Thanks for the reviews @stefano-garzarella

Interestingly, the 'coverage-x86_64' run started failing but it didn't fail last time so to check if it was my latest push that did something, I reset the branch to the old commit (with a whitespace change to trigger the CI) and the 'coverage-x86_64' failed again which didn't fail before. So I am not sure why the new failure occurred. Did anything change on the CI side?

stefano-garzarella commented 1 week ago

I have now gone through the reviews and readme and changelog is updated as well. Thanks for the reviews @stefano-garzarella

Interestingly, the 'coverage-x86_64' run started failing but it didn't fail last time so to check if it was my latest push that did something, I reset the branch to the old commit (with a whitespace change to trigger the CI) and the 'coverage-x86_64' failed again which didn't fail before. So I am not sure why the new failure occurred. Did anything change on the CI side?

We updated the CI, but we shouldn't touch anything about coverage, so this could be related to the test failure, since coverage runs the test, so if some tests are failing, I think also the coverage can't be calculated.

What about adding a third commit where we skip the new vsock tests if we can't create AF_VSOCK sockets? I suggest a separate commit, so we can revert it when we fix our CI.

dorjoy03 commented 1 week ago

We updated the CI, but we shouldn't touch anything about coverage, so this could be related to the test failure, since coverage runs the test, so if some tests are failing, I think also the coverage can't be calculated.

What about adding a third commit where we skip the new vsock tests if we can't create AF_VSOCK sockets?

Yeah I can look into skipping the tests but did you get a chance to look into fixing the CI so that we are able to run them? Running them in a standard debian docker container works fine for me.

stefano-garzarella commented 1 week ago

We updated the CI, but we shouldn't touch anything about coverage, so this could be related to the test failure, since coverage runs the test, so if some tests are failing, I think also the coverage can't be calculated. What about adding a third commit where we skip the new vsock tests if we can't create AF_VSOCK sockets?

Yeah I can look into skipping the tests but did you get a chance to look into fixing the CI so that we are able to run them? Running them in a standard debian docker container works fine for me.

It depends on the host kernel as I mentioned here: https://github.com/rust-vmm/vhost-device/pull/706#issuecomment-2321312778

Our CI runners are provided by AWS, so we need some input from them. @roypat @andreeaflorescu do you have any advice?

dorjoy03 commented 1 week ago

@stefano-garzarella I have now introduced the "proxy" feature and made it default and "ignore"d the 4 tests that were failing in the CI. A lot of changes so please re-review all the commits.

I think in the CI for now we should make a change to run both "cargo build --release --no-default-features" "cargo test --no-default-features" and "cargo build --release" "cargo test" i.e., run without the feature and with the feature so that we cover everything. Please run locally with the feature turned off and on to verify that I haven't broken anything. Thanks!

stefano-garzarella commented 1 week ago

@stefano-garzarella I have now introduced the "proxy" feature and made it default and "ignore"d the 4 tests that were failing in the CI. A lot of changes so please re-review all the commits.

Done, thanks!

I think in the CI for now we should make a change to run both "cargo build --release --no-default-features" "cargo test --no-default-features" and "cargo build --release" "cargo test" i.e., run without the feature and with the feature so that we cover everything. Please run locally with the feature turned off and on to verify that I haven't broken anything. Thanks!

Yeah, we have an issue opened in rust-vmm-ci: https://github.com/rust-vmm/rust-vmm-ci/issues/152 If you have time/desire, feel free to take a look at it.

roypat commented 1 week ago

We updated the CI, but we shouldn't touch anything about coverage, so this could be related to the test failure, since coverage runs the test, so if some tests are failing, I think also the coverage can't be calculated. What about adding a third commit where we skip the new vsock tests if we can't create AF_VSOCK sockets?

Yeah I can look into skipping the tests but did you get a chance to look into fixing the CI so that we are able to run them? Running them in a standard debian docker container works fine for me.

It depends on the host kernel as I mentioned here: #706 (comment)

Our CI runners are provided by AWS, so we need some input from them. @roypat @andreeaflorescu do you have any advice?

Mh, I just connected to one of our CI runners, and it seems the module is available:

$ find /lib/modules/$(uname -r) -type f -name '*.ko' | grep vsock
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vmw_vsock_vmci_transport.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vmw_vsock_virtio_transport_common.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vsock_loopback.ko                       <------
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vmw_vsock_virtio_transport.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/hv_sock.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vsock.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vsock_diag.ko
/lib/modules/6.5.0-1024-aws/kernel/drivers/vhost/vhost_vsock.ko

maybe it's something to do with docker? :thinking:

stefano-garzarella commented 1 week ago

Mh, I just connected to one of our CI runners, and it seems the module is available:

@roypat thanks for checking!

$ find /lib/modules/$(uname -r) -type f -name '*.ko' | grep vsock
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vmw_vsock_vmci_transport.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vmw_vsock_virtio_transport_common.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vsock_loopback.ko                       <------
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vmw_vsock_virtio_transport.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/hv_sock.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vsock.ko
/lib/modules/6.5.0-1024-aws/kernel/net/vmw_vsock/vsock_diag.ko
/lib/modules/6.5.0-1024-aws/kernel/drivers/vhost/vhost_vsock.ko

maybe it's something to do with docker? 🤔

yeah, it could be.

Can we make sure they are loaded? Theoretically if they're not loaded, the first time an AF_VSOCK socket is created, the kernel loads af_vsock and vsock_loopback automatically, but it's possible that being inside a docker, this is not allowed.

One way to test it, is the following:

# start the container
# run this in the container
python3 -c "import socket; s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM); s.bind((1, socket.VMADDR_PORT_ANY))"

dorjoy03 commented 1 week ago

Thanks. Good stuff!

roypat commented 4 days ago

python3 -c "import socket; s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM); s.bind((1, socket.VMADDR_PORT_ANY))"

Sorry for the delay, finally got around to testing this. I got the following result:

$ lsmod | grep vsock
$ sudo docker run --device=/dev/kvm -it --rm --volume $(pwd):/workdir --workdir /workdir --privileged rustvmm/dev:v44
root@05334d5ffd14:/workdir# python3 -c "import socket; s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM); s.bind((1, socket.VMADDR_PORT_ANY))"
$ lsmod | grep vsock
vsock_loopback         12288  0
vmw_vsock_virtio_transport_common    61440  1 vsock_loopback
vmw_vsock_vmci_transport    40960  0
vmw_vmci              106496  1 vmw_vsock_vmci_transport
vsock                  69632  3 vmw_vsock_virtio_transport_common,vsock_loopback,vmw_vsock_vmci_transport
ssm-user@ip-172-31-68-60:/var/snap/amazon-ssm-agent/7994$

So it does seem to get loaded even from the docker container.

stefano-garzarella commented 4 days ago

@roypat thanks for checking.

Looking at the commit where the CI failed, the docker container was started in this way:

docker run -t -i --rm --init --volume /var/lib/buildkite-agent/builds/ip-172-31-66-96-2/rust-vmm/vhost-device-ci:/workdir --workdir /workdir --label com.buildkite.job-id=01913daa-2e9d-4bf3-808b-eaf6aca7f0ff rustvmm/dev:v35 /bin/sh -e -c cargo\ test\ --all-features\ --workspace

So I'm not sure if for example --privileged allowed the AF_VSOCK socket to be created in your test, while it failed in the pipeline.

roypat commented 4 days ago

Yes, that sounds quite plausible. I'll make a note to look into having the CI runners just load the vsock module on startup! (although might not get to it this or next week :( )

stefano-garzarella commented 4 days ago

Yes, that sounds quite plausible. I'll make a note to look into having the CI runners just load the vsock module on startup! (although might not get to it this or next week :( )

Great! BTW no rush at all, we already merged this PR, and we skipped the tests if we can't use AF_VSOCK.

rust-vmm / vhost-device

Support proxying using vsock #706

Summary of the PR

Requirements