rust-vmm / vhost


vhost: Migration of vhost_user device using ovs_dpdk as backend fails using Cloud Hypervisor on Ubuntu 22.04 host #217

Open rveerama1 opened 10 months ago

rveerama1 commented 10 months ago

We have noticed an issue in vhost while migrating VMs with Cloud Hypervisor. It showed up while we were attempting to migrate a Docker container from Ubuntu 20.04 to Ubuntu 22.04. Details about our attempts are in https://github.com/cloud-hypervisor/cloud-hypervisor/pull/5877 .

The test live_migration::live_migration_sequential::test_live_migration_ovs_dpdk from https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/tests/integration.rs#L9616 gets stuck when run on an Ubuntu 22.04 host.

On further debugging, it turns out to be stuck while sending the SET_LOG_BASE request to the backend at https://github.com/rust-vmm/vhost/blob/main/vhost/src/vhost_user/connection.rs#L538. It never returns from there (neither error nor success).
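To make such a hang observable while debugging, one option is to put a read timeout on the socket before waiting for the reply. A minimal sketch in plain Rust (std only; the socket path is the one from the repro steps below, and the 5-second timeout is an arbitrary debugging choice, not anything the vhost crate does):

use std::io::Read;
use std::os::unix::net::UnixStream;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // Socket path from the repro steps below; adjust as needed.
    let mut sock = UnixStream::connect("/tmp/dpdkvhostclient2")?;

    // A plain blocking read never returns while the peer stays silent,
    // which matches the observed behavior (neither error nor success).
    // With SO_RCVTIMEO set, the missing reply surfaces as an error instead:
    sock.set_read_timeout(Some(Duration::from_secs(5)))?;

    let mut hdr = [0u8; 12]; // a vhost-user message header is 12 bytes
    match sock.read_exact(&mut hdr) {
        Ok(()) => println!("got a reply header"),
        Err(e) if e.kind() == std::io::ErrorKind::WouldBlock
            || e.kind() == std::io::ErrorKind::TimedOut =>
        {
            eprintln!("no reply within 5s; the backend likely never sent the ACK");
        }
        Err(e) => return Err(e),
    }
    Ok(())
}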

[Screenshot from 2023-11-29 12-27-58: backtrace of the blocked thread]

The backtrace is attached for further details.

Steps to reproduce:

Install openvswitch-switch-dpdk, then run:

$  sudo modprobe openvswitch
$  echo 6144 | sudo tee /proc/sys/vm/nr_hugepages
$  sudo service openvswitch-switch start
$  sudo ovs-vsctl init
$  sudo ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
$  sudo service openvswitch-switch restart
$  sudo ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
$  sudo ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuserclient options:vhost-server-path=/tmp/dpdkvhostclient2
$  sudo ip link set up dev ovsbr0
$  sudo service openvswitch-switch restart

Clone Cloud Hypervisor and build

$ git clone git@github.com:cloud-hypervisor/cloud-hypervisor.git
$ cd cloud-hypervisor
$ cargo build    # the debug profile is the default; cargo build has no --debug flag
$ sudo setcap cap_net_admin+ep ./target/debug/cloud-hypervisor

Build a custom kernel as described at https://github.com/cloud-hypervisor/cloud-hypervisor/tree/main#custom-kernel-and-disk-image. Get the guest image from https://cloud-hypervisor.azureedge.net/focal-server-cloudimg-amd64-custom-20210609-0.qcow2 and follow the disk-image steps at https://github.com/cloud-hypervisor/cloud-hypervisor/tree/main#disk-image.

Once you have the custom kernel and guest image, run the commands below from different terminals:

$ target/debug/cloud-hypervisor --api-socket /tmp/cloud-hypervisor.sock --cpus boot=2 --memory size=0,shared=on --memory-zone id=mem0,size=1G,shared=on,host_numa_node=0 --kernel <path to custom kernel>/vmlinux --cmdline "root=/dev/vda1 console=hvc0 rw systemd.journald.forward_to_console=1" --disk path=<path to focal>/focal-server-cloudimg-amd64-custom-20210609-0.raw path=/tmp/ubuntu-cloudinit.img --net tap=,mac=12:34:56:78:90:01,ip=192.168.1.1,mask=255.255.255.0 vhost_user=true,socket=/tmp/dpdkvhostclient2,num_queues=2,queue_size=256,vhost_mode=server -v

$ target/debug/cloud-hypervisor --api-socket /tmp/cloud-hypervisor.sock.dest -v

$ target/debug/ch-remote --api-socket /tmp/cloud-hypervisor.sock.dest receive-migration unix:/tmp/live-migration.sock

$ target/debug/ch-remote --api-socket /tmp/cloud-hypervisor.sock send-migration unix:/tmp/live-migration.sock
Ablu commented 10 months ago

The backtrace seems to indicate that the frontend waits for an ACK from the backend. Is the SET_LOG_BASE message acked by the backend here?

The reproduction steps are fairly involved... Do you have a link to the code of the backend that you pair this with?

rveerama1 commented 9 months ago

@Ablu Thanks for the reply.

Do you have a link to the code of the backend that you pair this with?

Do you mean this part, https://github.com/rust-vmm/vhost/blob/main/vhost/src/vhost_user/connection.rs#L538, or something else?

The reproduction steps are fairly involved

There is another, simpler way to reproduce this scenario:

Clone Cloud Hypervisor:
$ git clone git@github.com:cloud-hypervisor/cloud-hypervisor.git
$ cd cloud-hypervisor

Apply my PR patch:
https://patch-diff.githubusercontent.com/raw/cloud-hypervisor/cloud-hypervisor/pull/5877.patch

and run this particular test:
$ scripts/dev_cli.sh  tests --integration-live-migration -- --test-filter test_live_migration_ovs_dpdk

This test runs inside a container, and I don't know how to attach gdb there. If you would like to run it on the host, all of the steps above are involved.

rveerama1 commented 8 months ago

@Ablu

I am trying to debug it further.

It fails/blocks at self.sock.recv_with_fds(iovs, &mut fd_array):

https://github.com/rust-vmm/vhost/blob/06694741b8c1c0b583eb7b3d75a7c746da9588ab/vhost/src/vhost_user/connection.rs#L328-L329

I went further, and it fails/blocks at the socket recvmsg:

https://github.com/rust-vmm/vmm-sys-util/blob/0e10ca98b55797a64319d746d94379f7cdf81d02/src/linux/sock_ctrl_msg.rs#L221-L222
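One way to tell whether data ever arrives on the socket (as opposed to recvmsg mishandling it) is to poll the fd for readability with a timeout before the blocking call. A hedged sketch, assuming the libc crate and a UnixStream borrowed from the connection (names here are illustrative, not the vhost crate's API):

use std::os::fd::AsRawFd;
use std::os::unix::net::UnixStream;

// Returns Ok(true) if the socket became readable within timeout_ms,
// Ok(false) on timeout, i.e. the backend wrote nothing at all.
fn wait_readable(sock: &UnixStream, timeout_ms: i32) -> std::io::Result<bool> {
    let mut pfd = libc::pollfd {
        fd: sock.as_raw_fd(),
        events: libc::POLLIN,
        revents: 0,
    };
    // poll(2) returns 0 on timeout and >0 when an event is pending.
    let n = unsafe { libc::poll(&mut pfd, 1, timeout_ms) };
    if n < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(n > 0 && pfd.revents & libc::POLLIN != 0)
}

If this keeps returning Ok(false) at the point of the hang, the ACK never reaches the socket at all, which points at the backend rather than at recv_with_fds.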

Do you have a link to the code of the backend that you pair this with?

I couldn't figure out the exact location in the ovs-dpdk or DPDK source code. ovs-dpdk is the package installed from Ubuntu 22.04. The https://github.com/openvswitch/ovs repository doesn't have an issues tab, but I sent a mail to bugs@openvswitch.org . Let's see.

Ablu commented 8 months ago

@rveerama1: Yeah, as I said before, it looks like we expect an ACK from the backend that never arrives. It would be interesting to know whether the SET_LOG_BASE message is acked by the backend here.

rveerama1 commented 8 months ago

It would be interesting to know whether the SET_LOG_BASE message is acked by the backend here.

do you mean reply to this call?

https://github.com/rust-vmm/vhost/blob/06694741b8c1c0b583eb7b3d75a7c746da9588ab/vhost/src/vhost_user/frontend.rs#L215

rveerama1 commented 8 months ago

Logs from /var/log/openvswitch/ovs-vswitchd.log.

On the Ubuntu 22.04 host:

2024-01-16T16:01:25.143Z|00033|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_LOG_BASE
2024-01-16T16:01:25.143Z|00034|dpdk|INFO|VHOST_CONFIG: log mmap size: 32768, offset: 0

This is where vhost doesn't get the ACK.

On the Ubuntu 20.04 host:

2024-01-16T14:17:03.687Z|00033|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_LOG_BASE
2024-01-16T14:17:03.688Z|00034|dpdk|INFO|VHOST_CONFIG: log mmap size: 32768, offset: 0
2024-01-16T14:17:03.688Z|00035|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
2024-01-16T14:17:03.688Z|00036|dpdk|INFO|VHOST_CONFIG: negotiated Virtio features: 0x16442a782
2024-01-16T14:17:03.688Z|00037|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
2024-01-16T14:17:03.688Z|00038|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
2024-01-16T14:17:04.767Z|00039|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_LOG_BASE
2024-01-16T14:17:04.767Z|00040|dpdk|INFO|VHOST_CONFIG: log mmap size: 32768, offset: 0
2024-01-16T14:17:04.776Z|00041|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_LOG_BASE
2024-01-16T14:17:04.776Z|00042|dpdk|INFO|VHOST_CONFIG: log mmap size: 32768, offset: 0
2024-01-16T14:17:04.783Z|00043|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
2024-01-16T14:17:04.783Z|00044|dpdk|INFO|VHOST_CONFIG: set queue enable: 0 to qp idx: 0
2024-01-16T14:17:04.783Z|00045|netdev_dpdk|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/tmp/dpdkvhostclient2' changed to 'disabled'
2024-01-16T14:17:04.783Z|00046|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
2024-01-16T14:17:04.783Z|00047|dpdk|INFO|VHOST_CONFIG: set queue enable: 0 to qp idx: 1
2024-01-16T14:17:04.784Z|00048|netdev_dpdk|INFO|State of queue 1 ( rx_qid 0 ) of vhost device '/tmp/dpdkvhostclient2' changed to 'disabled'
2024-01-16T14:17:04.789Z|00049|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_LOG_BASE
2024-01-16T14:17:04.789Z|00050|dpdk|INFO|VHOST_CONFIG: log mmap size: 32768, offset: 0
2024-01-16T14:17:04.797Z|00051|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
2024-01-16T14:17:04.797Z|00052|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
2024-01-16T14:17:04.797Z|00053|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
2024-01-16T14:17:04.797Z|00054|dpdk|INFO|VHOST_CONFIG: negotiated Virtio features: 0x16042a782
Ablu commented 8 months ago

do you mean reply to this call?

Yes, that sends a SET_LOG_BASE message and then waits for the ACK (at https://github.com/rust-vmm/vhost/blob/06694741b8c1c0b583eb7b3d75a7c746da9588ab/vhost/src/vhost_user/frontend.rs#L232). That ACK (seemingly) never arrives, so the process blocks there forever.
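For context, the ACK being waited for is an ordinary vhost-user reply header arriving on the same socket: the same request number with the reply bit set in the flags. A small illustration of parsing such a header (constants per the vhost-user spec; this sketches the wire format, not the vhost crate's actual types):

// Header layout and constants per the vhost-user spec.
const VHOST_USER_SET_LOG_BASE: u32 = 6; // request number
const VHOST_USER_VERSION: u32 = 0x1; // flags bits 0..1
const VHOST_USER_REPLY_MASK: u32 = 0x1 << 2; // flags bit 2 marks a reply

#[derive(Debug)]
struct VhostUserHeader {
    request: u32,
    flags: u32,
    size: u32,
}

impl VhostUserHeader {
    fn from_bytes(b: &[u8; 12]) -> Self {
        Self {
            request: u32::from_le_bytes(b[0..4].try_into().unwrap()),
            flags: u32::from_le_bytes(b[4..8].try_into().unwrap()),
            size: u32::from_le_bytes(b[8..12].try_into().unwrap()),
        }
    }

    // What the frontend is blocked waiting for: same request number,
    // matching version bits, and the reply bit set.
    fn is_reply_to(&self, request: u32) -> bool {
        self.request == request
            && self.flags & 0x3 == VHOST_USER_VERSION
            && self.flags & VHOST_USER_REPLY_MASK != 0
    }
}

fn main() {
    // Bytes of a well-formed ACK as they would arrive on the socket
    // (little-endian, as on x86 hosts).
    let mut raw = [0u8; 12];
    raw[0..4].copy_from_slice(&VHOST_USER_SET_LOG_BASE.to_le_bytes());
    raw[4..8].copy_from_slice(&(VHOST_USER_VERSION | VHOST_USER_REPLY_MASK).to_le_bytes());
    let hdr = VhostUserHeader::from_bytes(&raw);
    assert!(hdr.is_reply_to(VHOST_USER_SET_LOG_BASE));
    println!("parsed reply header: {hdr:?}");
}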

You will probably want to double-check that our behavior is correct according to the vhost-user spec, then try to find out whether your backend does not actually send the ACK or whether it gets lost somewhere.

stefano-garzarella commented 8 months ago

You will probably want to double check that our behavior is correct according to the vhost-user spec

https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#communication

VHOST_USER_SET_LOG_BASE is in the list of messages that require a reply, but only when VHOST_USER_PROTOCOL_F_LOG_SHMFD is negotiated.

Looking at our frontend (and also the one in QEMU), it seems to require the ACK under that condition. Can you check whether the backend does the same (i.e. sends the ACK when F_LOG_SHMFD is negotiated)?
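In code terms, the rule reduces to a single protocol-feature-bit check before deciding whether a reply is owed. A minimal sketch, with the bit position taken from the spec and the function name purely illustrative:

// VHOST_USER_PROTOCOL_F_LOG_SHMFD is protocol feature bit 1 in the spec.
const VHOST_USER_PROTOCOL_F_LOG_SHMFD: u64 = 1;

// A SET_LOG_BASE reply is mandatory only under this condition; a
// backend that skips it leaves the frontend blocked, as seen above.
fn set_log_base_needs_reply(negotiated_protocol_features: u64) -> bool {
    negotiated_protocol_features & (1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD) != 0
}

fn main() {
    assert!(set_log_base_needs_reply(1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD));
    assert!(!set_log_base_needs_reply(0));
}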

rveerama1 commented 8 months ago

@Ablu @stefano-garzarella thanks for the replies. I will check and get back to you.