rust-vmm / vhost-device

'vhost-user' device backends workspace
Apache License 2.0

vsock: Issues in sibling VM communication #384

Open techiepriyansh opened 1 year ago

techiepriyansh commented 1 year ago

These issues were discovered while trying to test the current implementation of sibling VM communication in vhost-user-vsock. The testing was done with iperf-vsock and nc-vsock, both patched to set .svm_flags = VMADDR_FLAG_TO_HOST.

Issues

Deadlock

If you test sibling communication by running iperf-vsock or by transferring big files with nc-vsock, the vhost-user-vsock process hangs and becomes completely unresponsive. After a bit of debugging, I discovered that there is a deadlock.

The deadlock occurs when two sibling VMs simultaneously try to send each other packets. The VhostUserVsockThread of each VM holds its own lock while executing thread_backend.send_pkt and then tries to lock its counterpart's VhostUserVsockThread to access that thread's raw_pkts_queue. Each thread ends up waiting for a lock the other one holds, which results in a deadlock.
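The lock state described above can be reproduced deterministically in a small model. This is only a sketch: `VsockThread`, `try_forward`, and `both_sending` are hypothetical names, not the crate's types, and `try_lock` is used in place of the blocking `lock` so that the example reports the circular wait instead of hanging.

```rust
use std::sync::Mutex;

/// Simplified, hypothetical stand-in for VhostUserVsockThread: the queue lives
/// inside the thread object, so reaching it requires the thread-wide lock.
struct VsockThread {
    raw_pkts_queue: Vec<Vec<u8>>,
}

/// Try to hand a packet to a sibling without blocking.
/// Returns false if the sibling's lock is currently held.
fn try_forward(sibling: &Mutex<VsockThread>, pkt: Vec<u8>) -> bool {
    match sibling.try_lock() {
        Ok(mut guard) => {
            guard.raw_pkts_queue.push(pkt);
            true
        }
        Err(_) => false,
    }
}

/// Model the situation from the issue: both threads hold their own lock
/// (as during send_pkt) and each then reaches for the other's.
fn both_sending() -> (bool, bool) {
    let a = Mutex::new(VsockThread { raw_pkts_queue: Vec::new() });
    let b = Mutex::new(VsockThread { raw_pkts_queue: Vec::new() });

    // Each VM's thread is busy sending and keeps its own lock held.
    let _a_busy = a.lock().unwrap();
    let _b_busy = b.lock().unwrap();

    // Neither can reach the other's queue. With a blocking lock() here,
    // both sides would wait on each other forever.
    let a_reaches_b = try_forward(&b, vec![1]);
    let b_reaches_a = try_forward(&a, vec![2]);
    (a_reaches_b, b_reaches_a)
}
```

`both_sending()` returns `(false, false)`: while both senders hold their own lock, neither can acquire the other's.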

In particular, this line of code triggers the deadlock.

The deadlock can be resolved by separating the mutex over raw_pkts_queue from the mutex over VhostUserVsockThread.
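A minimal sketch of that separation, under the assumption that each thread keeps a handle to its sibling's queue rather than to the sibling thread itself. The names `RawPktsQueue`, `VsockThread`, `pending`, and `exchange` are illustrative and do not reflect how the actual fix is structured.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// The raw packet queue gets its own mutex, independent of the thread lock.
type RawPktsQueue = Arc<Mutex<VecDeque<Vec<u8>>>>;

/// Simplified, hypothetical stand-in for VhostUserVsockThread.
struct VsockThread {
    own_queue: RawPktsQueue,
    sibling_queue: RawPktsQueue,
}

impl VsockThread {
    /// Forward a packet to the sibling. Only the sibling's *queue* lock is
    /// taken, and only for the duration of the push.
    fn send_pkt(&mut self, pkt: Vec<u8>) {
        self.sibling_queue.lock().unwrap().push_back(pkt);
    }

    /// Number of raw packets waiting in this thread's own queue.
    fn pending(&self) -> usize {
        self.own_queue.lock().unwrap().len()
    }
}

/// Two siblings send `n` packets to each other at the same time.
/// Returns how many packets ended up pending on each side.
fn exchange(n: usize) -> (usize, usize) {
    let q_a: RawPktsQueue = Arc::new(Mutex::new(VecDeque::new()));
    let q_b: RawPktsQueue = Arc::new(Mutex::new(VecDeque::new()));
    let a = Arc::new(Mutex::new(VsockThread {
        own_queue: q_a.clone(),
        sibling_queue: q_b.clone(),
    }));
    let b = Arc::new(Mutex::new(VsockThread {
        own_queue: q_b,
        sibling_queue: q_a,
    }));

    let workers: Vec<_> = vec![a.clone(), b.clone()]
        .into_iter()
        .map(|t| {
            thread::spawn(move || {
                // The thread-wide lock is held for the whole send loop, as in
                // the issue, but send_pkt never needs the sibling's thread lock.
                let mut guard = t.lock().unwrap();
                for i in 0..n {
                    guard.send_pkt(vec![i as u8]);
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }

    let pending_a = a.lock().unwrap().pending();
    let pending_b = b.lock().unwrap().pending();
    (pending_a, pending_b)
}
```

Because a queue lock is never held while acquiring another lock, no circular wait can form in this model; `exchange(1000)` completes and returns `(1000, 1000)`.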

Raw packets queue not being processed completely

Even after resolving the deadlock, the vhost-user-vsock process still hangs during testing, though it is not completely unresponsive this time. It turns out that sometimes the raw packets pending on the raw_pkts_queue are never processed, resulting in the hang.

This happens because the raw_pkts_queue is currently processed only when a SIBLING_VM_EVENT is received. If the RX virtqueue does not have enough space at that time, the queue is only partially drained, and the remaining packets stay pending until another SIBLING_VM_EVENT arrives.

This can be resolved by also trying to process raw packets on other events, similar to what happens in the RX path of standard packets.
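The idea can be sketched with a toy model of the event loop. Everything here is an assumption made for illustration: the `Event` variants other than the sibling event, the `ThreadState` struct, and the `rx_free_slots` counter standing in for free space in the RX virtqueue are not taken from the crate.

```rust
use std::collections::VecDeque;

/// Hypothetical event kinds; only the sibling event is named in the issue.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    RxQueue,
    TxQueue,
    Backend,
    SiblingVm,
}

/// Toy model of one thread's state.
struct ThreadState {
    raw_pkts_queue: VecDeque<Vec<u8>>,
    /// Stand-in for the free space in the RX virtqueue.
    rx_free_slots: usize,
    delivered: usize,
}

impl ThreadState {
    /// Deliver raw packets while the RX queue has room.
    /// Returns true if packets are still pending afterwards.
    fn process_raw_pkts(&mut self) -> bool {
        while self.rx_free_slots > 0 {
            match self.raw_pkts_queue.pop_front() {
                Some(_pkt) => {
                    self.rx_free_slots -= 1;
                    self.delivered += 1;
                }
                None => break,
            }
        }
        !self.raw_pkts_queue.is_empty()
    }

    /// Handle any event. `rx_slots_added` models the guest making more RX
    /// buffers available before this event fired.
    fn handle_event(&mut self, evt: Event, rx_slots_added: usize) -> bool {
        self.rx_free_slots += rx_slots_added;
        match evt {
            Event::SiblingVm => { /* a sibling pushed new raw packets */ }
            Event::RxQueue | Event::TxQueue | Event::Backend => {
                /* event-specific work elided in this sketch */
            }
        }
        // Retry the raw packet queue on every event, not only SiblingVm,
        // so a partially drained queue is picked up again later.
        self.process_raw_pkts()
    }
}
```

With five queued packets and two free RX slots, a `SiblingVm` event delivers only two packets. In this model the remaining three are delivered on later `RxQueue` or `Backend` events once slots are added, instead of waiting for another sibling event.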

Current status

While fixing the above two issues seems to make nc-vsock run flawlessly, testing with iperf-vsock still results in the vhost-user-vsock process hanging. There might be a notification problem, possibly related to the EVENT_IDX feature.

techiepriyansh commented 1 year ago

While #385 resolves the deadlock and the problem with the raw packets queue not being processed completely, iperf-vsock still doesn't work. The following could be the reasons for that: