Yep, @slp saw something similar and also #384 could be related.
It looks like we don't have this problem when using the in-kernel vhost-vsock device. Can you confirm this?
Anyway, since the device is fully emulated here, IMHO it is expected to happen with any VMM.
I ran qemu with `-device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3` (it uses in-kernel vhost-vsock, right?), and a similar bug happened.
> I ran qemu with `-device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3` (it uses in-kernel vhost-vsock, right?)

right!

> and a similar bug happened.
mmm, so it could be a problem in the driver then.
Which host & guest kernel are you using?
For the host, I tested with the latest one (6.5?), and for the guest I used 5.4 (based on Ubuntu 20.04).
Both guest->host and host->guest have the problem.
@ikicha thanks, replicated!
We need to take a closer look at what is going on with the credits and the socket buffer. Could it be that ncat does not set the socket buffer properly and waits for the whole message instead of reading it in pieces?
I suspected credits, because there is no response to the CREDIT REQUEST. QQ: which components negotiate credit with each other? vhost-vsock and vhost-device-vsock? Or the VMM and vhost-device-vsock?
> I suspected credits, because there is no response to the CREDIT REQUEST.

I can't see a CREDIT REQUEST using vhost-vsock, but it could be related, I agree.

> QQ: which components negotiate credit with each other? vhost-vsock and vhost-device-vsock? Or the VMM and vhost-device-vsock?
The VMM is not involved at all in the data path with either vhost-vsock or vhost-device-vsock. It is involved only in the setup and in handling the event queue used to send events to the guest, mostly for live migration (resetting all connections).
The credit information is exchanged between the driver and the device, usually in every packet. So, if there is traffic, they don't need to explicitly send a CREDIT REQUEST, since every packet header already contains the credit info.
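For example, the credit accounting implied by those header fields can be sketched like this (a rough Python illustration of the formula, not actual driver or device code; the function name is made up):

```python
# Rough illustration of virtio-vsock credit accounting (not actual driver code).
# Every packet header carries buf_alloc (the sender's receive buffer size) and
# fwd_cnt (bytes the sender has already consumed from its buffer), so each side
# can compute how much it may still send without overflowing the peer.

def peer_free_space(peer_buf_alloc: int, peer_fwd_cnt: int, tx_cnt: int) -> int:
    """Bytes we are still allowed to send; tx_cnt is what we have sent so far."""
    return peer_buf_alloc - (tx_cnt - peer_fwd_cnt)

# Example: peer advertised a 256 KiB buffer, we sent 300 KiB, peer consumed 100 KiB:
# 262144 - (307200 - 102400) = 57344 bytes of credit left.
print(peer_free_space(256 * 1024, 100 * 1024, 300 * 1024))
```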
@ikicha can you try increasing the vsock buffer size in this way:

```python
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_MAX_SIZE, 2*1024*1024)
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_SIZE, 2*1024*1024)
```
For vhost-device-vsock we have the `tx-buffer-size=` parameter that you can use for the same purpose.
In the guest, I don't know if ncat is setting it properly, so I used this:
```python
import socket
import os

server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
server.bind((socket.VMADDR_CID_ANY, 1234))
server.listen()
client, _ = server.accept()
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_MAX_SIZE, 2*1024*1024)
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_SIZE, 2*1024*1024)

i = 0
while True:
    print(i, ": waiting response")
    response = client.recv(1024*1024)
    print(i, ": received ", len(response), "bytes")
    if not response:
        break
    print(i, ": sending ", len(response), "bytes")
    client.sendall(response)
    i += 1
client.close()
```
I tested it with vhost-vsock, and it mitigates the symptom, but it happens again as the packet size increases. The interesting thing is that both host and guest are stuck in the `client.send()` (or `client.sendall()`) call.
If you don't want to increase the buffer size, what you need to do is have another thread reading.
Otherwise what happens here is the following:
The host wants to send 1 MB and the guest sends the data back, but when it gets to 256 KB (the default vsock buffer size) it deadlocks, because the host buffer is full and no one is reading data from that socket.
So the host should have another thread reading while you send. This works fine if the code I sent here runs in the guest (even without increasing the buffer size), while with ncat it doesn't work; I think there is something similar going on in ncat too.
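For illustration, here is a minimal sketch of that deadlock pattern using an ordinary AF_UNIX socketpair as a stand-in for the vsock connection (sizes and timeouts are made up for the example; a timeout is set so the script reports the stall instead of hanging forever):

```python
# Minimal sketch of the ping-pong deadlock, using an AF_UNIX socketpair as a
# stand-in for the vsock connection. Side A keeps sending without reading,
# side B echoes everything back: once both kernel buffers fill up, both ends
# block in sendall() at the same time.
import socket
import threading

a, b = socket.socketpair()
a.settimeout(5)
b.settimeout(5)

def echo():
    try:
        while True:
            data = b.recv(64 * 1024)
            if not data:
                break
            b.sendall(data)          # blocks once A's receive buffer is full
    except socket.timeout:
        print("echo side stuck in sendall(): peer is not reading")

t = threading.Thread(target=echo, daemon=True)
t.start()

try:
    a.sendall(b"x" * (10 * 1024 * 1024))  # send 10 MiB without ever reading
except socket.timeout:
    print("sender stuck in sendall(): its own receive buffer is full of echoes")

t.join()
```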
> I tested it with vhost-vsock, and it mitigates the symptom, but it happens again as the packet size increases. The interesting thing is that both host and guest are stuck in the `client.send()` (or `client.sendall()`) call.
You need to set the buffer accordingly if you don't read from it with another thread.
I just tried these 2 scripts with vhost-vsock, without increasing the vsock buffer, but just adding the reader thread in the host.
They have been running for 2 minutes without stalling.
- host
```python
import threading
import socket
import os

class thread(threading.Thread):
    def __init__(self, client):
        threading.Thread.__init__(self)
        self.client = client

    def run(self):
        i = 0
        while True:
            print(i, ": waiting response")
            response = self.client.recv(1024*1024)
            if not response:
                break
            print(i, ": received ", len(response), "bytes")
            i += 1

client = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
client.connect((3, 1234))

message = 'connect 1234\n'
client.sendall(message.encode())
response = client.recv(1024)
print("response:", response)

thread = thread(client)
thread.start()

i = 0
while True:
    message = ' ' * 1024 * 1024 * 10
    print(i, ": sending ", len(message), "bytes")
    client.sendall(message.encode())
    i += 1

client.close()
thread.join()
```
- guest
```python
import socket
import os

server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
server.bind((socket.VMADDR_CID_ANY, 1234))
server.listen()
client, _ = server.accept()

i = 0
while True:
    print(i, ": waiting response")
    response = client.recv(1024*1024)
    print(i, ": received ", len(response), "bytes")
    if not response:
        break
    print(i, ": sending ", len(response), "bytes")
    client.sendall(response)
    i += 1
client.close()
```
Thanks for sharing, I confirm that it works for vhost-vsock. But it still doesn't work with vhost-user-vsock. Do we need some kind of separate threads for tx and rx in vhost-user-vsock (like proposal #323)?
> Thanks for sharing, I confirm that it works for vhost-vsock.

Thanks for confirming!

> But it still doesn't work with vhost-user-vsock. Do we need some kind of separate threads for tx and rx in vhost-user-vsock (like proposal #323)?

Yes, perhaps that might be the case. As a workaround, have you tried increasing the `--tx-buffer-size` parameter of vhost-device-vsock?
Increasing the tx buffer size just postpones the error. In the guest, the program hung at `client.sendall(response)`. So I just tried adding a timeout before `client.sendall(response)` to check the hypothesis about the vsock buffer, and it works.
I think vhost-device-vsock needs some kind of flow control logic, and I was curious whether vhost-vsock has something like that (because it doesn't happen with vhost-vsock).
(BTW, the counterpart of vhost-device-vsock in the kernel is drivers/vhost/vsock.c, isn't it?)
> Increasing the tx buffer size just postpones the error. In the guest, the program hung at `client.sendall(response)`. So I just tried adding a timeout before `client.sendall(response)` to check the hypothesis about the vsock buffer, and it works. I think vhost-device-vsock needs some kind of flow control logic, and I was curious whether vhost-vsock has something like that (because it doesn't happen with vhost-vsock).
Yeah, we need to take a closer look at that part. The main difference is that in vhost-device-vsock we also have the unix socket, so I think the problem is there, between the unix socket and the virtio device. If you have time, it would be great if you could solve it ;-)
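As a hypothetical sketch of what that flow control could look like (Python pseudocode, not the actual Rust code in vhost-device-vsock; the conn fields and methods here are made up), the idea would be to read from the unix socket only as much as the guest's advertised credit allows:

```python
# Hypothetical sketch (Python pseudocode, not the actual Rust implementation):
# credit-aware forwarding from the host-side unix socket into the guest.
# The conn fields/methods below are illustrative stand-ins.

def forward_uds_to_guest(uds, conn):
    # Free space in the guest's receive buffer, from the last header we saw.
    free = conn.peer_buf_alloc - (conn.tx_cnt - conn.peer_fwd_cnt)
    if free <= 0:
        # Guest buffer is full: request a credit update and stop pulling data
        # from the UDS instead of reading bytes we cannot deliver yet.
        conn.send_credit_request()
        return
    data = uds.recv(min(free, 64 * 1024))
    if data:
        conn.enqueue_rx(data)     # forwarded to the guest as a VIRTIO_VSOCK_OP_RW packet
        conn.tx_cnt += len(data)
```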
> (BTW, the counterpart of vhost-device-vsock in the kernel is drivers/vhost/vsock.c, isn't it?)
Right.
@ikicha ah, maybe increasing the vsock buffer in guest.py in https://github.com/rust-vmm/vhost-device/issues/494#issuecomment-1777267908 will help, since I guess the problem in vhost-device-vsock could be similar to the problem in the python script of https://github.com/rust-vmm/vhost-device/issues/494#issue-1957376775
@ikicha FYI there was a potentially uninitialized field in the virtio-vsock common code (shared between host and guest) affecting `buf_alloc` and `fwd_cnt`, so this could also be related to this issue.
It was introduced in Linux v6.3.
Thanks for sharing! It might be part of the problem, but I think I found the root cause.
There is a silent return on error from epoll_register between reading from the UDS and writing the data into the vq in conn.recv_pkt, and it causes a packet drop.
I'll send a pull request about that soon.
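In other words, the problematic control flow looks roughly like this (a simplified Python sketch of the pattern described above, not the actual Rust code; any fix has to make sure the bytes already read from the UDS are not discarded):

```python
# Simplified sketch (Python pseudocode, not the actual Rust code) of the
# control flow in conn.recv_pkt() described above: the payload is consumed
# from the unix socket first, and an error from the following epoll_register()
# call makes the function return early, silently dropping those bytes.

def recv_pkt_sketch(conn):
    data = conn.uds.recv(conn.peer_free())   # bytes are consumed from the UDS here
    if not conn.epoll_register(conn.uds):    # hypothetical failure path
        return                               # early return: `data` never reaches the
                                             # rx virtqueue, i.e. the packet is dropped
    conn.queue_rw_packet(data)               # normal path: forward as an OP_RW packet
```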
guest side: `while true; do ncat -e /bin/cat --vsock -l 1234 -k; done;`
host side:

When I did a stress test of vhost-device-vsock, I found a bug with large rx packets. It got stuck after a VSOCK_OP_CREDIT_REQUEST in VsockConnection.recv_pkt() (and it looks like there is no VSOCK_OP_CREDIT_UPDATE in tx), and it happens both with qemu and crosvm. I was curious whether similar bugs have happened before. (BTW, I found https://lore.kernel.org/netdev/e60fb580-01ea-baf6-3635-8fc8933f817f@sberdevices.ru/ which is similar.)