rust-vmm / vhost-device

'vhost-user' device backends workspace

[vsock] tx/rx hungup after large packet is delivered #494

Closed. ikicha closed this issue 10 months ago.

ikicha commented 11 months ago

guest side (a simple vsock echo server on port 1234):

```sh
while true; do ncat -e /bin/cat --vsock -l 1234 -k; done
```

host side:

```python
import socket

# connect to the vhost-user-vsock unix socket and issue the
# "connect <port>" handshake of the backend's unix socket interface
socket_path = '/tmp/vm3.vsock'
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(socket_path)

message = 'connect 1234\n'
client.sendall(message.encode())
response = client.recv(1024)
print(response)

# repeatedly send a 1 MiB payload and read the echoed data back
i = 0
while True:
    message = ' ' * 1024 * 1024
    client.sendall(message.encode())
    response = client.recv(1024 * 1024)
    print(response, i)
    i += 1

client.close()
```

While stress testing vhost-device-vsock, I found a bug with large rx packets: the connection gets stuck after VSOCK_OP_CREDIT_REQUEST in VsockConnection.recv_pkt() (and it looks like there is no VSOCK_OP_CREDIT_UPDATE in tx). It happens with both qemu and crosvm. I was curious whether similar bugs have been reported before. (BTW, I found https://lore.kernel.org/netdev/e60fb580-01ea-baf6-3635-8fc8933f817f@sberdevices.ru/ which looks similar.)

stefano-garzarella commented 11 months ago

Yep, @slp saw something similar, and #384 could also be related.

It looks like we don't have this problem with the in-kernel vhost-vsock device. Can you confirm this?

Anyway, since the device is fully emulated here, IMHO it is expected that it will happen with any VMM.

ikicha commented 11 months ago

I ran qemu with `-device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3` (it uses the in-kernel vhost-vsock, right?), and a similar bug happened.

stefano-garzarella commented 11 months ago

> I ran qemu with `-device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3` (it uses the in-kernel vhost-vsock, right?),

right!

> and a similar bug happened.

mmm, so it could be a problem in the driver then.

Which host & guest kernel are you using?

ikicha commented 11 months ago

For the host, I tested with the latest one (6.5?), and for the guest, I used 5.4 (based on Ubuntu 20.04).

ikicha commented 11 months ago

Both guest->host and host->guest have the problem.

stefano-garzarella commented 11 months ago

@ikicha thanks, replicated!

We need to take a closer look at what is going on with the credits and the socket buffer. Could it be that ncat does not set the socket buffer properly and waits for the whole message instead of reading it in pieces?

ikicha commented 11 months ago

I suspected credits, because there is no response to the CREDIT REQUEST. QQ: which components negotiate credit with each other? vhost-vsock and vhost-device-vsock? Or the VMM and vhost-device-vsock?

stefano-garzarella commented 11 months ago

> I suspected credits, because there is no response to the CREDIT REQUEST.

I can't see a CREDIT REQUEST when using vhost-vsock, but it could be related, I agree.

> QQ: which components negotiate credit with each other? vhost-vsock and vhost-device-vsock? Or the VMM and vhost-device-vsock?

The VMM is not involved at all in the data path with either vhost-vsock or vhost-device-vsock. It is involved only in the setup and in handling the event queue used to send events to the guest, mostly for live migration (resetting all connections).

So the credit information is exchanged between the driver and the device, usually in every packet. If there is traffic, they don't need to explicitly send a CREDIT REQUEST, since every packet header already contains the credit info.
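
A rough sketch of that per-packet credit accounting (illustrative names, not the actual vhost-device-vsock identifiers):

```python
# Illustrative sketch of virtio-vsock credit accounting. Every packet
# header carries buf_alloc (the sender's total receive buffer size) and
# fwd_cnt (how many bytes it has consumed so far), so the peer can
# compute how many bytes it may still send without overflowing it:

def peer_free_credit(peer_buf_alloc, peer_fwd_cnt, tx_cnt):
    # tx_cnt: total bytes we have sent on this connection so far
    return peer_buf_alloc - (tx_cnt - peer_fwd_cnt)

print(peer_free_credit(peer_buf_alloc=262144, peer_fwd_cnt=0, tx_cnt=262144))
# -> 0: the sender must stall until a later header (or an explicit
#    CREDIT_UPDATE) reports a larger fwd_cnt
```

A sender that stalls at zero credit and never sees an update is exactly the kind of hang described above.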

stefano-garzarella commented 11 months ago

@ikicha can you try increasing the vsock buffer size in this way:

```python
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_MAX_SIZE, 2*1024*1024)
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_SIZE, 2*1024*1024)
```

stefano-garzarella commented 11 months ago

For vhost-device-vsock, we have the `--tx-buffer-size` parameter that you can use for the same purpose.
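
For example (hypothetical paths and CID, assuming the flag syntax from the vhost-device-vsock README):

```sh
# run the backend with a 2 MiB tx buffer (value in bytes; paths and CID are examples)
vhost-device-vsock \
  --guest-cid=3 \
  --uds-path=/tmp/vm3.vsock \
  --socket=/tmp/vhost3.socket \
  --tx-buffer-size=2097152
```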

stefano-garzarella commented 11 months ago

In the guest, I don't know if ncat is setting it properly, so I used this:

```python
import socket

server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
server.bind((socket.VMADDR_CID_ANY, 1234))
server.listen()

client, _ = server.accept()

# enlarge the per-connection vsock buffer before echoing data back
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_MAX_SIZE, 2*1024*1024)
client.setsockopt(socket.AF_VSOCK, socket.SO_VM_SOCKETS_BUFFER_SIZE, 2*1024*1024)

i = 0
while True:
    print(i, ": waiting response")
    response = client.recv(1024*1024)
    print(i, ": received ", len(response), "bytes")
    if not response:
        break
    print(i, ": sending ", len(response), "bytes")
    client.sendall(response)
    i += 1

client.close()
```

ikicha commented 11 months ago

I tested it with vhost-vsock, and it mitigates the symptom, but it happens again as the packet size increases. The interesting thing is that both host and guest get stuck in the client.send() (or client.sendall()) call.

stefano-garzarella commented 11 months ago

If you don't want to increase the buffer size, what you need to do is have another thread reading.

Otherwise what happens here is the following:

The host wants to send 1 MB and the guest sends the data back, but when it gets to 256 K (the default vsock buffer size) it deadlocks, because the host buffer is full and no one is reading data from the socket.

So the host should have another thread reading while you send. This works fine if the code I sent here runs in the guest (even without increasing the buffer size), while with ncat it doesn't work; I think there is a similar issue in ncat too.

stefano-garzarella commented 11 months ago

> I tested it with vhost-vsock, and it mitigates the symptom, but it happens again as the packet size increases. The interesting thing is that both host and guest get stuck in the client.send() (or client.sendall()) call.

You need to set the buffer accordingly if you don't read from it with another thread.

stefano-garzarella commented 11 months ago

I just tried these 2 scripts with vhost-vsock, without increasing the vsock buffer, just adding the reader thread on the host. They ran for 2 minutes without stalling.

- host
```python
import socket
import threading

class thread(threading.Thread):
    def __init__(self, client):
        threading.Thread.__init__(self)
        self.client = client

    def run(self):
        # reader thread: drain the socket so the peer's credit is
        # replenished while the main thread keeps sending
        i = 0
        while True:
            print(i, ": waiting response")
            response = self.client.recv(1024*1024)
            if not response:
                break
            print(i, ": received ", len(response), "bytes")
            i += 1

client = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
client.connect((3, 1234))

message = 'connect 1234\n'
client.sendall(message.encode())
response = client.recv(1024)
print("response:", response)

thread = thread(client)
thread.start()

i = 0
while True:
    message = ' ' * 1024 * 1024 * 10
    print(i, ": sending ", len(message), "bytes")
    client.sendall(message.encode())
    i += 1

client.close()
thread.join()
```


- guest
```python
import socket
import os

server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
server.bind((socket.VMADDR_CID_ANY, 1234))
server.listen()

client, _ = server.accept()

i = 0
while True:
  print(i, ": waiting response")
  response = client.recv(1024*1024)
  print(i, ": received ", len(response), "bytes")
  if not response:
    break
  print(i, ": sending ", len(response), "bytes")
  client.sendall(response)
  i += 1

client.close()
```

ikicha commented 11 months ago

Thanks for sharing, I confirm that it works for vhost-vsock. But it still doesn't work with vhost-user-vsock. Do we need separate threads for tx and rx in vhost-user-vsock (like proposal #323)?

stefano-garzarella commented 11 months ago

> Thanks for sharing, I confirm that it works for vhost-vsock.

Thanks for confirming!

> But it still doesn't work with vhost-user-vsock. Do we need separate threads for tx and rx in vhost-user-vsock (like proposal #323)?

Yes, that might be the case. As a workaround, have you tried increasing the `--tx-buffer-size` parameter of vhost-device-vsock?

ikicha commented 11 months ago

Increasing the tx buffer size just postpones the error. In the guest, the program hangs at client.sendall(response). So I tried adding a timeout before client.sendall(response) to check the hypothesis about the vsock buffer, and it works.
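
(Roughly like this in the guest echo loop; the delay value is arbitrary:)

```python
import time

# crude pacing workaround: give the host time to drain its socket
# buffer before echoing the data back (0.1 s is an arbitrary choice)
time.sleep(0.1)
client.sendall(response)
```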

I think vhost-device-vsock needs some kind of flow control logic, and I was curious whether vhost-vsock has something like that (because it doesn't happen with vhost-vsock).

(BTW, the counterpart of vhost-device-vsock in the kernel is drivers/vhost/vsock.c, isn't it?)

stefano-garzarella commented 11 months ago

> Increasing the tx buffer size just postpones the error. In the guest, the program hangs at client.sendall(response). So I tried adding a timeout before client.sendall(response) to check the hypothesis about the vsock buffer, and it works.

> I think vhost-device-vsock needs some kind of flow control logic, and I was curious whether vhost-vsock has something like that (because it doesn't happen with vhost-vsock).

Yeah, we need to take a closer look at that part. The main difference is that in vhost-device-vsock we also have the unix socket, so I think the problem is there, between the unix socket and the virtio device. If you have time, it would be great if you could solve it ;-)
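
Roughly, the data path looks like this (simplified sketch; the in-kernel vhost-vsock path has no unix socket hop):

```
host app <--> unix socket <--> vhost-device-vsock <--> virtqueues <--> guest virtio-vsock driver
```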

> (BTW, the counterpart of vhost-device-vsock in the kernel is drivers/vhost/vsock.c, isn't it?)

Right.

stefano-garzarella commented 11 months ago

@ikicha ah, maybe increasing the vsock buffer in guest.py in https://github.com/rust-vmm/vhost-device/issues/494#issuecomment-1777267908 will help, since I guess the problem in vhost-device-vsock could be similar to the problem in the python script of https://github.com/rust-vmm/vhost-device/issues/494#issue-1957376775

stefano-garzarella commented 11 months ago

@ikicha FYI there was a potentially uninitialized field in the virtio-vsock common code (host and guest) affecting buf_alloc and fwd_cnt, so this could also be related to this issue.

Introduced in Linux v6.3.

ikicha commented 11 months ago

Thanks for sharing! It might be part of the problem, but I think I found the root cause.

There is a silent error return from epoll_register between reading from the uds and writing the data into the vq in conn.recv_pkt, and it causes a packet drop.
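
In rough pseudocode (illustrative Python; the actual code is the Rust VsockConnection::recv_pkt in this crate):

```python
# Illustrative sketch of the failure mode, not the real implementation.
def recv_pkt(conn, pkt):
    data = conn.stream.recv(4096)  # bytes are consumed from the unix socket here
    try:
        conn.epoll_register(conn.fd)
    except OSError:
        return                     # silent early return on error: `data` was
                                   # already read but never written into the
                                   # packet, so those bytes are dropped
    pkt.write(data)                # only reached on the happy path
```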

ikicha commented 11 months ago

I'll send a pull request about that soon.