gramineproject / gramine-tdx

A library OS for Linux multi-process applications, with Intel TDX support (experimental)
GNU Lesser General Public License v3.0

[PAL/vm-common] vsock: fix host-side vsock buf management #25

Closed dimakuv closed 6 months ago

dimakuv commented 6 months ago

Description of the changes

The host uses the fwd_cnt and buf_alloc values carried in the guest's vsock packets for its buffer management (deciding whether to send the next packet to the guest or to back off). We previously implemented this incorrectly by making the two variables global across all virtual sockets, whereas they are in fact per-connection (per-socket). This mismatch between what the host expects and what the Gramine guest implements led to hangs: on a new connection, Gramine reported too many "bytes received" in the fwd_cnt counter, so the host believed there were too many packets in flight on that connection and backed off constantly.
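To make the failure mode concrete, here is a minimal sketch of the credit check a host might perform, based on the virtio-vsock rule that the peer's free space is buf_alloc - (tx_cnt - fwd_cnt). The struct and function names (host_conn_view, host_peer_free) are hypothetical and are not taken from Gramine or Linux; the clamping to zero is an assumption about how a host could treat an out-of-range value.

```c
#include <stdint.h>

/* Hypothetical host-side bookkeeping for ONE vsock connection.
 * Names are illustrative only, not the host's real data structures. */
struct host_conn_view {
    uint32_t tx_cnt;         /* bytes the host has sent on this connection */
    uint32_t peer_fwd_cnt;   /* last fwd_cnt reported by the guest */
    uint32_t peer_buf_alloc; /* last buf_alloc reported by the guest */
};

/* Free space the host believes the guest has on this connection:
 *   buf_alloc - (tx_cnt - fwd_cnt)
 * The counters are free-running, so the subtraction is done modulo 2^32.
 * Assumption: the host treats "more in flight than buf_alloc" as no credit. */
static uint32_t host_peer_free(const struct host_conn_view* c) {
    uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;
    if (in_flight >= c->peer_buf_alloc)
        return 0; /* no credit left: the host backs off */
    return c->peer_buf_alloc - in_flight;
}
```

On a fresh connection the host has sent nothing (tx_cnt is 0). If the guest reports a fwd_cnt accumulated across other connections, tx_cnt - fwd_cnt wraps to a very large number, the check above yields zero credit, and the host never sends.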

Fix this by moving the variables into fields of struct virtio_vsock_connection. An additional benefit is that per-connection fields already have proper locking, so no atomics or special synchronization are needed.
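The shape of the fix can be sketched as follows. This is not the actual Gramine code: sketch_conn stands in for struct virtio_vsock_connection, sketch_hdr for the relevant part of the vsock packet header, and all helper names and the buffer size are hypothetical. The sketch assumes the caller already holds the connection's lock.

```c
#include <stdint.h>

#define SKETCH_BUF_ALLOC (64u * 1024u) /* hypothetical receive buffer size */

/* Stand-in for the per-connection state; only the two counters are shown. */
struct sketch_conn {
    uint32_t fwd_cnt;   /* bytes consumed on THIS connection (free-running) */
    uint32_t buf_alloc; /* receive buffer space advertised for this connection */
};

/* Stand-in for the credit fields of an outgoing vsock packet header. */
struct sketch_hdr {
    uint32_t buf_alloc;
    uint32_t fwd_cnt;
};

/* Every new connection starts with its own zeroed counter. */
static void sketch_conn_init(struct sketch_conn* conn) {
    conn->fwd_cnt   = 0;
    conn->buf_alloc = SKETCH_BUF_ALLOC;
}

/* Called (with the connection lock held) after the app consumed len bytes. */
static void sketch_conn_on_consume(struct sketch_conn* conn, uint32_t len) {
    conn->fwd_cnt += len; /* wraps modulo 2^32 by design */
}

/* Credit fields are filled from the connection, not from globals. */
static void sketch_fill_hdr(const struct sketch_conn* conn, struct sketch_hdr* hdr) {
    hdr->buf_alloc = conn->buf_alloc;
    hdr->fwd_cnt   = conn->fwd_cnt;
}
```

With this layout, traffic on one connection no longer inflates the fwd_cnt reported on another, so a newly opened connection advertises a counter of zero.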

Interestingly, Linux prior to v6.7 did not expose this Gramine bug because it had an integer wraparound bug of its own. That bug was fixed in v6.7 by commit https://github.com/torvalds/linux/commit/60316d7f10b17a.

Fixes #24.

How to test this PR?

Run Redis or Memcached on Linux v6.7 or higher. I tested on v6.9.

