tcp-acceleration-service / tas

TAS is a drop-in highly CPU efficient and scalable TCP acceleration service.
https://tcp-acceleration-service.github.io/

Indefinite blocking and unable to receive data #15

Open kazikame opened 2 years ago

kazikame commented 2 years ago

Expected Behavior

TAS should send/receive packets without waiting indefinitely

Current Behavior

TAS sometimes fails to send (or receive) the data passed to the final send() call. As a result, the receiver waits indefinitely even after the sender has stopped.

Steps to Reproduce

The bug is non-deterministic and may occur on either the server or the client. However, it can be reproduced fairly reliably with the following server and client in this repo:

  1. Compile using -Ofast and -march=native
  2. Run TAS on both the server and client
  3. Server:
    LD_PRELOAD=<path-to-libtas_interpose.so> ./server <server-ip> <server-port>
  4. Client:
    LD_PRELOAD=<path-to-libtas_interpose.so> ./client <server-ip> <server-port>

Context (Environment)

The bug was discovered while trying to run this software RDMA stack over TAS on two machines equipped with 10G Intel 82599 NICs. All performance benchmarks block indefinitely on TAS, but they run fine on the regular kernel TCP stack.

PabstMatthew commented 2 years ago

One more thing we've noticed about this issue: decreasing --fp-poll-interval-tas seems to lower the probability of a successful run.

rajathshashidhara commented 2 years ago

TAS is designed for long-running applications: it implicitly assumes that applications poll the network stack for updates indefinitely. The sample code you've shared makes a fixed number of socket calls. Because of this, the client fails to propagate the last transmission update to its fastpath, causing the server to block indefinitely on recv(). This liveness condition is easy to satisfy: ensure that your code polls flextcp_context_poll() indefinitely. Note that the sockets API layer in TAS relies on this function to propagate updates to the fastpath.
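
For code that only uses the sockets API, one way to approximate this is to stay inside a socket call after the final send() instead of returning immediately. The sketch below is a minimal illustration using plain POSIX calls only. The helper name drain_until_peer_closes is hypothetical and not part of TAS. It assumes that a blocking recv() under libtas_interpose.so keeps driving flextcp_context_poll(), and that the peer closes the connection once it has received everything it expects. Neither assumption has been verified against TAS here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of TAS): call this after the final send()
 * so the process keeps making socket calls until the peer closes.
 * Returns 0 once the peer has closed, -1 on a socket error. */
static int drain_until_peer_closes(int fd)
{
    char buf[256];

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n == 0)
            return 0;   /* peer closed the connection: safe to exit */
        if (n < 0 && errno != EINTR)
            return -1;  /* real error: give up */
        /* n > 0 or EINTR: discard any trailing data and keep waiting */
    }
}
```

In the client, this would be called right after the last send() and before close(). Waiting for the peer to close, rather than sleeping for a fixed time, avoids guessing how long the last update takes to reach the fastpath.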