quininer / tokio-rustls

Asynchronous TLS/SSL streams for Tokio using Rustls.

Deadlock when both client and server have heavy write pressures #32

Closed jerryz920 closed 5 years ago

jerryz920 commented 5 years ago

Hi there,

I've encountered a problem where my program deadlocks, and I have narrowed it down to the TlsStream in this crate. If both the TLS client and the server try to write a lot of data and fully consume the OS socket buffer, the problem occurs reliably.

To add some context, my usage scenario is a multi-hop proxy in which I control a few proxy nodes. Clients connect to any proxy node, and the first proxy connects to the original destination through another proxy. The proxies talk to each other over mutually authenticated TLS connections.

The core logic of the proxies is fairly simple: essentially two parallel copy futures that are joined:

let (client_r, client_w) = client_socket.split();
let (proxy_r, proxy_w) = proxy_socket.split();

// Copy in both directions concurrently; once a direction finishes,
// shut down the corresponding write half.
copy(client_r, proxy_w).and_then(|(_, _, wh)| tokio::io::shutdown(wh))
    .join(copy(proxy_r, client_w).and_then(|(_, _, wh)| tokio::io::shutdown(wh)))

To reproduce the problem, I created a ready-to-build tarball in the attachment (built on stable-1.33). It has a client and a server, both of which copy a local file to the socket, receive data from the socket, and write it to /dev/null. It deadlocks when I add a localhost latency of 50ms with the tc tool and try to transfer a 100MB file in both directions. From the strace log I can observe that after the sendto syscall returns EAGAIN on each TLS socket, the sockets never become ready again and the connection never finishes.
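For reference, the added latency came from netem; something along these lines (the exact interface and delay value are just what my test setup uses):

# add 50ms of delay on the loopback interface
sudo tc qdisc add dev lo root netem delay 50ms

# remove it again after the test
sudo tc qdisc del dev lo root netem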

One possibility I have already excluded is the shutdown hook. I saw that TlsStream does some extra work in its shutdown path, which I bypassed by redirecting the shutdown call directly to TcpStream::shutdown(Shutdown::Write). But since tests on small files, and tests with no added latency, finish successfully, I don't believe this is the cause.

I spent some time looking into the code, and I think the most suspicious part is the TLS read (via std::io::Read). From what I can see, it calls complete_io, which first tries to flush any pending TLS frames; only if that finishes without error does it go on to read new frames. However, when both sides are under heavy write pressure, this breaks down. On each side, once the OS write buffer is fully consumed, every send on the TLS socket returns EAGAIN, so the actual read syscall is never reached because of the send errors. The condition can only clear once the remote side ACKs enough data to open the window again, but that may never happen: the remote side may be stuck in exactly the same state. The result is a deadlock in which each side waits for the other to read something. What's worse, I occasionally observe 100% CPU usage due to a flood of millions of sendto calls returning EAGAIN. I can't reproduce that with the proof-of-concept code yet, but it does happen in the real environment.
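To illustrate what I mean, here is a deliberately simplified sketch of the control flow I suspect, written against rustls's Session trait. This is not the crate's actual code, just my reading of it (the function name read_suspected is made up for illustration):

use std::io::{self, Read, Write};
use rustls::Session;

// Hypothetical simplification of the read path described above.
fn read_suspected<S: Session + Read, IO: Read + Write>(
    session: &mut S,
    io: &mut IO,
    buf: &mut [u8],
) -> io::Result<usize> {
    // Pending TLS frames are flushed before reading. With the kernel send
    // buffer full this returns WouldBlock (EAGAIN), so control never reaches
    // read_tls below; if both peers sit here, neither side ever reads.
    while session.wants_write() {
        session.write_tls(io)?;
    }

    if session.wants_read() {
        session.read_tls(io)?;
        session
            .process_new_packets()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    }

    // Hand decrypted plaintext to the caller.
    session.read(buf)
}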

My thought is that maybe TlsStream does not need to do any writes before doing a read. Does this sound reasonable to you, or are there specific concerns that stand in the way of fixing the deadlock?

tokio-issue.tar.gz

quininer commented 5 years ago

Thank you for your detailed report!

I agree that read_tls should not be blocked by write_tls. But we can't just remove the write_tls call, because if the future never calls write again, we would never get another chance to write_tls.
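Roughly, the idea would be something like this sketch (only an illustration of the direction, not necessarily what the fix will look like): still flush pending frames on read, but treat WouldBlock as non-fatal so the read can continue.

use std::io::{self, Write};
use rustls::Session;

// Flush pending TLS frames opportunistically; a full kernel send buffer
// (WouldBlock) stops the flush but does not fail the surrounding read.
fn flush_pending<S: Session, IO: Write>(session: &mut S, io: &mut IO) -> io::Result<()> {
    while session.wants_write() {
        match session.write_tls(io) {
            Ok(_) => {}
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}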

quininer commented 5 years ago

Can you check whether the 0.9-fix branch fixes the problem?

jerryz920 commented 5 years ago

Tested locally, good fix. Thank you!