tokio-rs / turmoil

Add hardship to your tests
MIT License

Improve flow control for TCP connections (GRPC example) #185

Open dtwitty opened 1 week ago

dtwitty commented 1 week ago

I have been following the gRPC example to add chaos testing to my distributed system. The determinism has been extremely nice for reproducing issues! However, I keep hitting failures that real TCP would normally handle on its own.

I've run into situations like this when setting builder.fail_rate(0.01).repair_rate(0.9):

  1. Client and server do the TCP 3-way handshake
  2. Client sends its request packet, which gets dropped
  3. The packet is never re-sent, but also the link isn't considered broken
  4. The client task hangs until the request deadline hits, and no more packets are sent out

Is this considered normal? Please forgive me if my understanding of TCP is rusty 😅

mcches commented 1 week ago

Unfortunately, we don't have the best fidelity yet: no retransmit behavior is implemented. I'll group this issue with similar ones, and hopefully we can prioritize this work soon.
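
To make the gap concrete, here is a minimal std-only sketch of the scenario described above. It does not use turmoil's API or reflect its internals. `send_request`, `Outcome`, the tick-based clock, and the fixed retransmission timeout `rto` are all hypothetical names and simplifications introduced for illustration (real TCP uses an adaptive timeout with exponential backoff). It only shows why a single dropped packet stalls the client until the deadline when nothing re-sends it, and how a retransmission timer would change the outcome.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The request reached the server at `tick` after `attempts` sends.
    Delivered { tick: u32, attempts: u32 },
    /// The deadline hit without the request ever being delivered.
    DeadlineExceeded { attempts: u32 },
}

/// Simulates one client request over a lossy link (hypothetical model).
///
/// `link_up[t]` says whether a packet sent at tick `t` survives; ticks past
/// the end of the slice count as "up". `rto` is an assumed fixed
/// retransmission timeout in ticks; `None` models no retransmit at all.
fn send_request(link_up: &[bool], deadline: u32, rto: Option<u32>) -> Outcome {
    let mut attempts = 0;
    let mut next_send = Some(0u32);

    for tick in 0..deadline {
        if next_send == Some(tick) {
            attempts += 1;
            let up = link_up.get(tick as usize).copied().unwrap_or(true);
            if up {
                return Outcome::Delivered { tick, attempts };
            }
            // The packet was dropped. Without a retransmission timer,
            // nothing is ever scheduled again and the client just waits.
            next_send = rto.map(|r| tick + r);
        }
    }

    Outcome::DeadlineExceeded { attempts }
}

fn main() {
    // The link drops whatever is sent at tick 0, then is repaired.
    let link_up = [false, true, true, true];

    // No retransmit: one send, then silence until the deadline.
    println!("{:?}", send_request(&link_up, 10, None));

    // Retransmit after 2 ticks: the second attempt gets through.
    println!("{:?}", send_request(&link_up, 10, Some(2)));
}
```

With the link down only at tick 0, the first call reports `DeadlineExceeded { attempts: 1 }`, matching the hang in steps 3 and 4 of the report, while the second reports `Delivered { tick: 2, attempts: 2 }`.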