I have been following the GRPC example to add chaos testing to my distributed system. The determinism has been extremely nice for reproducing issues! However, I keep running into issues that TCP would generally handle on its own.
I've run into situations like this when setting builder.fail_rate(0.01).repair_rate(0.9):
Client and server do the TCP 3-way handshake
Client sends its request packet, which gets dropped
The packet is never re-sent, but also the link isn't considered broken
The client task hangs until the request deadline hits, and no more packets are sent out
Is this considered normal? Please forgive me if my understanding of TCP is rusty 😅
Unfortunately we don't have the best fidelity yet. There is no re-transmit behavior implemented. I'll group this issue along with similar ones and hopefully we can prioritize this work soon.
I have been following the GRPC example to add chaos testing to my distributed system. The determinism has been extremely nice for reproducing issues! However, I keep running into issues that TCP would generally handle on its own.
I've run into situations like this when setting
builder.fail_rate(0.01).repair_rate(0.9)
:Is this considered normal? Please forgive me if my understanding of TCP is rusty 😅