Darwin MacBook-Pro-de-Francois.local 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 arm64
Crates
tonic
Description
When using a client with a balance_channel with N endpoints, I observed the following:
I send a request to the server that has not yet started. I'm receiving an error which is fine.
I start the server.
I send a request again, then I got an error. I need to send typically between 2 and 6 requests to get an Ok result finally.
Note that if there is one endpoint, there is no problem. This comes from the fact that, when servers are not started, the poll_ready will be executed on all Reconnect... and we will end up with NReconnect instances with errors. Only one will answer and return the error. The other N-1 will keep the error and return it on the next call.
Solution
I can push a PR to fix that but I want to have your feedback before.
Sorry for the delay on this! Yeah, I don't have any extra insight than what you have here so if there is a fix that doesn't break other things I am happy to take a look at that!
Bug Report
Version
Platform
Darwin MacBook-Pro-de-Francois.local 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 arm64
Crates
tonic
Description
When using a client with a
balance_channel
withN
endpoints, I observed the following:I wrote a test here: https://github.com/fmassot/tonic-balance-channel-with-lazy-connection/blob/main/src/main.rs#L25
When digging into the code, the issue comes from the way errors are handled with lazy connections: https://github.com/hyperium/tonic/blob/master/tonic/src/transport/service/reconnect.rs#L63-L155
Note that if there is one endpoint, there is no problem. This comes from the fact that, when servers are not started, the
poll_ready
will be executed on allReconnect
... and we will end up withN
Reconnect
instances with errors. Only one will answer and return the error. The otherN-1
will keep the error and return it on the next call.Solution
I can push a PR to fix that but I want to have your feedback before.