Closed zioproto closed 3 weeks ago
Hi! Thanks for reporting the problem and providing a PR!
I'm not very keen on breaking compatibility, so I'd rule out solution 2. But also solution 1 has a potential of being a breaking change, regardless of the value. I'll need to dig a bit deeper to see how big of a problem this could be.
A solution 3 could be to improve the doc and make it very clear that socket_timeout is something that needs to be set. I agree that exposing a low-level setting is not ideal, but there is no way have a "one size fits all" default here, so many users may want to set this themselves anyway. Whatever new default value we choose isn't necessarily better than the current one, in the general case.
@aiven-sal should we have a look what other Valkey / Redis client implementations ( even in different languages ) are doing when setting TCP timeouts?
For example I see the golang client uses a default value of 3 seconds for socket_timeout
and 5 seconds for socket_connect_timeout
:
https://github.com/redis/go-redis/blob/80c9f5bb777dd4e6393d014cffe9d28428ecf756/options.go#L83-L97
The Node Client is using 5 seconds: https://github.com/redis/node-redis/blob/b87b8b113430eb5e860ab435dc84e141a2e8ae98/docs/client-configuration.md?plain=1#L11
It seems that not setting a default is not a good idea :(
I agree that having a default would better. 2, 3 or 5 they should all be fine (but maybe I'd pick 5 just to stay on the safe side) I was just worried about the implications of changing the default: I think it's unlikely that someone out there is relying on the fact that the connection never times out, but not impossible.
But it's about time we make a 6.1 where we can introduce some breaking changes. So I think we can change this default to 5.
This is a recurring issue, previously reported in this GitHub issue.
I encountered the same problem while testing a Valkey Cluster on Kubernetes. When a Valkey Cluster pod is evicted, my client attempts to reconnect immediately, but gets stuck in the
SYN_SENT
state indefinitely, trying to connect to an IP address that no longer exists in the Kubernetes cluster. In Kubernetes, pods are ephemeral, and each new pod is assigned a new IP address.From the ValkeyCluster documentation, it's not immediately clear that developers need to configure the TCP timeout. See the relevant section here. Furthermore, I don’t believe developers building applications that connect to Valkey should have to handle low-level TCP parameters.
Proposed solutions:
Set a default
socket_timeout
valueMake
socket_timeout
a mandatory parameter