maxpain opened 4 years ago
I assume your Hasura is behind an Istio sidecar? I'd add a DestinationRule for your PG host that has tcpKeepalive set. Those keepalives are likely being dropped by the Istio mesh (Envoy?); if tcpKeepalive is set, they will likely be honored/allowed to pass through.
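A minimal sketch of such a DestinationRule, built in Python and emitted as JSON (kubectl accepts JSON manifests as well as YAML). The rule name, the host `postgres.example.internal` and the timing values are hypothetical placeholders, and the `networking.istio.io/v1beta1` API version is an assumption: older Istio releases may only serve `v1alpha3`. The field names follow Istio's `connectionPool.tcp.tcpKeepalive` settings (`time`, `interval`, `probes`).

```python
import json


def keepalive_destination_rule(name: str, host: str, time_s: int = 300,
                               interval_s: int = 75, probes: int = 9) -> dict:
    """Build an Istio DestinationRule enabling TCP keepalives toward `host`.

    All values are illustrative placeholders; tune them for your environment.
    """
    return {
        # Assumed API version; older Istio releases may need v1alpha3.
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": name},
        "spec": {
            "host": host,
            "trafficPolicy": {
                "connectionPool": {
                    "tcp": {
                        "tcpKeepalive": {
                            # Idle time before the first keepalive probe is sent.
                            "time": f"{time_s}s",
                            # Time between successive probes.
                            "interval": f"{interval_s}s",
                            # Unanswered probes before the connection is considered dead.
                            "probes": probes,
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical PG host; replace with your database's address.
    rule = keepalive_destination_rule("postgres-keepalive", "postgres.example.internal")
    # Usage: python make_rule.py | kubectl apply -f -
    print(json.dumps(rule, indent=2))
```

Keeping `time` well below the idle timeout of whatever sits between the proxy and the database is the point of the setting; 300s simply mirrors the `keepalives_idle=300` value mentioned later in the thread.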
Why not fix this in Hasura?
If it is indeed the Istio service mesh blocking TCP keepalives, then first, that isn't really a problem with Hasura, and second, I'm not sure Hasura can do anything if Istio is inappropriately mangling its network traffic.
The analogy is this: if your cellular provider's network (Istio's service mesh) is dropping your calls, there is not much you can do at your handset (Hasura) when the network itself (Istio) is blocking, dropping, or mangling your call (the TCP connection).
I am not exactly sure that is what is going on here, but I run Hasura on Istio (along with many other services behind Istio sidecars), and this kind of problem comes up quite frequently; it's part of general network tuning when running behind such invasive proxies.
I do not think you will be able to fix this problem on the Hasura side (and if you can, it likely won't be easy or free of side effects), whereas Istio should be quite easy to tune/configure so that it stops interfering with the connection the way it appears to be.
Again, given what little I have heard of your environment, these are best-effort-with-info-at-hand guesses/assumptions...
Agreed with @24601
I don't think Hasura should (or can) do anything specific for Istio's network configuration (e.g. if Istio is dropping keepalive requests). I am closing this issue for now. If there are specific suggestions that apply generally to any network, I'm happy to discuss them.
@tirumaraiselvan Why have you closed this issue? I think Hasura SHOULD handle dropped keepalive connections, e.g. by reconnecting. We have the same problem with native VPC networking in Google Cloud. Why should we have to worry about these internal Hasura problems?
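For illustration only, the "reconnect or something" idea could look like the generic client-side pattern below: run an operation, and if the connection turns out to be dead, back off and open a fresh one. This is not Hasura's code; `run_with_reconnect`, `connect` and `op` are hypothetical names, and treating `ConnectionError` as the "dropped connection" signal is an assumption (a real database driver raises its own exception types).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

C = TypeVar("C")  # connection type
T = TypeVar("T")  # result type


def run_with_reconnect(connect: Callable[[], C], op: Callable[[C], T],
                       retries: int = 3, base_delay: float = 0.5,
                       retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,)) -> T:
    """Run `op` on a connection, reopening the connection if it was dropped.

    Hypothetical helper: retries up to `retries` times with exponential backoff.
    """
    conn = connect()
    attempt = 0
    while True:
        try:
            return op(conn)
        except retry_on:
            if attempt >= retries:
                raise  # give up and surface the last error
            # Back off exponentially before replacing the presumably dead connection.
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1
            conn = connect()
```

Blind retries like this are only safe for idempotent operations; a write that succeeded on the server just before the connection dropped could otherwise be applied twice, which is one reason such logic is harder to add generically than it looks.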
Reopening this to document appropriate Istio configuration.
Hello. We use Google Kubernetes Engine (with two interconnected VPC networks, one for Postgres and one for the Kubernetes cluster), and some time ago we had a problem with metadata sync between Hasura and Postgres. @rakeshkky from the Hasura team helped me and suggested adding
keepalives_idle=300
to the Postgres connection URL. That helped. But now I have the same problem with the Istio service mesh, and keepalives_idle is not working. Any ideas?
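A minimal sketch of adding that setting to a Postgres connection URL, assuming the standard libpq keepalive connection parameters (`keepalives`, `keepalives_idle`, `keepalives_interval`, `keepalives_count`, times in seconds) passed as URL query parameters. The helper name `with_keepalives` and the example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keepalives(db_url: str, idle: int = 300,
                    interval: Optional[int] = None,
                    count: Optional[int] = None) -> str:
    """Return `db_url` with libpq TCP keepalive parameters added (hypothetical helper)."""
    parts = urlsplit(db_url)
    # Keep any existing query parameters (e.g. sslmode) in their original order.
    params = dict(parse_qsl(parts.query))
    params["keepalives"] = "1"                  # enable TCP keepalives
    params["keepalives_idle"] = str(idle)       # idle seconds before the first probe
    if interval is not None:
        params["keepalives_interval"] = str(interval)  # seconds between probes
    if count is not None:
        params["keepalives_count"] = str(count)        # lost probes before giving up
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_keepalives("postgres://user:pass@db.example:5432/app?sslmode=require"))
```

Behind an Istio sidecar, these client-side keepalives plausibly only reach the local Envoy proxy rather than Postgres itself, which would fit the report that the setting stopped helping; the DestinationRule approach suggested earlier in the thread targets the proxy-to-database leg instead.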