hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0

Metadata sync problem with Istio #5381

Open maxpain opened 4 years ago

maxpain commented 4 years ago

Hello. We use Google Kubernetes Engine (we have two interconnected VPC networks for Postgres and the Kubernetes cluster), and some time ago we had a problem with metadata sync between Hasura and Postgres. @rakeshkky from the Hasura team helped me and suggested adding keepalives_idle=300 to the Postgres connection URL. That helped. But now I have the same problem with the Istio service mesh, and keepalives_idle is not working.
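For readers unfamiliar with the workaround mentioned above, here is a minimal sketch of appending `keepalives_idle` to a Postgres connection URL. `keepalives_idle` is a standard libpq connection parameter; the helper name `with_keepalive` and the example URL are hypothetical and not taken from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keepalive(db_url: str, idle_seconds: int = 300) -> str:
    """Return db_url with keepalives_idle set, keeping existing query parameters."""
    parts = urlsplit(db_url)
    # Preserve whatever parameters are already on the URL (e.g. sslmode).
    query = dict(parse_qsl(parts.query))
    # Seconds of inactivity before the client starts sending TCP keepalive probes.
    query["keepalives_idle"] = str(idle_seconds)
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(with_keepalive("postgres://user:pw@10.0.0.5:5432/app?sslmode=require"))
```

The resulting URL would then be supplied wherever the database URL is configured. As the rest of this thread suggests, the setting only helps if nothing between Hasura and Postgres interferes with the keepalive probes.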

Any ideas?

24601 commented 4 years ago

I assume your Hasura is behind an Istio sidecar? I'd add a DestinationRule for your PG host that has tcpKeepalive set. Those keepalives are likely being dropped by the Istio mesh (Envoy?); if tcpKeepalive is set, they will likely be honored/allowed to pass through.
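A rough sketch of what such a DestinationRule might look like, built as a Python dict and printed as JSON (which `kubectl apply -f -` accepts). The field path `trafficPolicy.connectionPool.tcp.tcpKeepalive` with `time`, `interval`, and `probes` follows Istio's DestinationRule schema; the host name, resource name, and timing values are assumptions for illustration, not settings confirmed in this thread.

```python
import json


def postgres_destination_rule(host: str, idle: str = "300s",
                              interval: str = "75s", probes: int = 3) -> dict:
    """Build a DestinationRule manifest that enables TCP keepalive towards host."""
    return {
        # v1alpha3 is assumed here; newer Istio releases also serve later API versions.
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        "metadata": {"name": "postgres-keepalive"},  # hypothetical resource name
        "spec": {
            "host": host,
            "trafficPolicy": {
                "connectionPool": {
                    "tcp": {
                        "tcpKeepalive": {
                            "time": idle,          # idle time before the first probe
                            "interval": interval,  # time between probes
                            "probes": probes,      # unanswered probes before the connection is considered dead
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # "postgres.example.internal" is a placeholder for the real PG host.
    print(json.dumps(postgres_destination_rule("postgres.example.internal"), indent=2))
```

If the Postgres host lives outside the mesh, a matching ServiceEntry may also be needed for the rule to apply; that depends on the mesh configuration and is not covered here.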

maxpain commented 4 years ago

Why not fix this in Hasura?

24601 commented 4 years ago

If it is indeed the Istio service mesh blocking TCP keepalives, then first, that isn't really a problem with Hasura, and second, I'm not sure Hasura can really do anything if Istio is inappropriately mangling its network traffic.

The analogy is this: if your cellular phone provider's network (Istio's service mesh) is dropping your calls, you cannot do much at your handset (Hasura) to fix the problem if the actual network (Istio) is blocking, dropping, or mangling your call (the TCP connection).

I am not exactly sure if that is what is going on here, but I run Hasura on Istio (as well as many other services behind Istio sidecars) and this happens quite frequently (general network tuning when behind such invasive proxies).

I do not think you will be able to fix this problem on the Hasura side (and if you can, it likely will not be easy or free of side effects), whereas Istio should be quite easy to tune/configure so that it does not interfere with the connection the way it appears to be doing.

Again, given what little I have heard of your environment, these are best-effort-with-info-at-hand guesses/assumptions...

tirumaraiselvan commented 4 years ago

Agreed with @24601

I don't think Hasura should (or can) do anything specific for Istio network configuration (for example, if Istio is dropping keepalive requests). I am closing this issue for now. If there are specific suggestions that apply generally to any network, I'm happy to discuss them.

maxpain commented 4 years ago

@tirumaraiselvan Why have you closed this issue? I think Hasura SHOULD worry about dropped keep-alive connections and reconnect or something. We have the same problem with native VPC networking in Google Cloud. Why do we need to worry about these internal Hasura problems?

tirumaraiselvan commented 4 years ago

Reopening this to document appropriate Istio configuration.