Server Version:
CLI Version (for CLI related issue): v2.44.0.cli-migrations-v3
Environment
self-hosted
What is the current behaviour?
We run Hasura in our Kubernetes cluster, with an ephemeral Postgres DB in a container deployed next to it. On startup, Hasura is ready before the Postgres DB is, so as expected I see this log entry:
{"error":"connection error","path":"$","code":"postgres-error","internal":"connection to server at \"localhost\" (127.0.0.1), port 5433 failed: Connection refused\n\tIs the server running on that host and accepting TCP/IP connections?\nconnection to server at \"localhost\" (::1), port 5433 failed: Cannot assign requested address\n\tIs the server running on that host and accepting TCP/IP connections?\n"}
The problem is that Hasura does not retry the Postgres connection. There is no further logging until Hasura gets killed once `HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT` is reached. Once Kubernetes spins up a new container, the Postgres DB is ready and everything works fine.
I don't remember us having a similar issue before, and we've been using the same setup for several months, so this might be a regression.
The central question here is: Is there a retry mechanism for the database connection of the temporary server that's being created by the cli-migrations-v3 image? From what I can see, there is not. Even when running the [tests](https://github.com/hasura/graphql-engine/tree/master/packaging/cli-migrations/v3/test) in the hasura repo, I see the same behavior if the Postgres DB is not already available when Hasura starts up.
I am willing to contribute to the project to fix this, if necessary!
What is the expected behaviour?
The container retries the DB connection at an (optionally configurable) interval.
How to reproduce the issue?
1. Take the `docker-compose` file from the [test folder](https://github.com/hasura/graphql-engine/blob/master/packaging/cli-migrations/v3/test/docker-compose.yaml).
2. Run `docker-compose up`.
3. Watch the `hasura` container and see that there is no retry for the database connection.
4. Wait until `HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT` has been reached.
Screenshots or Screencast
Please provide any traces or logs that could help here.
In the [test](https://github.com/hasura/graphql-engine/blob/master/packaging/cli-migrations/v3/test/test.sh) section of the image I can see that the Postgres DB is spun up before the Hasura instance. Maybe it's a coincidence, but this might support my suspicion that the Hasura instance does not repeatedly check the DB connection.
Any possible solutions/workarounds you're aware of?
Implement polling/retrying for the database connection.
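To illustrate what I mean by polling, here is a rough sketch in Python. It is not how the cli-migrations-v3 image works today and not a proposed patch. The function name `wait_for_port` and its parameters are made up for this example. It only checks that the TCP port accepts connections, which does not guarantee that Postgres is already accepting queries.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, interval_s=1.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Hypothetical helper for illustration only.
    Returns True if the port became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: give up only after the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

As a workaround, something like `wait_for_port("localhost", 5433)` could run before Hasura starts, for example in a wrapper script, with host and port taken from the log entry above. A retry inside the image itself would still be the cleaner fix.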
Keywords
auto-migrate, cli-migrations-v3, database, postgres, metadata