hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.18k stars 2.77k forks source link

Error when following "Auto-apply migrations/metadata when the server starts" docs #7538

Closed Gozala closed 2 years ago

Gozala commented 3 years ago

Version Information

Server Version: CLI Version (for CLI related issue):

Environment

What is the expected behaviour?

Following Auto-apply migrations/metadata when the server starts should start hasura with postgress and apply provision database with all the migrations.

Keywords

What is the current behaviour?

Image starts and then terminates with timeout error

How to reproduce the issue?

  1. Execute following commands

    hasura init hasura
    cd hasura
    hasura migrate create init --sql-from-file ../schema.sql --database-name default
    #> WARN database default is not connected to hasura  
    #> INFO Migrations files created                      name=init version=1631292489116
    docker run -p 8080:8080 \
       -v migrations:/hasura-migrations \
       -v metadata:/hasura-metadata \
       -e HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:postgrespassword@postgres:5432/postgres \
       hasura/graphql-engine:v2.0.0-alpha.7.cli-migrations-v3

Please provide any traces or logs that could help here.

docker run -p 8080:8080 \
       -v migrations:/hasura-migrations \
       -v metadata:/hasura-metadata \
       -e HASURA_GRAPHQL_DATABASE_URL=postgres://postgres:postgrespassword@postgres:5432/postgres \
       hasura/graphql-engine:v2.0.0-alpha.7.cli-migrations-v3

{"timestamp":"2021-09-10T17:10:22.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"migrations server port env var is not set, defaulting to 9691"}}
{"timestamp":"2021-09-10T17:10:22.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"server timeout is not set, defaulting to 30 seconds"}}
{"timestamp":"2021-09-10T17:10:22.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"starting graphql engine temporarily on port 9691"}}
{"timestamp":"2021-09-10T17:10:22.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"waiting 30 for 9691 to be ready"}}
{"type":"startup","timestamp":"2021-09-10T17:10:22.675+0000","level":"info","detail":{"kind":"server_configuration","info":{"live_query_options":{"batch_size":100,"refetch_delay":1},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","plan_cache_options":{"plan_cache_size":4000},"enable_maintenance_mode":false,"enabled_log_types":["http-log","websocket-log","startup","webhook-log"],"server_host":"HostAny","enable_allowlist":false,"remote_schema_permissions":false,"log_level":"info","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"enabled_apis":["metadata"],"enable_telemetry":true,"enable_console":false,"auth_hook":null,"infer_function_permissions":true,"experimental_features":[],"jwt_secret":null,"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"websocket_compression_options":"NoCompression","console_assets_dir":null,"admin_secret_set":false,"port":9691,"websocket_keep_alive":"KeepAliveDelay {unKeepAliveDelay = Seconds {seconds = 5s}}"}}}
{"type":"startup","timestamp":"2021-09-10T17:10:22.675+0000","level":"info","detail":{"kind":"postgres_connection","info":{"retries":1,"database_url":"postgres://postgres:...@postgres:5432/postgres"}}}
{"type":"pg-client","timestamp":"2021-09-10T17:10:31.235+0000","level":"warn","detail":{"message":"postgres connection failed, retrying(0)."}}
{"type":"pg-client","timestamp":"2021-09-10T17:10:35.640+0000","level":"warn","detail":{"message":"postgres connection failed, retrying(1)."}}
{"type":"startup","timestamp":"2021-09-10T17:10:35.640+0000","level":"error","detail":{"kind":"catalog_migrate","info":{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (23.200.237.236) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (23.60.91.236) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}}}
{"internal":"could not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (23.200.237.236) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Connection refused\n\tIs the server running on host \"postgres\" (23.60.91.236) and accepting\n\tTCP/IP connections on port 5432?\n","path":"$","error":"connection error","code":"postgres-error"}
{"timestamp":"2021-09-10T17:10:52.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"failed waiting for 9691, try increasing HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT (default: 30)"}}

Any possible solutions?

Provide more comprehensive documentation on how to provision hasura with db in the docker and how to apply database migrations. All the documentation seems to either:

  1. Lead to an error (presumably due to undocumented gaps)
  2. Require some UI interactions

Making it impossible to setup a CI enviroment

Can you identify the location in the source code where the problem exists?

I can not

If the bug is confirmed, would you be willing to submit a PR?

I do not know how to fix this

BenoitRanque commented 3 years ago

Do you have a postgres database running, with the container name postgres? The error log is indicating a failure to connect to the database:

could not connect to server: Connection refused
  Is the server running on host \"postgres\" (23.200.237.236) and accepting
  TCP/IP connections on port 5432?
 could not connect to server: Connection refused
   Is the server running on host \"postgres\" (23.60.91.236) and accepting
   TCP/IP connections on port 5432?
LouAlbano commented 3 years ago

23.60.91.236 and 23.200.237.236 are IPs Centurylink responds with instead of NXDOMAIN.

$ host postgres 205.171.3.25
Using domain server:
Name: 205.171.3.25
Address: 205.171.3.25#53
Aliases: 

postgres has address 23.60.91.236
postgres has address 23.200.237.236
Host postgres not found: 3(NXDOMAIN)

It's their "search helper" page. As you can see it helps confusion and probably nothing else.

You need to change postgres:5432 to the correct IP or hostname of your DB.

Also I personally would reconfig your network to use a proper DNS server that responds with proper NXDOMAIN responses to avoid this in the future.

Gozala commented 3 years ago

I think problem was caused by the docker image that was used in docks. Changing it to following hasura/graphql-engine:v2.0.9.cli-migrations-v3 fixed the issue for me.

I still find it really confusing that docs for running hasura locally https://hasura.io/docs/latest/graphql/core/getting-started/docker-simple.html#docker-simple use container image hasura/graphql-engine:v2.0.9 but in order to apply metadata & migrations you need to use hasura/graphql-engine:v2.0.9.cli-migrations-v3 because:

  1. It is hard to spot that cli-migrations-v3 serves special purpose there.
  2. Just mounting local metadata and migrations dirs seem to have no effect on default image.

I think user experience would be whole lot more intuitive and and straight forward if:

  1. Default docker config mounted metadata and migrations.
  2. graphql-engine applied mounted metadata & migrations by default.
  3. There was a flag similar to HASURA_GRAPHQL_ENABLE_CONSOLE to disable migrations by default.
BenoitRanque commented 3 years ago

The purpose of the cli-migrations-v3 variants of the images is to apply the migrations automatically on startup. With the default image, you would use the cli to run hasura metadata appy and other commands to apply the metadata, you do not need to mount a metadata or migrations directory.

A lot of users want to control themselves when the metadata is applied, using their own automation flow. So the behavior of the default image is the one with the least amount of magic, and the least opinionated when it comes to management of the migrations.

Gozala commented 3 years ago

The purpose of the cli-migrations-v3 variants of the images is to apply the migrations automatically on startup. With the default image, you would use the cli to run hasura metadata appy and other commands to apply the metadata, you do not need to mount a metadata or migrations directory.

I understand that. What I was suggesting is that if you user mounted those directories that is clear enough signal that metadata / migrations from those need to be picked up.

A lot of users want to control themselves when the metadata is applied, using their own automation flow. So the behavior of the default image is the one with the least amount of magic, and the least opinionated when it comes to management of the migrations.

I would expect that mounting dirs is probably an enough signal to infer user intent. However if that sounds unconvincing, inverted env var could be used e.g. if HASURA_GRAPHQL_ENABLE_AUTOMIGRATE is set to true migrations are applied otherwise they are not.

That way people not very familiar with docker (like myself) will be less confused when some docs use docker-compose with config file and some docs use docker run but as it turns out with a different image with cli-migrations-v3 suffix.

jflambert commented 2 years ago

@rikinsk using 2.4.0 and I'm getting a ton more "failed waiting for 9691" errors during cli-migrations startup phase. Is there an other issue you are aware of I can contribute to, or should I create a new one...

gnekich commented 2 years ago

@jflambert I was getting something like this;

qemu: uncaught target signal 11 (Segmentation fault) - core dumped
{"timestamp":"2022-05-23T19:40:04.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"server timeout is not set, defaulting to 30 seconds"}}
{"timestamp":"2022-05-23T19:40:04.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"starting graphql engine temporarily on port 9691"}}
{"timestamp":"2022-05-23T19:40:04.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"waiting 30 for 9691 to be ready"}}
{"timestamp":"2022-05-23T19:40:37.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"failed waiting for 9691, try increasing HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT (default: 30)"}}

when running our custom built image based on Hasura, that already had baked in metadata and migrations. The issue was that I was trying to run it on ARM64 architecture on my M1 Mac.

The solution was simple: change Dockerfile FROM hasura/graphql-engine:v2.6.1.cli-migrations-v3 to FROM hasura/graphql-engine:v2.6.1.cli-migrations-v3.ubuntu.arm64

thus changing image from hasura/graphql-engine:v2.6.1.cli-migrations-v3 to hasura/graphql-engine:v2.6.1.cli-migrations-v3.ubuntu.arm64 should solve your issue.

robertjdominguez commented 2 years ago

Please let us know if you have any further comments, and we'll reopen.