hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0
31.15k stars 2.76k forks source link

Hasura docker app crashes on Heroku on deployment - graphql-engine serve commands binds port before ready #7474

Open ro-savage opened 3 years ago

ro-savage commented 3 years ago

Whenever we do a new deploy of hasura, our app 'crashes' on heroku.

We believe this is due to the fact that the hasura image is binding to the dyno's port before hasura is ready to serve. Therefore Heroku will start redirecting traffic to the hasura even though hasura is still applying metadata and booting.

We've spoken to Heroku about this, and they said dynos will also start serve as soon as the port is binded.

Is there a command to start hasura and apply metadata, before it binds to the port / starts serving traffic? Or could the docker image be updated to only bind after hasura is ready to serve?

Here are our logs when booting - It basically just continues with the boot process but we get at=error code=H10 desc="App crashed" repeatedly for about 2-3 minutes until it's ready.

26 Aug 2021 10:52:50.160134 <133>1 2021-08-25T22:52:49.414471+00:00 app api - - Release v343 created by user *****@*****
26 Aug 2021 10:52:50.185189 <133>1 2021-08-25T22:52:49.414471+00:00 app api - - Set COMMIT_SHA, COMMIT_0e6f61be92249855a26134e1867ae1d5e7584f36 config vars by user *****@*****
26 Aug 2021 10:52:50.904107 <134>1 2021-08-25T22:52:50.536149+00:00 heroku web.1 - - Restarting
26 Aug 2021 10:52:53.632192 <134>1 2021-08-25T22:52:53.248981+00:00 heroku web.1 - - Starting process with command `/bin/sh -c graphql-engine\ serve\ --server-port\ \"\$\{PORT\}\"`
26 Aug 2021 10:52:54.999331 <158>1 2021-08-25T22:52:54.630806+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=*******.herokuapp.com request_id=3fdba7d3-85b8-4802-9cc6-f92abb356172 fwd="122.58.94.185" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:55.162333 <158>1 2021-08-25T22:52:54.633517+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=*******.herokuapp.com request_id=b95abaf0-b107-4327-9c75-8acc44170351 fwd="101.100.129.151" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:55.598280 <190>1 2021-08-25T22:52:54.879801+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:54.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"migrations server port env var is not set, defaulting to 9691"}}
26 Aug 2021 10:52:55.598270 <190>1 2021-08-25T22:52:54.880569+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:54.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"server timeout is not set, defaulting to 30 seconds"}}
26 Aug 2021 10:52:55.598267 <190>1 2021-08-25T22:52:54.881394+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:54.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"starting graphql engine temporarily on port 9691"}}
26 Aug 2021 10:52:55.598250 <190>1 2021-08-25T22:52:54.882275+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:54.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"waiting 30 for 9691 to be ready"}}
26 Aug 2021 10:52:56.044333 <158>1 2021-08-25T22:52:55.462549+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=*******.herokuapp.com request_id=2eb50893-9e7e-46e1-ae5e-862a3bf496d6 fwd="158.140.247.176" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:56.0901402 <190>1 2021-08-25T22:52:55.687592+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:55.686+0000","level":"info","detail":{"kind":"server_configuration","info":{"live_query_options":{"batch_size":100,"refetch_delay":1},"transaction_isolation":"ISOLATION LEVEL READ COMMITTED","plan_cache_options":{"plan_cache_size":4000},"enable_maintenance_mode":false,"enabled_log_types":["http-log","websocket-log","startup","webhook-log","query-log"],"server_host":"HostAny","enable_allowlist":false,"remote_schema_permissions":false,"log_level":"info","auth_hook_mode":null,"use_prepared_statements":true,"unauth_role":null,"stringify_numeric_types":false,"v1-boolean-null-collapse":false,"graceful_shutdown_timeout":60,"enabled_apis":["metadata"],"enable_telemetry":false,"enable_console":true,"auth_hook":null,"infer_function_permissions":true,"experimental_features":[],"events_fetch_batch_size":100,"jwt_secret":{"audience":null,"claims_format":"json","claims_namespace":"https://hasura.io/jwt/claims","key":"<JWK REDACTED>","header":null,"type":"<TYPE REDACTED>","issuer":null},"cors_config":{"allowed_origins":"*","disabled":false,"ws_read_cookie":null},"websocket_compression_options":"NoCompression","console_assets_dir":null,"admin_secret_set":true,"port":9691,"websocket_keep_alive":"KeepAliveDelay {unKeepAliveDelay = Seconds {seconds = 5s}}"}}}
26 Aug 2021 10:52:56.090337 <190>1 2021-08-25T22:52:55.687608+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:55.686+0000","level":"info","detail":{"kind":"postgres_connection","info":{"retries":1,"database_url":"postgres://********:...@************.compute-1.amazonaws.com:5432/************"}}}
26 Aug 2021 10:52:56.317331 <158>1 2021-08-25T22:52:55.829049+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=************.herokuapp.com request_id=c1a88d49-14d3-4404-a257-79c6d1cf5d45 fwd="122.58.94.185" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:56.441192 <134>1 2021-08-25T22:52:56.018749+00:00 heroku web.1 - - Starting process with command `/bin/sh -c graphql-engine\ serve\ --server-port\ \"\$\{PORT\}\"`
26 Aug 2021 10:52:56.469333 <158>1 2021-08-25T22:52:55.856205+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=************.herokuapp.com request_id=1adb0d68-721e-47c5-8428-dc6b5b465ab7 fwd="101.100.129.151" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:57.882333 <158>1 2021-08-25T22:52:57.302547+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=************.herokuapp.com request_id=2ff7cbe4-7446-43a9-b940-5f4ee827e01b fwd="101.100.129.151" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:58.143331 <158>1 2021-08-25T22:52:57.626327+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=************.herokuapp.com request_id=2dbf2ce4-1eac-446a-9f20-ff9f837220bc fwd="122.58.94.185" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:58.320227 <190>1 2021-08-25T22:52:57.690652+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread asyncActionsProcessor (re)started"}
26 Aug 2021 10:52:58.320221 <190>1 2021-08-25T22:52:57.690655+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread checkForUpdates (re)started"}
26 Aug 2021 10:52:58.320274 <190>1 2021-08-25T22:52:57.690657+00:00 app web.1 - - {"type":"schema-sync-thread","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"thread_type":"processor","info":{"currentVersion":0,"latestResourceVersion":205}}}
26 Aug 2021 10:52:58.320239 <190>1 2021-08-25T22:52:57.690676+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread asyncActionSubscriptionsProcessor (re)started"}
26 Aug 2021 10:52:58.320228 <190>1 2021-08-25T22:52:57.691063+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread runCronEventsGenerator (re)started"}
26 Aug 2021 10:52:58.320274 <190>1 2021-08-25T22:52:57.691088+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"catalog_migrate","info":"Already at the latest catalog version (46); nothing to do."}}
26 Aug 2021 10:52:58.320276 <190>1 2021-08-25T22:52:57.691088+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"schema-sync","info":"Schema sync enabled. Polling at Milliseconds {milliseconds = 1s}"}}
26 Aug 2021 10:52:58.320326 <190>1 2021-08-25T22:52:57.691089+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"schema-sync","info":{"thread_id":"ThreadId 26","instance_id":"8bb8f3d1-cb66-40d5-9163-0f1d3cdea5c2","message":"listener thread started"}}}
26 Aug 2021 10:52:58.320327 <190>1 2021-08-25T22:52:57.691089+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"schema-sync","info":{"thread_id":"ThreadId 28","instance_id":"8bb8f3d1-cb66-40d5-9163-0f1d3cdea5c2","message":"processor thread started"}}}
26 Aug 2021 10:52:58.320231 <190>1 2021-08-25T22:52:57.691089+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"event_triggers","info":"starting workers"}}
26 Aug 2021 10:52:58.320232 <190>1 2021-08-25T22:52:57.69109+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"scheduled_triggers","info":"preparing data"}}
26 Aug 2021 10:52:58.320262 <190>1 2021-08-25T22:52:57.69109+00:00 app web.1 - - {"type":"startup","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":{"kind":"server","info":{"time_taken":1.296780243,"message":"starting API server"}}}
26 Aug 2021 10:52:58.320227 <190>1 2021-08-25T22:52:57.691104+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread SchemeUpdate.listener (re)started"}
26 Aug 2021 10:52:58.320215 <190>1 2021-08-25T22:52:57.691105+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread ourIdleGC (re)started"}
26 Aug 2021 10:52:58.320228 <190>1 2021-08-25T22:52:57.691105+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread SchemeUpdate.processor (re)started"}
26 Aug 2021 10:52:58.320223 <190>1 2021-08-25T22:52:57.691105+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread processEventQueue (re)started"}
26 Aug 2021 10:52:58.320230 <190>1 2021-08-25T22:52:57.691106+00:00 app web.1 - - {"type":"unstructured","timestamp":"2021-08-25T22:52:56.919+0000","level":"info","detail":"Thread processScheduledTriggers (re)started"}
26 Aug 2021 10:52:58.320237 <190>1 2021-08-25T22:52:57.894884+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:57.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"port 9691 is ready"}}
26 Aug 2021 10:52:58.320301 <190>1 2021-08-25T22:52:57.900469+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:57.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"env var HASURA_GRAPHQL_MIGRATIONS_DIR is not set, defaulting to /hasura-migrations"}}
26 Aug 2021 10:52:58.320297 <190>1 2021-08-25T22:52:57.907352+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:57.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"env var HASURA_GRAPHQL_METADATA_DIR is not set, defaulting to /hasura-metadata"}}
26 Aug 2021 10:52:58.320256 <190>1 2021-08-25T22:52:57.912438+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:57.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-apply","info":"applying metadata from /hasura-metadata"}}
26 Aug 2021 10:52:58.794333 <158>1 2021-08-25T22:52:58.296366+00:00 heroku router - - at=error code=H10 desc="App crashed" method=GET path="/v1/graphql" host=************.herokuapp.com request_id=67a64154-06c1-4b91-a377-954eeafdf8ab fwd="158.140.247.176" dyno= connect= service= status=503 bytes= protocol=https
26 Aug 2021 10:52:58.861280 <190>1 2021-08-25T22:52:58.177586+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:58.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"migrations server port env var is not set, defaulting to 9691"}}
26 Aug 2021 10:52:58.861270 <190>1 2021-08-25T22:52:58.178465+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:58.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"server timeout is not set, defaulting to 30 seconds"}}
26 Aug 2021 10:52:58.861267 <190>1 2021-08-25T22:52:58.179496+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:58.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"starting graphql engine temporarily on port 9691"}}
26 Aug 2021 10:52:58.861250 <190>1 2021-08-25T22:52:58.180424+00:00 app web.1 - - {"timestamp":"2021-08-25T22:52:58.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"waiting 30 for 9691 to be ready"}}
26 Aug 2021 10:52:59.002367 <190>1 2021-08-25T22:52:58.601517+00:00 app web.1 - - {"level":"info","msg":"Help us improve Hasura! The cli collects anonymized usage stats which\nallow us to keep improving Hasura at warp speed. To opt-out or read more,\nvisit https://hasura.io/docs/latest/graphql/core/guides/telemetry.html\n","time":"2021-08-25T22:52:58Z"}
26 Aug 2021 10:52:59.002281 <190>1 2021-08-25T22:52:58.692861+00:00 app web.1 - - {"type":"schema-sync-thread","timestamp":"2021-08-25T22:52:58.491+0000","level":"info","detail":{"thread_type":"processor","info":{"message":"Schema Version changed with notifications"}}}
26 Aug 2021 10:52:59.002543 <190>1 2021-08-25T22:52:58.692868+00:00 app web.1 - - {"type":"http-log","timestamp":"2021-08-25T22:52:58.491+0000","level":"error","detail":{"operation":{"error":{"path":"$","error":"resource does not exist","code":"not-found"},"request_id":"659327b7-acc9-4d86-9ec4-cb611f24dc39","response_size":65,"raw_query":""},"request_id":"659327b7-acc9-4d86-9ec4-cb611f24dc39","http_info":{"status":404,"http_version":"HTTP/1.1","url":"/v1alpha1/config","ip":"127.0.0.1","method":"GET","content_encoding":null}}}
26 Aug 2021 10:52:59.002448 <190>1 2021-08-25T22:52:58.692904+00:00 app web.1 - - {"type":"http-log","timestamp":"2021-08-25T22:52:58.491+0000","level":"info","detail":{"operation":{"request_id":"cfb81e03-9dee-47c7-875c-5f9898e86052","response_size":15},"request_id":"cfb81e03-9dee-47c7-875c-5f9898e86052","http_info":{"status":200,"http_version":"HTTP/1.1","url":"/v1/version","ip":"127.0.0.1","method":"GET","content_encoding":null}}}
26 Aug 2021 10:52:59.002602 <190>1 2021-08-25T22:52:58.692919+00:00 app web.1 - - {"type":"http-log","timestamp":"2021-08-25T22:52:58.491+0000","level":"info","detail":{"operation":{"query_execution_time":1.6697823e-2,"user_vars":{"x-hasura-role":"admin"},"request_id":"e85d4b97-ea9a-4c64-9e5e-e13bf1727a47","response_size":178,"query":{"args":{},"type":"get_catalog_state"},"request_read_time":4.94e-6},"request_id":"e85d4b97-ea9a-4c64-9e5e-e13bf1727a47","http_info":{"status":200,"http_version":"HTTP/1.1","url":"/v1/metadata","ip":"127.0.0.1","method":"POST","content_encoding":"gzip"}}}
26 Aug 2021 10:52:59.002598 <190>1 2021-08-25T22:52:58.692932+00:00 app web.1 - - {"type":"http-log","timestamp":"2021-08-25T22:52:58.491+0000","level":"info","detail":{"operation":{"query_execution_time":2.711089e-2,"user_vars":{"x-hasura-role":"admin"},"request_id":"e31fe220-2351-45c1-8b6b-22ac4739de37","response_size":7104,"query":{"args":{},"type":"export_metadata"},"request_read_time":4.41e-6},"request_id":"e31fe220-2351-45c1-8b6b-22ac4739de37","http_info":{"status":200,"http_version":"HTTP/1.1","url":"//v1/query","ip":"127.0.0.1","method":"POST","content_encoding":"gzip"}}}
26 Aug 2021 10:52:59.002169 <190>1 2021-08-25T22:52:58.692959+00:00 app web.1 - - {"level":"info","msg":"Applying metadata...","time":"2021-08-25T22:52:58Z"}
********** TRUNCATED ******************
atb00ker commented 3 years ago

Okay, I have not faced this particular problem with hasura but that's how other services in docker containers behave as well, so I am familiar with the problem.

My usually solution is to make the application more resilient during setup. i.e assume you have only 2 containers, your_app container and graphql-engine container.

  1. Your docker-compose starts graphql-engine.
  2. The graphql image is started but not ready to serve.
  3. docker-compose doesn't know it's not ready to serve so it starts the your_app
  4. your_app's setup scripts checks for dependencies
  5. It sends a basic request to graphql-engine and waits for a response.... it gets 400.
  6. the your_app image waits for 5 seconds and sends the request again... hurray, 200.
  7. You app continues the rest of the setup.

It's a simple logic but in future, say your_app starts using... rabbitmq or some custom dependency, then you'll need just send request to rabbitmq / dependency and check the same thing before you tell you app to really get started taking requests.

A little verbose response but I hope it helps you in the meanwhile. I'll subscribe to this thread as well to see what team hasura thinks about the problem! :smile:

ro-savage commented 3 years ago

@atb00ker - Thanks for your input.

Unfortunately this won't work on Heroku. Heroku automatically decides when an app is live based on binding and re-routes traffic automatically.

This is basically what happens right now, we keep retrying until we get a 200 OK. Which isn't too bad, but its annoying that every deploy our error and alert detection starts going nuts. Plus it causes unnessary downtime.

If it didn't bind until the migration and other booting processes were finished, then it would mean there was 0 downtime.

atb00ker commented 3 years ago

@ro-savage I completely understand you. As far as I know the binding part is done by docker, not sure what Hasura team can do about it, but if you find an alternative solution, please let me know as well; might be useful for future projects. 😄