Engine stuck on error when remote schema not available at boot

pylebecq commented 3 years ago

Hello,

I'm currently working on migrating a project to hasura 2.0, and I noticed some differences between 2.0 in "backward compatible mode" and 2.0 fully migrated to support the multiple-database setup. I could not find any documentation about this difference.

So basically, we have a project with a remote schema (which I will call API), which is a Node.js GraphQL API (written in TypeScript). In development, we have the following setup:

- A docker-compose.yml file with two services: a Postgres database and the hasura graphql engine.
- A Procfile with two entries: docker and api.

We are using a tool to run the Procfile and make sure everything we need is running. Basically, the Procfile will run two things at the same time:

- docker-compose up, to start the postgres and hasura containers
- yarn start:dev, to build and run the API

The thing is, the API needs Postgres to run, and hasura needs both Postgres and the API to run, because the API is used as a remote schema. But the API takes some time to build and start.

When running the hasura container with tag v2.0.3.cli-migrations-v2 (using the v1.3.3 directory structure), the following error happens in hasura when everything starts:

Logs when remote schema is not available (v2.0.3.cli-migrations-v2 and running with old directory structure)

```
docker | riot-hasura | time="2021-08-02T16:28:31Z" level=fatal msg="error applying metadata \n{\n \"internal\": [\n {\n \"definition\": {\n \"definition\": {\n \"timeout_seconds\": 60,\n \"url\": \"{{BASE_API_URL}}/graphql\",\n \"forward_client_headers\": true\n },\n \"name\": \"API\",\n \"permissions\": [],\n \"comment\": null\n },\n \"reason\": \"Inconsistent object: HTTP exception occurred while sending the request to http://host.docker.internal:3000/graphql\",\n \"name\": \"remote_schema API\",\n \"type\": \"remote_schema\",\n \"message\": {\n \"message\": \"ConnectionFailure Network.Socket.connect: : does not exist (Connection refused)\",\n \"request\": {\n \"proxy\": null,\n \"secure\": false,\n \"path\": \"/graphql\",\n \"responseTimeout\": \"ResponseTimeoutMicro 60000000\",\n \"method\": \"POST\",\n \"host\": \"host.docker.internal\",\n \"requestVersion\": \"HTTP/1.1\",\n \"redirectCount\": \"10\",\n \"port\": \"3000\"\n }\n }\n },\n {\n \"definition\": {\n \"remote_field\": {\n \"campaignTargetsUUID\": {\n \"arguments\": {\n \"campaignId\": \"$id\"\n }\n }\n },\n \"name\": \"targets\",\n \"hasura_fields\": [\n \"id\"\n ],\n \"remote_schema\": \"API\",\n \"source\": \"default\",\n \"table\": {\n \"schema\": \"public\",\n \"name\": \"campaigns\"\n }\n },\n \"reason\": \"Inconsistent object: in table \\\"campaigns\\\": in remote relationship\\\"targets\\\": remote schema with name \\\"API\\\" not found\",\n \"name\": \"remote_relationship targets in table campaigns in source default\",\n \"type\": \"remote_relationship\"\n },\n {\n \"definition\": {\n \"remote_field\": {\n \"getService\": {\n \"arguments\": {\n \"serviceId\": \"$service_id\"\n }\n }\n },\n \"name\": \"service\",\n \"hasura_fields\": [\n \"service_id\"\n ],\n \"remote_schema\": \"API\",\n \"source\": \"default\",\n \"table\": {\n \"schema\": \"public\",\n \"name\": \"campaign_templates\"\n }\n },\n \"reason\": \"Inconsistent object: in table \\\"campaign_templates\\\": in remote relationship\\\"service\\\": remote schema with name \\\"API\\\" not found\",\n \"name\": \"remote_relationship service in table campaign_templates in source default\",\n \"type\": \"remote_relationship\"\n }\n ],\n \"path\": \"$.args\",\n \"error\": \"cannot continue due to inconsistent metadata\",\n \"code\": \"unexpected\"\n}"
docker | riot-hasura exited with code 1
```

That's okay, because the container is restarted again and again, and at some point the API is up and running and the hasura container starts fine.

After upgrading to the new directory structure using hasura scripts update-project-v3 and starting hasura again with tag v2.0.3.cli-migrations-v3, the following happens instead:

Logs when remote schema is not available (v2.0.3.cli-migrations-v3 and running with new directory structure)

```
docker | riot-hasura | {"type":"metadata","timestamp":"2021-08-02T16:52:27.095+0000","level":"warn","detail":{"message":"Inconsistent Metadata!","info":{"objects":[{"definition":{"definition":{"timeout_seconds":60,"url":"{{BASE_API_URL}}/graphql","forward_client_headers":true},"name":"API","permissions":[],"comment":null},"reason":"Inconsistent object: HTTP exception occurred while sending the request to http://host.docker.internal:3000/graphql","name":"remote_schema API","type":"remote_schema","message":{"message":"ConnectionFailure Network.Socket.connect: : does not exist (Connection refused)","request":{"proxy":null,"secure":false,"path":"/graphql","responseTimeout":"ResponseTimeoutMicro 60000000","method":"POST","host":"host.docker.internal","requestVersion":"HTTP/1.1","redirectCount":"10","port":"3000"}}},{"definition":{"remote_field":{"campaignTargetsUUID":{"arguments":{"campaignId":"$id"}}},"name":"targets","hasura_fields":["id"],"remote_schema":"API","source":"default","table":{"schema":"public","name":"campaigns"}},"reason":"Inconsistent object: in table \"campaigns\": in remote relationship\"targets\": remote schema with name \"API\" not found","name":"remote_relationship targets in table campaigns in source default","type":"remote_relationship"},{"definition":{"remote_field":{"getService":{"arguments":{"serviceId":"$service_id"}}},"name":"service","hasura_fields":["service_id"],"remote_schema":"API","source":"default","table":{"schema":"public","name":"campaign_templates"}},"reason":"Inconsistent object: in table \"campaign_templates\": in remote relationship\"service\": remote schema with name \"API\" not found","name":"remote_relationship service in table campaign_templates in source default","type":"remote_relationship"}]}}}
docker | riot-hasura | {"type":"metadata","timestamp":"2021-08-02T16:52:28.097+0000","level":"warn","detail":{"message":"Inconsistent Metadata!","info":{"objects":[{"definition":{"definition":{"timeout_seconds":60,"url":"{{BASE_API_URL}}/graphql","forward_client_headers":true},"name":"API","permissions":[],"comment":null},"reason":"Inconsistent object: HTTP exception occurred while sending the request to http://host.docker.internal:3000/graphql","name":"remote_schema API","type":"remote_schema","message":{"message":"ConnectionFailure Network.Socket.connect: : does not exist (Connection refused)","request":{"proxy":null,"secure":false,"path":"/graphql","responseTimeout":"ResponseTimeoutMicro 60000000","method":"POST","host":"host.docker.internal","requestVersion":"HTTP/1.1","redirectCount":"10","port":"3000"}}},{"definition":{"remote_field":{"campaignTargetsUUID":{"arguments":{"campaignId":"$id"}}},"name":"targets","hasura_fields":["id"],"remote_schema":"API","source":"default","table":{"schema":"public","name":"campaigns"}},"reason":"Inconsistent object: in table \"campaigns\": in remote relationship\"targets\": remote schema with name \"API\" not found","name":"remote_relationship targets in table campaigns in source default","type":"remote_relationship"},{"definition":{"remote_field":{"getService":{"arguments":{"serviceId":"$service_id"}}},"name":"service","hasura_fields":["service_id"],"remote_schema":"API","source":"default","table":{"schema":"public","name":"campaign_templates"}},"reason":"Inconsistent object: in table \"campaign_templates\": in remote relationship\"service\": remote schema with name \"API\" not found","name":"remote_relationship service in table campaign_templates in source default","type":"remote_relationship"}]}}}
docker | riot-hasura | {"type":"metadata","timestamp":"2021-08-02T16:52:29.100+0000","level":"warn","detail":{"message":"Inconsistent Metadata!","info":{"objects":[{"definition":{"definition":{"timeout_seconds":60,"url":"{{BASE_API_URL}}/graphql","forward_client_headers":true},"name":"API","permissions":[],"comment":null},"reason":"Inconsistent object: HTTP exception occurred while sending the request to http://host.docker.internal:3000/graphql","name":"remote_schema API","type":"remote_schema","message":{"message":"ConnectionFailure Network.Socket.connect: : does not exist (Connection refused)","request":{"proxy":null,"secure":false,"path":"/graphql","responseTimeout":"ResponseTimeoutMicro 60000000","method":"POST","host":"host.docker.internal","requestVersion":"HTTP/1.1","redirectCount":"10","port":"3000"}}},{"definition":{"remote_field":{"campaignTargetsUUID":{"arguments":{"campaignId":"$id"}}},"name":"targets","hasura_fields":["id"],"remote_schema":"API","source":"default","table":{"schema":"public","name":"campaigns"}},"reason":"Inconsistent object: in table \"campaigns\": in remote relationship\"targets\": remote schema with name \"API\" not found","name":"remote_relationship targets in table campaigns in source default","type":"remote_relationship"},{"definition":{"remote_field":{"getService":{"arguments":{"serviceId":"$service_id"}}},"name":"service","hasura_fields":["service_id"],"remote_schema":"API","source":"default","table":{"schema":"public","name":"campaign_templates"}},"reason":"Inconsistent object: in table \"campaign_templates\": in remote relationship\"service\": remote schema with name \"API\" not found","name":"remote_relationship service in table campaign_templates in source default","type":"remote_relationship"}]}}}
```

We can see that the container no longer exits when it encounters this error. And the worst part is that even when the API is finally up and running, hasura stays stuck with this error forever. I have to manually restart the hasura container after the API is available, and then it runs fine.

jp-ryuji commented 3 years ago

We encountered the same (or a similar) issue with a remote schema when we tried to upgrade hasura graphql engine (HGE) from v1.3.3 to v2.0.6.

Expected (how hasura/graphql-engine v1.3.3 works)

HGE connects to the remote schema even when the remote schema starts listening after HGE has started.

Actual (how hasura/graphql-engine v2.0.6 works)

HGE does NOT connect to the remote schema when the remote schema starts listening after HGE has started. It reports Inconsistent Metadata!. The following is the full error log.

{"type":"metadata","timestamp":"2021-08-13T13:48:46.153+0000","level":"warn","detail":{"message":"Inconsistent Metadata!","info":{"objects":[{"definition":{"definition":{"timeout_seconds":60,"url_from_env":"HASURA_GRAPHQL_REMOTE_SCHEMA_TO_API","forward_client_headers":true},"name":"api","permissions":[],"comment":""},"reason":"Inconsistent object: HTTP exception occurred while sending the request to http://host.docker.internal:3000/graphql","name":"remote_schema api","type":"remote_schema","message":{"message":"ConnectionFailure Network.Socket.connect: <socket: 24>: does not exist (Connection refused)","request":{"proxy":null,"secure":false,"path":"/graphql","responseTimeout":"ResponseTimeoutMicro 60000000","method":"POST","host":"host.docker.internal","requestVersion":"HTTP/1.1","redirectCount":"10","port":"3000"}}}]}}}

Environments

hasura/graphql-engine: v2.0.6
hasura-cli: 2.0.5 (not directly relevant, though)
The behavior doesn't change with or without the config structure change by hasura scripts update-project-v3.

Note

hasura console is not available.
Running hasura metadata reload, or making an equivalent API call, resolves the issue.
- This can be a temporary workaround, but we hope it is fixed in the near future.
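
For reference, the API-call variant of that workaround could look roughly like the sketch below. It assumes the v2 metadata endpoint at /v1/metadata and the reload_metadata request type with reload_remote_schemas set; the helper names, the base URL, and the admin secret are placeholders rather than anything taken from this setup.

```python
import json
import urllib.request


def build_reload_request(base_url: str, admin_secret: str) -> urllib.request.Request:
    """Build a POST to the metadata endpoint asking for a metadata reload."""
    payload = {
        "type": "reload_metadata",
        # Ask the engine to re-fetch remote schemas as part of the reload.
        "args": {"reload_remote_schemas": True},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/metadata",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Hasura-Admin-Secret": admin_secret,
        },
        method="POST",
    )


def reload_metadata(base_url: str, admin_secret: str, timeout: float = 10.0) -> dict:
    """Send the reload request and return the decoded JSON response."""
    request = build_reload_request(base_url, admin_secret)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling reload_metadata("http://localhost:8080", "<admin secret>") once the remote schema is listening should have the same effect as running hasura metadata reload; both arguments here are assumed values.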

AesSedai commented 3 years ago

The way I've handled this is via a service health check in docker compose. This way, the db service and server service must be healthy before Hasura can start. Example compose file:

services:
    db:
        image: postgres:13.2-alpine
        environment:
            POSTGRES_USER: ${POSTGRES_USER}
            POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
            POSTGRES_DB: ${POSTGRES_DB}
        ports:
            - ${POSTGRES_PORT_HOST}:${POSTGRES_PORT_CONTAINER}
        volumes:
            - db:/var/lib/postgresql/data
        healthcheck:
            test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
            interval: 5s
            timeout: 5s
            retries: 5
        networks:
            - app

    server:
        build: ./server
        environment:
            PORT: 4000
            HASURA_GRAPHQL_URL: ${HASURA_GRAPHQL_URL}
            HASURA_GRAPHQL_ADMIN_SECRET: ${HASURA_GRAPHQL_ADMIN_SECRET}
        volumes:
            - ./server:/usr/src/app
        ports:
            - ${SERVER_PORT_HOST}:${SERVER_PORT_CONTAINER}
        healthcheck:
            test: ["CMD-SHELL", "netstat -tulnp | grep 4000"]
            interval: 10s
            timeout: 5s
            retries: 5
        networks:
            - app

    graphql-engine:
        image: hasura/graphql-engine:v2.0.8.cli-migrations-v3
        ports:
            - ${HASURA_PORT_HOST}:${HASURA_PORT_CONTAINER}
        depends_on:
            db:
                condition: service_healthy
            server:
                condition: service_healthy
        restart: always
        environment:
            HASURA_GRAPHQL_LOG_LEVEL: warn
            HASURA_GRAPHQL_DATABASE_URL: ${HASURA_GRAPHQL_DATABASE_URL}
            HASURA_GRAPHQL_UNAUTHORIZED_ROLE: ${HASURA_GRAPHQL_UNAUTHORIZED_ROLE}
            HASURA_GRAPHQL_ENABLE_REMOTE_SCHEMA_PERMISSIONS: "true"
            HASURA_GRAPHQL_ADMIN_SECRET: ${HASURA_GRAPHQL_ADMIN_SECRET}
            HASURA_GRAPHQL_JWT_SECRET: ${HASURA_GRAPHQL_JWT_SECRET}
        networks:
            - app

networks:
    app:
        driver: bridge

volumes:
    db:
        external: true

pylebecq commented 3 years ago

@AesSedai Thank you for sharing your solution. I used https://github.com/roerohan/wait-for-it to achieve a similar result, but I had to change the entrypoints. I will probably try the health checks instead; I find them cleaner.
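
For anyone who cannot use compose health checks, the general idea behind this kind of wait script is to poll the API's TCP port until it accepts a connection and only then start (or reload) hasura. A minimal sketch of that idea follows; it is not the linked tool's implementation, and the function name and default values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the setup from the logs above, this would be something like wait_for_port("host.docker.internal", 3000). Note that an open port only shows the process is listening, not that the GraphQL endpoint already answers queries.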

pylebecq commented 3 months ago

Closing this, as the health check solution proposed by @AesSedai works for my use-case. Thanks!
