mercurius-js / mercurius-gateway

Mercurius federation support plugin
MIT License

Load balancer must check if the server in the provided pool is online before sending a request #64

Open SiNONiMiTY opened 1 year ago

SiNONiMiTY commented 1 year ago

The title says it all.

I am encountering this in a scenario where I provide two URLs, as an array, for a single subgraph:

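const Fastify = require('fastify')
const mercuriusGateway = require('@mercuriusjs/gateway')

const gateway = Fastify()
gateway.register(mercuriusGateway, {
    gateway: {
        services: [
            {
                "name": "user",
                // Two upstream URLs for the same subgraph
                "url": [
                    "http://endpoint1:4001/graphql",
                    "http://endpoint2:4001/graphql"
                ],
                "schema": "type Query { id: ID }"
            }
        ]
    }
})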

endpoint2 is intentionally taken down and only endpoint1 is running. However, when sending queries to the gateway, I occasionally receive ECONNREFUSED errors for endpoint2.

The load-balancing mechanism should first check that the host is reachable, for example with a test ping, before sending a request.
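
To make the request concrete, here is a minimal sketch of a round-robin selector that skips upstreams known to be offline. It is not how mercurius-gateway or undici select upstreams; the class `HealthAwareRoundRobin` and its methods are hypothetical names, and how an upstream gets marked down (a probe, a failed connection) is left to the caller.

```javascript
// Hypothetical helper, not part of mercurius-gateway or undici.
// It rotates through the URLs and skips any that were marked as down.
class HealthAwareRoundRobin {
  constructor (urls) {
    this.urls = [...urls]
    this.down = new Set()
    this.cursor = 0
  }

  // Record that an upstream failed a health check or refused a connection.
  markDown (url) {
    this.down.add(url)
  }

  // Record that an upstream is reachable again.
  markUp (url) {
    this.down.delete(url)
  }

  // Return the next URL that is not marked down, or null if none is left.
  next () {
    for (let i = 0; i < this.urls.length; i++) {
      const url = this.urls[this.cursor]
      this.cursor = (this.cursor + 1) % this.urls.length
      if (!this.down.has(url)) return url
    }
    return null
  }
}

const picker = new HealthAwareRoundRobin([
  'http://endpoint1:4001/graphql',
  'http://endpoint2:4001/graphql'
])
picker.markDown('http://endpoint2:4001/graphql')
// From here on, picker.next() only returns the endpoint1 URL.
```

With something like this, a request would never be routed to endpoint2 while it is marked down, at the cost of needing some mechanism to decide when to mark it up again.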

mcollina commented 1 year ago

Unfortunately it's a bit more complex than sending a "ping", as those errors come from existing sockets that are truncated.

How are you shutting down your upstream servers? Are they closing gracefully or are they crashing?

SiNONiMiTY commented 1 year ago

I am starting the gateway with only one of the two provided subgraph endpoints online.

mcollina commented 1 year ago

Thanks, that helps!

I think there is a bug in undici's BalancedPool: it routes requests to an upstream even if it could not connect there, and it does not retry or send the request elsewhere if the connection fails. Things stabilize over time because of the BalancedPool algorithm, so only a small number of requests would fail.

The bad news is that I don't have time right now to fix it there.
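
Until that is addressed upstream, the missing "send it elsewhere" behavior can be approximated in application code. The following is only a sketch under stated assumptions, not undici's or mercurius-gateway's implementation: `sendWithFailover` is a hypothetical helper, and `send(url)` stands for any function that performs the request and rejects with an error whose `code` is `'ECONNREFUSED'` when the connection is refused.

```javascript
// Hypothetical helper, not part of undici or mercurius-gateway.
// Tries each upstream URL in order and moves on only when the
// connection was refused, i.e. the request never reached a server.
async function sendWithFailover (urls, send) {
  let lastError
  for (const url of urls) {
    try {
      return await send(url)
    } catch (err) {
      // Any other failure (timeout, HTTP error, ...) is surfaced as-is.
      if (err.code !== 'ECONNREFUSED') throw err
      lastError = err
    }
  }
  // Every upstream refused the connection, or the list was empty.
  throw lastError ?? new Error('no upstream URLs provided')
}
```

Failing over only on ECONNREFUSED is a deliberate choice in this sketch: a refused connection means the upstream never processed the request, so repeating it against another upstream should not execute a mutation twice. Errors on already established sockets are a different case and are not handled here.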

SiNONiMiTY commented 1 year ago

Yes! I noticed that the balancing algorithm eventually only selects the online server after sending some requests.