hasura / ndc-graphql

NDC for GraphQL
Apache License 2.0
5 stars 1 forks source link

Solve N+1 issue for remote relationships to connector #7

Open sordina opened 1 month ago

sordina commented 1 month ago

When a remote relationship to this connector the classic N+1 issue arrises where N queries need to be performed for N records in a relationship.

For example if you distributed the chinook dataset across two postgres instances with one fronted by the PG connector and one fronted by a HasuraV2 instance and the GraphQL connector:

Then querying across the relationship like so, if there were 100 albums:

query {
    sg1_albums {
        title
        artist {
            name
        }
    }
}

would query the graphQL connector 100 times.

The solution to this issue for other connectors has been to implement the ForEach capability which then allows a query against the full set of records in the relationship at once, deferring the choice of what to do to the connector.

The current implementation of the GraphQL Connector does not support the ForEach capability and associated variables. However we can't simply add this and implement an optimised query because unlike SQL databases, upstream GraphQL schemas don't share singular and plural records behind the same interface (tables), and don't indicate relationships involved between fields.

In this example, imagine that the NDC query was sent to the connector:

{
    "query": {
         ... "table": "artist",
         ... "fields": ["name"]
    },
    "variables": [{
        "artist_id": 1,
        ...
    }]
}

We would hope that the upstream query would be something like:

query {
    artists(where: {artist_id: {in: [1,...]}}) {
        artist_id,
        name
    }
}

but how does the connector know that artists corresponds to artist and that the artist_id variable set should be put into the where: {artist_id: {in: filter.

In addition to this, once the query is performed it has to be decomposed into a resultset rather than just forward the results of the naive query.

So,

There are many options for this, however my preference is:

See Also

BenoitRanque commented 1 month ago

Compromise proposal:

We implement the foreach capability for queries.

We generate queries that look like this:

query (n1_where: table_bool_expr, n2_where: table_bool_expr) {
  n1: table(where: $n1_where) { foo bar }
  n2: table(where: $n2_where) { foo bar }
}

Where each variable set gets a root field. This reduces network calls from engine -> connector and connector -> underlying GraphQL server from N to 1. The underlying GraphQL server is free to optimize from there, although most (including HGE v2) probably won't besides parallelizing execution.

I believe this is how v2 works for remote schemas.


The above proposal is admittedly a compromise, but GraphQL is poorly suited to solve this issue. V2 hasura does expose an api with collections where you could build giant where clauses and then split up the results, but that means we would have to

This doesn't get into the specifics of relationships and other issues.

To me this feels like a lot of added complexity for minimal benefit, plus the added cost of making the entire thing more brittle. IMHO we should favor a robust compromise solution. Not perfect but good enough.

kenstott commented 1 month ago

@sordina I'm not sure I understand the proposal. But I'll dive in anyway.

If, by custom directive, you mean to require the remote GraphQL schema to provide details on the remote relationship capability, then I think that's not a strong idea. Adding constraints or requirements on the remote GraphQL endpoint would reduce adoption. (Ignore if I misunderstood the proposal.) In addition, It might be better to make it a more general solution that could work across REST/OpenAPI, GraphQL, gRPC, etc.

Adding the details in the connector config (which defines the pattern for the in: style predicate or an equivalent API endpoint) seems like a better answer, which you could then adapt to any API style.

Meaning...

but how does the connector know that artists corresponds to artist and that the artist_id variable set should be put into the where: {artist_id: {in: filter.}}

The pattern required to do that has to be defined in the config, and it would be in some new, expanded properties in the existing remote relationship definition.