Planned module: @dataplan/graphql

brandonwestcott commented 6 months ago

Summary

Grafast Question: What is the best way to get or assemble the accessed fields from the finalized plan?

Hi @benjie, awesome work here! I'm excited for the future of grafast; it's a novel solution to the continual pains of graphql resolution.

Example Case Question

Our team is writing a small service-specific graphql api wrapping our larger core graphql api. The schema will largely be a subset of the core schema, but with many fields filtered out and others added for denormalization / business logic. We've explored delegation, federation, and stitching, but all those seem rather bloated for our use case.

I was wondering if you have suggestions on using grafast steps to build up a dynamic query to another graphql server. I suspect this is theoretically very similar to pg planning.

Here is a somewhat contrived example schema for the service-specific api using grafast:

const typeDefs = /* GraphQL */ `
  type Query {
    order(id: Int): Order
  }

  type Order {
    id: Int
    name: String
    customer: Customer
  }

  type Customer {
    id: Int
    name: String
    displayName: String
  }
`

const plans: GrafastPlans = {
  Order: {
    id: ($order: ObjectLikeStep) => {
      return $order.get('id')
    },
    name: ($order: ObjectLikeStep) => {
      return lambda([$order.get('id')], ([id]) => {
        return `Order #${id}`
      })
    },
    customer($order: ObjectLikeStep) {
      return $order.get('customer')
    },
  },
  Customer: {
    id: ($customer: ObjectLikeStep) => {
      return $customer.get('id')
    },
    name: ($customer: ObjectLikeStep) => {
      return $customer.get('name')
    },
    displayName: ($customer: ObjectLikeStep) => {
      const $name = $customer.get('name')
      const $region = $customer.get('region')
      return lambda([$name, $region], ([name, region]) => {
        return `${name} (${region})`
      })
    },
  },
  Query: {
    order(_, fieldArgs, _builder) {
      const $id = fieldArgs.get('id')
      return loadOne($id, async (specs, options) => {
        const someSelectFromAccessedFieldsInDag = {
          // How to fetch full select here?
        }
        const query = jsonToGraphQLQuery({
          query: {
            order: {
              __args: {
                id: $id,
              },
              ...someSelectFromAccessedFieldsInDag,
            },
          },
        })
        return client.execute(query)
      })
    },
  },
}

And given the following example query to the grafast api:

query order($id: Int) {
  order(id: $id) {
    name
    customer {
      name
      displayName
    }
  }
}

That query assembles a plan with $order.id, $order.customer, $customer.name and $customer.region.

Given that, the desired query to the core graphql api would be:

query order($id: Int) {
  order(id: $id) {
    id
    customer {
      name
      region
    }
  }
}

What would be the best way to assemble the full set of fields accessed from the Query.order step? Can that be done by walking the plan itself, or do we need to use a custom step class to track and assemble the core api query?

Thanks!

benjie commented 6 months ago

Hi @brandonwestcott; creating a @dataplan/graphql for this exact purpose is on my long term plan! Would be great for you to start this early.

You should use step classes to track which attributes are accessed. You'll end up with a lot of steps during planning; this is expected, because optimization is one of the final stages of planning and many of those steps can be optimized away at that point. You might want to look at the broad idea (not the specifics, they're way way way too complicated for this use case!) of how the Postgres integration works; broadly:

PgSelectStep represents a select ... from ... operation
PgSelectSingleStep represents a single row from the result
PgClassExpressionStep represents an expression on a single row, for example access to a particular column, or a Postgres function call or similar

Critically all these exist before optimization, but when optimization happens many of them are inlined into their parents/etc and it all collapses down to a much simpler plan.

You might want to look at how the loadOne() step works too, since that tracks access to .get(field) so it knows which fields to request. .get(key) is becoming a convention, and will probably be more explicitly a protected pattern in a future version (e.g. "everything that has a "get" method must obey X, Y and Z").
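As a rough illustration of the access-tracking idea (this is not grafast's API; `SelectionTree`, `RecordRef` and `printSelection` are hypothetical names invented for this sketch), each `.get(key)` call can record the key on its parent's selection, so the full nested selection is known by the time the request is built:

```typescript
// Hypothetical sketch: none of these names come from grafast.
type SelectionTree = Map<string, SelectionTree>;

class RecordRef {
  constructor(private readonly tree: SelectionTree) {}

  // Record that `key` was accessed and return a handle for its children.
  get(key: string): RecordRef {
    let child = this.tree.get(key);
    if (child === undefined) {
      child = new Map<string, SelectionTree>();
      this.tree.set(key, child);
    }
    return new RecordRef(child);
  }
}

// Print a tracked tree as a GraphQL selection set, preserving nesting.
function printSelection(tree: SelectionTree): string {
  const parts: string[] = [];
  tree.forEach((child, key) => {
    parts.push(child.size > 0 ? `${key} ${printSelection(child)}` : key);
  });
  return `{ ${parts.join(" ")} }`;
}

// Replay the accesses made by the plan resolvers for the example query.
const orderTree: SelectionTree = new Map<string, SelectionTree>();
const order = new RecordRef(orderTree);
order.get("id"); // Order.name is derived from id
const customer = order.get("customer");
customer.get("name"); // Customer.name
customer.get("name"); // Customer.displayName (already tracked, so deduplicated)
customer.get("region"); // Customer.displayName

const selection = printSelection(orderTree);
console.log(selection);
```

Repeated accesses to the same key collapse into one entry, and nested accesses stay under their parent, which matches the shape of the desired query to the core API.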

There are some subtleties to GraphQL in this regard; for example in Postgres every PgSelectStep can be an SQL statement on its own, or can be inlined into a parent SQL statement, but that's not true in GraphQL - only root level fields can be queried on the root, all others maintain their hierarchy.

Further, when polymorphism comes into it you may need to query additional attributes that won't actually be returned in the result. This probably isn't an issue, but does mean that you might need to do careful management of aliases in outgoing queries to prevent users from maliciously shaping incoming queries in ways that cause you issues (e.g. fragment Foo on User { __typename: username } might trick your system if you're not careful!).
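One possible way to handle this, sketched under the assumption that the outgoing query only ever uses aliases generated by the proxy and never the client's own response keys (`AliasAllocator` and the `a0`, `a1` alias scheme are hypothetical, not part of grafast):

```typescript
// Hypothetical sketch: outgoing aliases are always generated, never client-supplied.
interface OutgoingField {
  alias: string; // generated by us
  fieldName: string; // field on the remote schema
}

class AliasAllocator {
  private readonly fields: OutgoingField[] = [];
  private readonly byFieldName = new Map<string, string>();

  // Return a stable generated alias for a remote field.
  request(fieldName: string): string {
    const existing = this.byFieldName.get(fieldName);
    if (existing !== undefined) return existing;
    const alias = `a${this.fields.length}`;
    this.fields.push({ alias, fieldName });
    this.byFieldName.set(fieldName, alias);
    return alias;
  }

  print(): string {
    const body = this.fields.map((f) => `${f.alias}: ${f.fieldName}`).join(" ");
    return `{ ${body} }`;
  }
}

const allocator = new AliasAllocator();

// The client wrote `__typename: username`; we only remember which generated
// alias backs their response key.
const clientKeys = new Map<string, string>();
clientKeys.set("__typename", allocator.request("username"));

// Internal need: the real type name, used to resolve polymorphism.
const typeNameAlias = allocator.request("__typename");

const outgoing = allocator.print();

// Simulated remote response, keyed by our generated aliases.
const remote: Record<string, unknown> = { a0: "alice", a1: "User" };
const internalType = remote[typeNameAlias];
const clientValue = remote[clientKeys.get("__typename") as string];
console.log(outgoing, internalType, clientValue);
```

Because the internal `__typename` lookup reads from its own generated alias, the client's `__typename: username` alias cannot shadow it; the client's key is only applied when shaping the final response.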

I think you would want:

A step class representing a GraphQL operation
A step class representing a GraphQL field
A step class representing a GraphQL inline fragment
A step class representing a GraphQL named fragment
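To make the four pieces concrete, here is a minimal data-structure sketch of what they might collapse into once planning is done. The type names and the printer are assumptions for illustration only (they are not grafast step classes), and arguments and variables are left out to keep it short:

```typescript
// Hypothetical sketch of the output data model; not grafast step classes.
type Selection =
  | { kind: "field"; name: string; selections: Selection[] }
  | { kind: "inlineFragment"; typeCondition: string; selections: Selection[] }
  | { kind: "fragmentSpread"; name: string };

interface NamedFragment {
  name: string;
  typeCondition: string;
  selections: Selection[];
}

interface Operation {
  operationType: "query" | "mutation";
  selections: Selection[];
  fragments: NamedFragment[];
}

function printSelections(selections: Selection[]): string {
  return `{ ${selections.map((s) => printSelectionNode(s)).join(" ")} }`;
}

function printSelectionNode(sel: Selection): string {
  switch (sel.kind) {
    case "field":
      return sel.selections.length > 0
        ? `${sel.name} ${printSelections(sel.selections)}`
        : sel.name;
    case "inlineFragment":
      return `... on ${sel.typeCondition} ${printSelections(sel.selections)}`;
    case "fragmentSpread":
      return `...${sel.name}`;
  }
}

function printOperation(op: Operation): string {
  const fragments = op.fragments.map(
    (f) => `fragment ${f.name} on ${f.typeCondition} ${printSelections(f.selections)}`,
  );
  return [`${op.operationType} ${printSelections(op.selections)}`]
    .concat(fragments)
    .join(" ");
}

const op: Operation = {
  operationType: "query",
  selections: [
    {
      kind: "field",
      name: "order",
      selections: [
        { kind: "field", name: "id", selections: [] },
        {
          kind: "field",
          name: "customer",
          selections: [{ kind: "fragmentSpread", name: "CustomerBits" }],
        },
      ],
    },
  ],
  fragments: [
    {
      name: "CustomerBits",
      typeCondition: "Customer",
      selections: [
        { kind: "field", name: "name", selections: [] },
        { kind: "field", name: "region", selections: [] },
      ],
    },
  ],
};

const printed = printOperation(op);
console.log(printed);
```

Each node only knows about its own selection on its parent, which keeps the pieces small enough to reason about in isolation.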

Deduplication of each of these should effectively follow the field collection rules in the GraphQL spec, with one difference: if a field is specified twice with the same response key but two different sets of arguments, the GraphQL spec's validation would forbid that, but what we'd do is just assign two different aliases to it when we request it from the remote server.
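A minimal sketch of that deduplication, assuming scalar arguments only and a generated `f0`, `f1` alias scheme (`FieldSet` and `FieldRequest` are hypothetical names, not grafast API):

```typescript
// Hypothetical sketch: dedupe by field name + arguments, alias everything else apart.
interface FieldRequest {
  fieldName: string;
  // Assumption: scalar argument values only (numbers, strings, booleans).
  args: Record<string, unknown>;
}

class FieldSet {
  // Signature (field name + canonically ordered args) -> outgoing alias.
  private readonly aliases = new Map<string, string>();
  private readonly printedFields: string[] = [];

  // Returns the outgoing alias under which this field's result will arrive.
  add(req: FieldRequest): string {
    const argKeys = Object.keys(req.args).sort();
    const argText = argKeys
      .map((k) => `${k}: ${JSON.stringify(req.args[k])}`)
      .join(", ");
    const call = argKeys.length > 0 ? `${req.fieldName}(${argText})` : req.fieldName;

    const existing = this.aliases.get(call);
    if (existing !== undefined) return existing; // same field and args: reuse

    const alias = `f${this.aliases.size}`;
    this.aliases.set(call, alias);
    this.printedFields.push(`${alias}: ${call}`);
    return alias;
  }

  print(): string {
    return `{ ${this.printedFields.join(" ")} }`;
  }
}

const fieldSet = new FieldSet();
const first = fieldSet.add({ fieldName: "orders", args: { first: 5 } });
const second = fieldSet.add({ fieldName: "orders", args: { first: 10 } });
const repeat = fieldSet.add({ fieldName: "orders", args: { first: 5 } });
const dedupedQuery = fieldSet.print();
console.log(first, second, repeat, dedupedQuery);
```

The caller keeps its own mapping from incoming response keys to the returned aliases, so two incoming fields with identical arguments share one outgoing field while conflicting arguments simply get separate aliases.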

Try and think very locally in steps, e.g. each layer is just representing a single selection on the parent selection set, or similar. Then your big complex system can be composed of all these small isolated features that are easy (ish) to reason about.

I hope this helps! I'd love to track your progress on this.

benjie commented 6 months ago

Ooo, I thought this rang a bell! Turns out that I started work on this back in September last year, on the plane over to GraphQLConf! You can actually see my timezone change in the commit timestamps :D

The code in this branch is super out of date, a lot of the patterns have changed now, but perhaps it'll give some inspiration? Also looks like I stopped halfway through some refactoring or something, so I've added that just now... no idea if it's coherent.

https://github.com/graphile/crystal/tree/dataplan-graphql

brandonwestcott commented 6 months ago

Awesome @benjie, thanks for the detailed response! Excited to dive in and explore this more.

PS funny enough, my exploration of grafast was actually on a plane back from a conference

benjie commented 6 months ago

Keeping this open to track @dataplan/graphql. Let me know if you need any input on it (you may get faster responses on Discord; make sure you give yourself the Dev role).

graphile / crystal

Planned module: @dataplan/graphql #2037
