Custom scalars as extensions

jasonkuhrt commented 2 weeks ago

Perceived Problem

Currently, custom scalars are supported but complicated by an interface that spans gentime and runtime.
- Recall that the user must export custom scalar codecs from a TypeScript module, which the generator finds when it runs. The generator then imports those codecs, derives their types, and, based on name-matching rules, associates them with custom scalars in the schema (that is, the user must export codecs named exactly as they are in the GraphQL schema).

Ideas / Proposed Solution(s)

Can we simplify this to:

import { Date } from '@graffle/extension-scalars'

Pokemon
  .create()
  .use(Date())

We need to consider both runtime and buildtime (types).

Runtime

What do we need to do?

Every argument in the GraphQL request to be sent that is a custom scalar needs to be encoded.
Every data field in the GraphQL result that is a custom scalar needs to be decoded.

How do we encode arguments?

Extensions could be allowed to add scalar codecs. Codecs are already a Graffle data type and we could basically use them here as they are currently defined. They are a pair of functions, encode and decode.

When an extension adds a codec, the codec's encode function would be run on every argument of that type. ~To discover which arguments are of that type, the encode step would recursively iterate over the selection set searching for where arguments are used. As soon as a usesite for an argument is found, the search for that argument can end because we can assume that~ To discover which arguments are of that type, if we are dealing with a GraphQL document object, we need only look at the defined arguments of the document, which will all already be typed. We literally just string-compare the type name of the argument in the document with the type name of the custom scalar. If no arguments use the custom scalar, then no action is needed. If there is an argument that is a custom scalar (defined as having a type that does not match any of the standard scalar names) but no codec is available for it, we can either throw an error or pass the value through.

The simplicity of the above is predicated on there being a process by which we transform interface-typed inputs into GraphQL document objects. This relates to a realization in the opening message of https://github.com/jasonkuhrt/graffle/issues/1149.

If we do not do that work, encoding becomes harder: we need to deeply traverse the selection set until we discover where each argument was used in it, then refer to the schema index for the type at that point.
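As a rough illustration of the string-comparison approach to encoding, here is a minimal sketch. The `Codec`, `VariableDefinition`, and `encodeVariables` names are hypothetical and not part of Graffle; the sketch also assumes that argument types have already been reduced to a plain type name (lists, non-null wrappers, input objects, and enums are ignored), which is why a missing codec defaults to passing the value through.

```typescript
// Hypothetical codec shape: a named pair of encode and decode functions.
type Codec<Decoded = any, Encoded = any> = {
  name: string
  encode: (value: Decoded) => Encoded
  decode: (value: Encoded) => Decoded
}

// Hypothetical, simplified view of a typed argument defined on the document.
type VariableDefinition = { name: string; typeName: string }

const standardScalarNames = new Set(['String', 'Int', 'Float', 'Boolean', 'ID'])

const encodeVariables = (
  definitions: VariableDefinition[],
  variables: Record<string, unknown>,
  codecs: Codec[],
  onMissingCodec: 'throw' | 'passthrough' = 'passthrough',
): Record<string, unknown> => {
  const codecsByName = new Map(codecs.map((codec) => [codec.name, codec]))
  const encoded: Record<string, unknown> = { ...variables }
  for (const definition of definitions) {
    // Standard scalars never need a codec.
    if (standardScalarNames.has(definition.typeName)) continue
    if (!(definition.name in variables)) continue
    // Plain string comparison between the argument type name and the codec name.
    const codec = codecsByName.get(definition.typeName)
    if (!codec) {
      if (onMissingCodec === 'throw') {
        throw new Error(`No codec registered for type "${definition.typeName}".`)
      }
      continue // pass the value through unchanged
    }
    encoded[definition.name] = codec.encode(variables[definition.name])
  }
  return encoded
}

// Usage: a Date codec that encodes to epoch milliseconds.
const dateCodec: Codec<Date, number> = {
  name: 'Date',
  encode: (value) => value.getTime(),
  decode: (value) => new Date(value),
}

const encodedVariables = encodeVariables(
  [{ name: 'since', typeName: 'Date' }, { name: 'limit', typeName: 'Int' }],
  { since: new Date(1000), limit: 5 },
  [dateCodec],
)
```

Only `since` is touched here; `limit` is skipped because `Int` is a standard scalar.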

How do we decode data?

For decoding, we need to recursively traverse the result set, applying the codec's decode function to every value whose type matches that codec. Discovering a value's type in the result set requires a schema map that is co-traversed to make type checks. This gets more complicated with aliases, which allow the result set to diverge from the schema, so a second co-traversal must be added: the selection set.

There are likely many ways to speed this up. Here are some ideas:
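To make the co-traversal concrete, here is a minimal sketch. The `SchemaMap` and `SelectionSet` shapes and the `decodeResult` function are assumptions made for illustration, not Graffle's actual data structures. The selection set is keyed by response key so that an alias can be mapped back to the real field name before the schema map is consulted.

```typescript
type Decoder = (value: unknown) => unknown

// Hypothetical schema map: object type name -> field name -> named type.
type SchemaMap = Record<string, Record<string, string>>

// Hypothetical selection set: response key (alias or field name) -> selection.
type SelectionSet = {
  [responseKey: string]: { field: string; selectionSet?: SelectionSet }
}

const decodeResult = (
  data: Record<string, unknown>,
  typeName: string,
  selectionSet: SelectionSet,
  schemaMap: SchemaMap,
  decoders: Record<string, Decoder>,
): Record<string, unknown> => {
  const decoded: Record<string, unknown> = {}
  for (const [responseKey, value] of Object.entries(data)) {
    // Resolve the alias through the selection set, then the type through the schema map.
    const selection = selectionSet[responseKey]
    const fieldType = selection ? schemaMap[typeName]?.[selection.field] : undefined
    if (!selection || !fieldType) {
      decoded[responseKey] = value
      continue
    }
    const decoder = decoders[fieldType]
    const nested = selection.selectionSet
    const decodeOne = (item: unknown): unknown => {
      if (item === null || item === undefined) return item
      if (decoder) return decoder(item)
      if (nested && typeof item === 'object') {
        return decodeResult(item as Record<string, unknown>, fieldType, nested, schemaMap, decoders)
      }
      return item
    }
    // Only one level of list nesting is handled in this sketch.
    decoded[responseKey] = Array.isArray(value) ? value.map(decodeOne) : decodeOne(value)
  }
  return decoded
}

// Usage: `begin` is an alias for the `startsAt` field, whose type is the custom scalar `Date`.
const schemaMap: SchemaMap = {
  Query: { event: 'Event' },
  Event: { startsAt: 'Date', title: 'String' },
}
const selectionSet: SelectionSet = {
  event: {
    field: 'event',
    selectionSet: { begin: { field: 'startsAt' }, title: { field: 'title' } },
  },
}
const decodedResult = decodeResult(
  { event: { begin: 1000, title: 'Launch' } },
  'Query',
  selectionSet,
  schemaMap,
  { Date: (value) => new Date(value as number) },
)
```

Every field of the result is visited here, which is the cost the ideas in this thread aim to reduce.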

~During encoding, note if there were any custom scalar arguments used. If none, we don't need to decode at all.~ This is wrong: lack of argument custom scalars does not mean lack of output custom scalars.
During encoding, note if there were any aliases used. If none, we don't need to co-traverse the selection set during decoding. However, this requires the encoding phase to now recursively analyze the document bringing its own cost.
During encoding, take note of all the paths to custom scalars. Like above, this requires a recursive analysis of the document. With this lookup, we can easily decode by iterating only over the known locations of custom scalars. Aliases would inherently be handled by this too.
During schema map generation, output a data structure that contains an index of custom scalar output and input locations. Then, when applying codecs, drive their application by this index, traversing only the minimum of result sets. This idea seems promising for time efficiency but would require more space (memory). The more a custom scalar is used in a schema, the higher the memory cost of this index.
- Note that, on its own, this doesn't free us from the problem of aliases.

Looking at the above, I think this is the optimal approach for us:

1. Generate a custom scalar schema index. For each custom scalar, track all input and output paths to it. In addition, starting from the root types, track the paths to all output fields that use a custom scalar.
2. During encoding, encode the arguments of course (this is trivial), but also build a decode outputs map that accounts for any aliases used. Optimize the traversal using the generated index described in step (1). This means that as we traverse the selection set, we short-circuit paths as soon as we see they can no longer contain any custom scalars. Take note of the paths we traverse, and when we hit an alias, take note of the alias name used. At the end, any paths that were not short-circuited are our custom scalar decode index. Note that the result set is expected to be a subset of this, since things like the @include directive could mean that what appears in the selection set never appears in the result set.
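The two steps above can be sketched as follows. All names here (`ScalarIndex`, `buildDecodePaths`, `applyDecodePaths`) and the shape of the generated index are hypothetical. The sketch assumes the index lists, per object type, only the fields that are a custom scalar or can lead to one, so any field missing from it can be short-circuited.

```typescript
// Hypothetical generated index: fields absent from it cannot lead to a custom scalar.
type ScalarIndex = {
  [typeName: string]: { [fieldName: string]: { scalar: string } | { type: string } }
}
type Selection = { field: string; selectionSet?: SelectionSet }
type SelectionSet = { [responseKey: string]: Selection }
type DecodePath = { path: string[]; scalar: string }

// Built during encoding: walk the selection set, recording response keys (aliases included).
const buildDecodePaths = (
  index: ScalarIndex,
  typeName: string,
  selectionSet: SelectionSet,
  prefix: string[] = [],
): DecodePath[] => {
  const paths: DecodePath[] = []
  for (const [responseKey, selection] of Object.entries(selectionSet)) {
    const entry = index[typeName]?.[selection.field]
    if (!entry) continue // short-circuit: no custom scalar reachable from here
    const path = [...prefix, responseKey]
    if ('scalar' in entry) {
      paths.push({ path, scalar: entry.scalar })
    } else if (selection.selectionSet) {
      paths.push(...buildDecodePaths(index, entry.type, selection.selectionSet, path))
    }
  }
  return paths
}

const decodeAt = (value: unknown, path: string[], decode: (v: unknown) => unknown): unknown => {
  if (value === null || value === undefined) return value
  if (Array.isArray(value)) return value.map((item) => decodeAt(item, path, decode))
  if (path.length === 0) return decode(value)
  const key = path[0] as string
  const record = value as Record<string, unknown>
  // The result may be a subset of the selection set (e.g. because of @include).
  if (!(key in record)) return value
  return { ...record, [key]: decodeAt(record[key], path.slice(1), decode) }
}

// Applied during decoding: visit only the known custom scalar locations.
const applyDecodePaths = (
  data: unknown,
  paths: DecodePath[],
  decoders: Record<string, (v: unknown) => unknown>,
): unknown =>
  paths.reduce((current, { path, scalar }) => {
    const decode = decoders[scalar]
    return decode ? decodeAt(current, path, decode) : current
  }, data)

// Usage: `begin` aliases `startsAt`; `title` and `version` are short-circuited.
const index: ScalarIndex = {
  Query: { event: { type: 'Event' } },
  Event: { startsAt: { scalar: 'Date' } },
}
const selectionSet: SelectionSet = {
  event: { field: 'event', selectionSet: { begin: { field: 'startsAt' }, title: { field: 'title' } } },
  version: { field: 'version' },
}
const decodePaths = buildDecodePaths(index, 'Query', selectionSet)
const decodedData = applyDecodePaths(
  { event: { begin: 1000, title: 'Launch' }, version: '1' },
  decodePaths,
  { Date: (value) => new Date(value as number) },
) as { event: { begin: Date; title: string }; version: string }
```

Because the recorded paths use response keys rather than field names, aliases need no extra handling at decode time.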

Extension Interface

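const extension = createExtension('ScalarDate', {
  scalars: [{
    name: 'Date',
    // Encode a Date as epoch milliseconds for the request.
    encode: (value: Date) => value.getTime(),
    // Decode epoch milliseconds from the result back into a Date.
    decode: (value: number) => new Date(value),
  }],
})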

Buildtime

~The scope of buildtime diverges with runtime. While we could have the extension return static types of any provided scalar codecs there would be no way for the generated typed to be augmented by that.~ Some generated types have type parameters that are filled by the instance using the HKT-ish trick. We could expand use of this trick such that the generated select types also take on a type parameter that is passed a config type whose type in turn is subject to the extensions the client has used.

jasonkuhrt commented 2 weeks ago

I did some exploration to get a sense of the AST utilities and types from the graphql package. It is straightforward. There are TypeScript AST interface types but no functional utilities. I would encode the typed-interface input to these objects, then use the GraphQL native utilities to turn this into GraphQL syntax.

/* eslint-disable */

import {
  type DocumentNode,
  type FieldNode,
  GraphQLInt,
  Kind,
  type NameNode,
  type OperationDefinitionNode,
  OperationTypeNode,
  parse,
  print,
  type SelectionSetNode,
} from 'graphql'

const d = console.log
// d(GraphQLInt)

// Exploring traversing the document AST.

const ss1 = parse(`query { foo(int:5) }`)
const d1 = ss1.definitions[0]!

// if (d1.kind === Kind.OPERATION_DEFINITION) {
//   const ss1 = d1.selectionSet.selections[0]!
//   if (ss1.kind === Kind.FIELD) {
//     d(ss1.name.value)
//     d(ss1.arguments)
//   }
// }

// // Exploring building up a document AST.

// // Create a name node for the field
// const fieldNameNode: NameNode = {
//   kind: Kind.NAME,
//   value: 'bar',
// }

// // Create a name node for the alias
// const aliasNameNode: NameNode = {
//   kind: Kind.NAME,
//   value: 'foo',
// }

// // Create a field node
// const fieldNode: FieldNode = {
//   kind: Kind.FIELD,
//   name: fieldNameNode,
//   alias: aliasNameNode,
// }

// // Create a selection set node
// const selectionSetNode: SelectionSetNode = {
//   kind: Kind.SELECTION_SET,
//   selections: [fieldNode],
// }

// // Create an operation definition node
// const operationDefinitionNode: OperationDefinitionNode = {
//   kind: Kind.OPERATION_DEFINITION,
//   operation: OperationTypeNode.QUERY,
//   selectionSet: selectionSetNode,
// }

// // Create the document node
// const documentNode: DocumentNode = {
//   kind: Kind.DOCUMENT,
//   definitions: [operationDefinitionNode],
// }

// // Log the constructed document
// d(documentNode)

// // Print the query string
// d(print(documentNode))

Our encoder would:

1. Create document AST nodes
2. Take note of custom scalars found in the selection set (as described in opening message)
3. Encode arguments

There is an alternative path for step 3, which we have not yet done. It is as follows.

Instead of encoding and then inlining arguments, raise them up to be variables in the operation, and have the field arguments reference those operation variables.

This can be a future feature. Deferring it does not appear to lead to any wasted effort now.
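For reference, the variable-hoisting alternative could look something like the sketch below. Everything in it is an assumption for illustration: the `FieldCall` shape, the `hoistArguments` function, and the `field_argument` variable naming scheme are all hypothetical, and the sketch builds a string for brevity where a real implementation would more likely build AST nodes. It handles a single root field with at least one argument.

```typescript
// Hypothetical, simplified view of a root field call with typed inline arguments.
type InlineArgument = { name: string; typeName: string; value: unknown }
type FieldCall = { field: string; args: InlineArgument[] }

const hoistArguments = (call: FieldCall): { document: string; variables: Record<string, unknown> } => {
  const variables: Record<string, unknown> = {}
  const definitions: string[] = []
  const argumentsText: string[] = []
  for (const arg of call.args) {
    // Hypothetical naming scheme to keep variable names unique per field.
    const variableName = `${call.field}_${arg.name}`
    variables[variableName] = arg.value
    definitions.push(`$${variableName}: ${arg.typeName}`)
    argumentsText.push(`${arg.name}: $${variableName}`)
  }
  const document = `query (${definitions.join(', ')}) { ${call.field}(${argumentsText.join(', ')}) }`
  return { document, variables }
}

// Usage: the inline form `query { foo(int:5) }` becomes a document plus a variables map.
const hoisted = hoistArguments({
  field: 'foo',
  args: [{ name: 'int', typeName: 'Int', value: 5 }],
})
```

With this shape, encoding custom scalars reduces to encoding entries of the variables map, since every variable carries its type name in the operation definition.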

jasonkuhrt commented 2 weeks ago

I sketched out the various flows that would be present and some of the high-level costs and optimizations.

[Image: CleanShot 2024-09-29 at 17 16 28@2x]

jasonkuhrt commented 4 hours ago

GraphQL custom scalars are a core concept.

We can do a few things in core:

In the fluent state we can track custom scalar registrations (like we do for extensions and configuration settings): graffle.scalar(Date).
That state is tracked statically and at runtime to provide the expected static typing on results but also the expected encoding and decoding at runtime.
Given this core functionality it then is entirely possible for an extension to wrap it up to support something like graffle.use(CustomScalars.Date()) though I suspect there would be little need for that.
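A minimal sketch of how registrations might be tracked both statically and at runtime is shown below. The `Client`, `createClient`, and `Codec` names are hypothetical and much simpler than Graffle's actual fluent state; the point is only that each `.scalar(...)` call can return a client whose type parameter has grown by one entry.

```typescript
// Hypothetical codec shape that carries its scalar name as a literal type.
type Codec<Name extends string, Decoded, Encoded> = {
  name: Name
  encode: (value: Decoded) => Encoded
  decode: (value: Encoded) => Decoded
}
type AnyCodec = Codec<string, any, any>

// Hypothetical client: `Scalars` is the statically tracked registry.
type Client<Scalars extends object = {}> = {
  scalars: Scalars
  scalar: <C extends AnyCodec>(codec: C) => Client<Scalars & { [K in C['name']]: C }>
}

const createClient = <Scalars extends object = {}>(scalars: Scalars): Client<Scalars> => ({
  scalars,
  // Runtime registry and static registry are extended together.
  scalar: <C extends AnyCodec>(codec: C) =>
    createClient<Scalars & { [K in C['name']]: C }>({ ...scalars, [codec.name]: codec } as any),
})

// Usage: after registration, the decoded type of `Date` is known statically.
const DateCodec: Codec<'Date', Date, number> = {
  name: 'Date',
  encode: (value) => value.getTime(),
  decode: (value) => new Date(value),
}

const client = createClient({}).scalar(DateCodec)
const decodedDate: Date = client.scalars.Date.decode(1000)
const encodedDate: number = client.scalars.Date.encode(new Date(5))
```

Result types could then be computed from `Scalars`, falling back to a default type for any custom scalar that has no registration.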

jasonkuhrt commented 2 hours ago

I began to implement this and had some thoughts.

Currently we have static APIs for static selection sets and runtime selection sets. Static meaning they can be imported and used directly. This even works with custom scalars because their runtime (and inferred types) are available statically.
A downside of the approach is that the user must define a scalars.ts module that exports those custom scalars. This is out of band from e.g. graffle.use(...).
A general upside is that having access to the custom scalars at gentime permits APIs like static selection set helpers (runtime/gentime). The question here is: what about ideas yet to come that would further leverage this? Also note how gentime allows namespace type APIs that are not possible from inference. E.g. Query.Foo.bar<...> is a parameterized type, which is not something that can be inferred.
Thinking about what is simpler overall: if the user has to use gentime config for anything other than custom scalars anyway, then the cost of needing out-of-band config is somewhat reduced.
It would be great if we could figure out how to integrate the custom scalar configuration into the gentime configuration. However, I don't see how this would be advisable, because those custom scalars end up in the bundle and we don't want the gentime module code to be in that bundle.
Another approach could be to have custom scalars defined in two places: via runtime extensions and via gentime extensions. Runtime would cover runtime behavior and inferred inputs/outputs, and gentime would cover statically typed APIs of selection sets. This lack of DRYness, however, seems very confusing.
What if there were a way to support graffle.use() but also a way for those custom scalar types to be picked up for use in static APIs? The user could create a scalars.ts module and import from there into graffle.scalar(Date).scalar(Double).scalar(...) while telling the Graffle generator about it if they wish, which gives them the benefit of the static import APIs. ... Further, the generator could, in the pre-filled client, already have applied what are basically .scalar(...) calls. Further, with regard to imports, the generator could detect dependencies like graffle/scalar-x and automatically use them. The user could opt out of that in the generator config. The user could also instruct other auto-uses in the generator config.

The point here is about exposing a real API for custom scalars that the generator then builds on top of. For the generator to benefit statically, it would need to be given a reference to a scalars module.

If a user does not inform the generator of those custom scalar types statically, then the user would get string typing for custom scalars in those static APIs.
