[RFC] Implementation of June 2023 incremental delivery format with `@defer`

benjie commented 9 months ago

This RFC introduces an alternative solution to incremental delivery, implementing the June 2023 response format.

This solution aims to minimize changes to the existing execution algorithm. When reviewing, compare against benjie/incremental-common (https://github.com/graphql/graphql-spec/pull/1039) to make the diff easier to understand; I've raised this PR against that branch for the same reason.

The RFC aims to avoid mutations and side effects across algorithms, so as to fit with the existing patterns in the GraphQL spec. It also aims to leverage the features we already have in the spec to minimize the introduction of new concepts.

WORK IN PROGRESS: there are likely mistakes all over this currently, and a lot will need to be done to maintain consistency of the prose and algorithms.

This RFC works by adjusting the execution algorithms in a few small ways:

It introduces the concept of "delivery groups".
Previously, GraphQL could be thought of as having just a single delivery group (called the "root delivery group" in this RFC): everything was delivered at once. With "incremental delivery", we're delivering the data in multiple phases, or groups. A "delivery group" keeps track of which fields belong to which @defer, such that we can complete one delivery group before moving on to its children.
CollectFields() now returns a map of "field digests" rather than just fields.
CollectFields() used to generate a map between response key and field selection (Record<string, FieldNode>), but now it creates a map between response key and a "field digest", an object which contains both the field selection and the delivery group to which it belongs (Record<string, { field: FieldNode, deliveryGroup: DeliveryGroup }>). As such, CollectFields() is now passed the current path and delivery group as arguments.
ExecuteRootSelectionSet() may return an "incremental event stream".
If there's no @defer then ExecuteRootSelectionSet() will return data/errors as before. However, if there are active @defers then it will instead return an event stream which will consist of multiple incremental delivery payloads.
ExecuteGroupedFieldSet() runs against a set of "current delivery groups".
If multiple sibling delivery groups overlap, the algorithm will first run the fields common to all the overlapping delivery groups, and only when these are complete will it execute the remaining fields in each delivery group (in parallel). This might happen over multiple layers. This is tracked via a set of "current delivery groups", and only fields which exist in all of these current delivery groups will be executed by ExecuteGroupedFieldSet().
ExecuteGroupedFieldSet() returns the currently executed data, as before, plus details of incremental fields yet to be delivered.
When there exist fields not executed in ExecuteGroupedFieldSet() (because they aren't in every one of the "current delivery groups"), we store "incremental details" of the current grouped field set (by its path), for later execution. The incremental details consist of:
- objectType - the type of the concrete object the field exists on (i.e. the object type passed to ExecuteGroupedFieldSet())
- objectValue - the value of this object (as would be passed as the first argument to the resolver for the field)
- groupedFieldSet - similar to the result of CollectFields(), but only containing the response keys that have not yet been executed
CompleteValue() continues execution in the "current delivery groups".
We must pass the path and current delivery groups so that we can execute the current delivery groups recursively.
CompleteValue() returns the field data, as before, plus details of incremental subfields yet to be delivered.
As with ExecuteGroupedFieldSet(), CompleteValue() must pass down details of any incremental subfields that need to be executed later.
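
As a rough, non-normative sketch of the "current delivery groups" rule described above: a response key runs now only if it has a field digest in every current delivery group, and everything else is kept back as incremental details. The helper name splitByCurrentDeliveryGroups and the trimmed-down types are made up for illustration and are not part of the RFC.

```typescript
// Hypothetical, simplified shapes; the RFC's own types carry more information.
interface DeliveryGroup {
  id: number;
  parent: DeliveryGroup | null;
}

interface FieldDigest {
  fieldName: string; // stand-in for the FieldNode selection
  deliveryGroup: DeliveryGroup;
}

type GroupedFieldSet = Record<string, FieldDigest[]>;

// Split a grouped field set into the response keys that can run now
// (present in every current delivery group) and those left for later.
function splitByCurrentDeliveryGroups(
  groupedFieldSet: GroupedFieldSet,
  currentDeliveryGroups: DeliveryGroup[]
): { runNow: GroupedFieldSet; later: GroupedFieldSet } {
  const runNow: GroupedFieldSet = {};
  const later: GroupedFieldSet = {};
  for (const [responseKey, digests] of Object.entries(groupedFieldSet)) {
    const inEveryGroup = currentDeliveryGroups.every((group) =>
      digests.some((digest) => digest.deliveryGroup === group)
    );
    if (inEveryGroup) {
      runNow[responseKey] = digests;
    } else {
      later[responseKey] = digests;
    }
  }
  return { runNow, later };
}

const root: DeliveryGroup = { id: 0, parent: null };
const child: DeliveryGroup = { id: 1, parent: root };

const grouped: GroupedFieldSet = {
  currentUser: [{ fieldName: "currentUser", deliveryGroup: root }],
  expensiveField: [{ fieldName: "expensiveField", deliveryGroup: child }],
};

const split = splitByCurrentDeliveryGroups(grouped, [root]);
console.log(Object.keys(split.runNow)); // keys executed in the root delivery group
console.log(Object.keys(split.later)); // keys stored as incremental details
```

With two overlapping sibling delivery groups A and B passed as the current delivery groups, only keys that have a digest in both A and B would land in runNow under this sketch; the rest would wait for the per-group pass.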

At a @defer boundary, a new DeliveryGroup is created, and field collection then happens within this new delivery group. This can happen multiple times in the same level, for example:

{
  # root delivery group
  currentUser {
    name
  }
  ... @defer {
    # Child delivery group
    expensiveField {
      id
    }
    ... @defer {
      # Grandchild delivery group
      veryExpensiveField {
        title
      }
    }
  }
}

If no @defer exists then no new delivery groups are created, and thus the request executes as it would have done previously. However, if there is at least one active @defer then the client will be sent the initial response along with a list of pending delivery groups. We will then commence executing the delivery groups, delivering them as they are ready.
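
To make the "initial response plus pending delivery groups" flow concrete, here is a hedged sketch of what a client might receive for the example query above, and how it could fold the payloads together. The ids, field values, and the choice to announce the grandchild group only once the child has been delivered are all assumptions for illustration; mergeIncrementalResults is a hypothetical client-side helper, not something this RFC defines.

```typescript
type Path = Array<string | number>;

interface Pending {
  id: number;
  path: Path;
  label?: string;
}

interface DeferredPayload {
  id: number;
  data: Record<string, unknown>;
}

interface InitialIncrementalResult {
  data: Record<string, unknown>;
  hasNext: true;
  pending: Pending[];
}

interface SubsequentIncrementalResult {
  hasNext: boolean;
  pending?: Pending[];
  completed?: Array<{ id: number }>;
  incremental?: DeferredPayload[];
}

// Illustrative values only: ids, data and payload ordering are assumptions.
const initialResult: InitialIncrementalResult = {
  data: { currentUser: { name: "Alice" } },
  hasNext: true,
  pending: [{ id: 1, path: [] }],
};

const subsequentResults: SubsequentIncrementalResult[] = [
  {
    hasNext: true,
    pending: [{ id: 2, path: [] }],
    incremental: [{ id: 1, data: { expensiveField: { id: "42" } } }],
    completed: [{ id: 1 }],
  },
  {
    hasNext: false,
    incremental: [{ id: 2, data: { veryExpensiveField: { title: "Report" } } }],
    completed: [{ id: 2 }],
  },
];

// Hypothetical client-side helper: merge each deferred payload into the
// object found at the path announced by the matching pending entry.
function mergeIncrementalResults(
  initial: InitialIncrementalResult,
  subsequent: SubsequentIncrementalResult[]
): Record<string, unknown> {
  const data: Record<string, any> = JSON.parse(JSON.stringify(initial.data));
  const pathById = new Map<number, Path>();
  for (const pending of initial.pending) pathById.set(pending.id, pending.path);
  for (const result of subsequent) {
    for (const pending of result.pending ?? []) pathById.set(pending.id, pending.path);
    for (const payload of result.incremental ?? []) {
      const path = pathById.get(payload.id);
      if (path === undefined) throw new Error(`Unknown pending id ${payload.id}`);
      let target: any = data;
      for (const segment of path) target = target[segment];
      Object.assign(target, payload.data);
    }
  }
  return data;
}

const merged = mergeIncrementalResults(initialResult, subsequentResults);
console.log(JSON.stringify(merged, null, 2));
```

Both deferred fragments sit at the root of the operation, so both pending entries use an empty path in this sketch.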

Note: when an error occurs in a non-null field, the incremental details gathered in that selection set will be blown up alongside the sibling fields - we use the existing error handling mechanisms for this.
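
One way to picture this, as a sketch only: if the selection set returns its data and its incremental details together, then an error thrown by a non-null field discards both at once, and the nearest nullable parent simply reports null with nothing left to deliver. The function names and the flattened IncrementalDetails shape below are hypothetical stand-ins, not the RFC's algorithms.

```typescript
type Path = Array<string | number>;

interface IncrementalDetails {
  path: Path;
  pendingResponseKeys: string[];
}

type SelectionResult = [
  data: Record<string, unknown> | null,
  incrementalDetails: IncrementalDetails[]
];

// Hypothetical stand-in for ExecuteGroupedFieldSet(): runs the non-deferred
// resolvers and reports the deferred response keys for later execution.
function executeSelectionSet(
  path: Path,
  resolvers: Record<string, () => unknown>,
  nonNullKeys: string[],
  deferredKeys: string[]
): SelectionResult {
  const data: Record<string, unknown> = {};
  for (const [key, resolve] of Object.entries(resolvers)) {
    const value = resolve();
    if (value == null && nonNullKeys.indexOf(key) !== -1) {
      // The throw abandons `data` and any incremental details for this set.
      throw new Error(`Cannot return null for non-null field "${key}"`);
    }
    data[key] = value;
  }
  const details: IncrementalDetails[] =
    deferredKeys.length > 0 ? [{ path, pendingResponseKeys: deferredKeys }] : [];
  return [data, details];
}

// Hypothetical stand-in for completing a nullable object field: the error is
// recorded, the field becomes null, and no incremental details survive.
function completeNullableObject(
  path: Path,
  resolvers: Record<string, () => unknown>,
  nonNullKeys: string[],
  deferredKeys: string[],
  errors: Error[]
): SelectionResult {
  try {
    return executeSelectionSet(path, resolvers, nonNullKeys, deferredKeys);
  } catch (error) {
    errors.push(error instanceof Error ? error : new Error(String(error)));
    return [null, []];
  }
}

const errors: Error[] = [];
const ok = completeNullableObject(
  ["currentUser"], { name: () => "Alice" }, ["name"], ["avatar"], errors
);
const failed = completeNullableObject(
  ["currentUser"], { name: () => null }, ["name"], ["avatar"], errors
);
console.log(ok, failed, errors.length);
```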

This PR is nowhere near complete. I've spent 2 days on this latest iteration (coming up with the new stream and partition approach as the major breakthrough) but I've had to stop and I'm not sure if I've left gaps. Further, I need to integrate Rob's hard work in #742 into it.

To make life a bit easier on myself, I've written some TypeScript-style declarations of the various algorithms used in execute, according to this RFC. This may not be correct and is definitely non-normative, but might be useful to ease understanding.

type RawVariables = { [variableName: string]: any };
type CoercedVariables = { [variableName: string]: any };

function ExecuteRequest(
  schema: GraphQLSchema,
  document: Document,
  operationName: string | null,
  variableValues: RawVariables,
  initialValue: any
):
  | ReturnType<typeof ExecuteQuery>
  | ReturnType<typeof ExecuteMutation>
  | ReturnType<typeof Subscribe>;

function GetOperation(
  document: Document,
  operationName: string | null
): Operation;

function CoerceVariableValues(
  schema: GraphQLSchema,
  operation: Operation,
  variableValues: RawVariables
): CoercedVariables;

function ExecuteQuery(
  query: Document,
  schema: GraphQLSchema,
  variableValues: CoercedVariables,
  initialValue: any
): ReturnType<typeof ExecuteRootSelectionSet>;

function ExecuteMutation(
  mutation: Document,
  schema: GraphQLSchema,
  variableValues: CoercedVariables,
  initialValue: any
): ReturnType<typeof ExecuteRootSelectionSet>;

function Subscribe(
  subscription: Document,
  schema: GraphQLSchema,
  variableValues: CoercedVariables,
  initialValue: any
): ReturnType<typeof MapSourceToResponseEvent>;

function CreateSourceEventStream(
  subscription: Document,
  schema: GraphQLSchema,
  variableValues: CoercedVariables,
  initialValue: any
): ReturnType<typeof ResolveFieldEventStream>;

function ResolveFieldEventStream(
  subscriptionType: GraphQLObjectType,
  rootValue: any,
  fieldName: string,
  argumentValues: { [argumentName: string]: any }
): EventStream<any>;

interface ExecutionResult {
  data?: any;
  errors?: GraphQLError[];
}

type Path = Array<string | number>;

interface Pending {
  id: number;
  path: Path;
  label?: string;
}

interface InitialIncrementalResult {
  data: Record<string, any>;
  errors?: GraphQLError[];
  hasNext: true;
  pending: Pending[];
}

interface Completed {
  id: number;
  // Errors that bubbled to the root of the defer/stream
  errors?: GraphQLError[];
}

type IncrementalPayload = DeferredPayload | StreamedPayload;

interface DeferredPayload {
  id: number;
  subpath?: Path;
  data: Record<string, any>;
  // Errors that happened _successfully_ (i.e. did not bubble up to the @defer)
  errors?: GraphQLError[];
}

interface StreamedPayload {
  id: number;
  items: Array<any | null>;
  // Errors that happened _successfully_ (i.e. did not invalidate the stream)
  errors?: GraphQLError[];
  data: any;
}

interface SubsequentIncrementalResult {
  hasNext: boolean;
  pending?: Pending[];
  completed?: Completed[];
  incremental?: IncrementalPayload[];
}

function MapSourceToResponseEvent(
  sourceStream: EventStream<any>,
  subscription: Document,
  schema: GraphQLSchema,
  variableValues: CoercedVariables
): EventStream<
  ExecutionResult | InitialIncrementalResult | SubsequentIncrementalResult
>;

function ExecuteSubscriptionEvent(
  subscription: Document,
  schema: GraphQLSchema,
  variableValues: CoercedVariables,
  initialValue: any
): ExecutionResult | ReturnType<typeof IncrementalEventStream>;

function Unsubscribe(responseStream: EventStream<any>): void;

function ExecuteRootSelectionSet(
  variableValues: CoercedVariables,
  initialValue: any,
  objectType: GraphQLObjectType,
  selectionSet: SelectionSet,
  serial: boolean = false
): ExecutionResult | ReturnType<typeof IncrementalEventStream>;

interface DeliveryGroup {
  id: number;
  path: Path;
  parent: DeliveryGroup | null;
  label?: string;
}

interface IncrementalDetails {
  groupedFieldSet: GroupedFieldSet;
  objectType: GraphQLObjectType;
  objectValue: any;
}

type IncrementalDetailsByPath = Map<Path, IncrementalDetails>;

function ExecuteGroupedFieldSet(
  groupedFieldSet: GroupedFieldSet,
  objectType: GraphQLObjectType,
  objectValue: any,
  variableValues: CoercedVariables,
  path: Path,
  currentDeliveryGroups: DeliveryGroup[]
): [
  resultMap: Record<string, any>,
  incrementalDetailsByPath: IncrementalDetailsByPath
];

interface FieldDigest {
  selection: FieldNode;
  deliveryGroup: DeliveryGroup;
}

function CollectFields(
  objectType: GraphQLObjectType,
  selectionSet: SelectionSet,
  variableValues: CoercedVariables,
  path: Path,
  deliveryGroup: DeliveryGroup,
  visitedFragments: Set<string> = new Set()
): { [responseKey: string]: FieldDigest[] };

function DoesFragmentTypeApply(
  objectType: GraphQLObjectType,
  fragmentType: GraphQLObjectType | GraphQLInterfaceType | GraphQLUnionType
): boolean;

function ExecuteField(
  objectType: GraphQLObjectType,
  objectValue: any,
  fieldType: GraphQLType,
  fieldDigests: FieldDigest[], // All for this field
  variableValues: CoercedVariables,
  path: Path,
  currentDeliveryGroups: DeliveryGroup[]
): ReturnType<typeof CompleteValue>;

function CompleteValue(
  fieldType: GraphQLType,
  fieldDigests: FieldDigest[], // All for this field
  result: any,
  variableValues: CoercedVariables,
  path: Path,
  currentDeliveryGroups: DeliveryGroup[]
): [
  result:
    | null
    | ScalarValue
    | EnumValue
    | Record<string, any>
    | Array<null | ScalarValue | EnumValue | Record<string, any>>,
  incrementalDetailsByPath: IncrementalDetailsByPath
];

function CoerceResult(
  leafType: GraphQLScalarType | GraphQLEnumType,
  value: any
): any;

function ResolveAbstractType(
  abstractType: GraphQLInterfaceType | GraphQLUnionType,
  objectValue: any
): GraphQLObjectType;

function CollectSubfields(
  objectType: GraphQLObjectType,
  fieldDigests: FieldDigest[],
  variableValues: CoercedVariables,
  path: Path
): { [responseKey: string]: FieldDigest[] };

function IncrementalEventStream(
  data: Record<string, any>,
  errors: GraphQLError[] | undefined,
  initialIncrementalDetailsByPath: IncrementalDetailsByPath,
  variableValues: CoercedVariables
): EventStream<InitialIncrementalResult | SubsequentIncrementalResult>;

function CollectDeliveryGroups(
  incrementalDetailsByPath: IncrementalDetailsByPath,
  excludingDeliveryGroups: Set<DeliveryGroup> = new Set()
): DeliveryGroup[];

function MakePending(deliveryGroups: DeliveryGroup[]): Pending[];

function IncrementalStreams(
  incrementalDetailsByPath: IncrementalDetailsByPath
): EventStream<SubsequentIncrementalResult>;

function PartitionDeliveryGroupsSets(
  incrementalDetailsByPath: IncrementalDetailsByPath
): Array<Set<DeliveryGroup>>;

function IncrementalStream(
  incrementalDetailsByPath: IncrementalDetailsByPath,
  deliveryGroupsSet: Set<DeliveryGroup>
): EventStream<SubsequentIncrementalResult>;

function SplitRunnable(
  incrementalDetailsByPath: IncrementalDetailsByPath,
  runnableDeliveryGroupsSet: Set<DeliveryGroup>
): [
  remainingIncrementalDetailsByPath: IncrementalDetailsByPath,
  runnable: IncrementalDetailsByPath
];

function MergeIncrementalDetailsByPath(
  incrementalDetailsByPath1: IncrementalDetailsByPath,
  incrementalDetailsByPath2: IncrementalDetailsByPath
): IncrementalDetailsByPath;
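
For two of the smaller helpers, a guess at possible bodies may help; this is inferred purely from the names and signatures above, so treat it as a sketch under assumptions rather than the intended algorithm. In particular, keying the details by a JSON-serialised path, the fieldName stand-in for FieldNode, and the reduced IncrementalDetails shape are all assumptions made to keep the example self-contained.

```typescript
type Path = Array<string | number>;

interface DeliveryGroup {
  id: number;
  path: Path;
  parent: DeliveryGroup | null;
  label?: string;
}

interface FieldDigest {
  fieldName: string; // stand-in for the FieldNode selection
  deliveryGroup: DeliveryGroup;
}

interface IncrementalDetails {
  groupedFieldSet: Record<string, FieldDigest[]>;
}

interface Pending {
  id: number;
  path: Path;
  label?: string;
}

// Assumption: details are keyed by the JSON-serialised path.
type IncrementalDetailsByPath = Record<string, IncrementalDetails>;

// Gather every distinct delivery group referenced by the not-yet-executed
// field digests, skipping the groups the caller asked to exclude.
function collectDeliveryGroups(
  incrementalDetailsByPath: IncrementalDetailsByPath,
  excludingDeliveryGroups: Set<DeliveryGroup> = new Set()
): DeliveryGroup[] {
  const found = new Set<DeliveryGroup>();
  for (const details of Object.values(incrementalDetailsByPath)) {
    for (const digests of Object.values(details.groupedFieldSet)) {
      for (const digest of digests) {
        if (!excludingDeliveryGroups.has(digest.deliveryGroup)) {
          found.add(digest.deliveryGroup);
        }
      }
    }
  }
  return Array.from(found);
}

// Turn delivery groups into the `pending` entries sent to the client.
function makePending(deliveryGroups: DeliveryGroup[]): Pending[] {
  return deliveryGroups.map(({ id, path, label }) =>
    label === undefined ? { id, path } : { id, path, label }
  );
}

const root: DeliveryGroup = { id: 0, path: [], parent: null };
const child: DeliveryGroup = { id: 1, path: [], parent: root, label: "expensive" };

const detailsByPath: IncrementalDetailsByPath = {
  "[]": {
    groupedFieldSet: {
      expensiveField: [{ fieldName: "expensiveField", deliveryGroup: child }],
    },
  },
};

const pending = makePending(collectDeliveryGroups(detailsByPath, new Set([root])));
console.log(JSON.stringify(pending));
```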

yaacovCR commented 8 months ago

@benjie

Just checking: does this algorithm handle the test case in https://github.com/graphql/graphql-js/pull/3997 correctly?

Can inclusion of a field in a nested deferred fragment — where that field is present in a parent result and so will never be delivered with the child — muck with how the delivery groups are created?

benjie commented 6 months ago

Can inclusion of a field in a nested deferred fragment — where that field is present in a parent result and so will never be delivered with the child — muck with how the delivery groups are created?

It shouldn't cause an issue because it's based on field collection: both occurrences of shouldBeWithNameDespiteAdditionalDefer are grouped together at the same time (with different "defer paths") rather than being treated as separate fields. We visit deferred and non-deferred fields alike in the same selection set, and then partition their execution based on defers.

(Note this may not actually be the case in the current algorithm because it may have bugs, but this is the intent.)

query HeroNameQuery {
  ... @defer {
    hero {
      id
    }
  }
  ... @defer {
    hero {
      name
      shouldBeWithNameDespiteAdditionalDefer: name
      ... @defer {
        shouldBeWithNameDespiteAdditionalDefer: name
      }
    }
  }
}

First group does nothing, but notes that hero exists and is deferred (twice).

Next it creates two new groups for the defers, and a "shared" group. The shared group executes the hero field, and then the subfields are executed in the two separate groups afterwards.

When grouping the subfields on the second of these groups it's noted that shouldBeWithNameDespiteAdditionalDefer exists twice (but these two usages are collected together) and the simpler "defer" wins, such that it's evaluated in the parent and the deferred-defer evaporates since it doesn't contain any new field selections.
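
As an illustration of that last step (and only an illustration, since the current algorithm may not do exactly this): one possible way to let the simpler defer win is to drop any field digest whose delivery group descends from another group that collected the same response key. The dropShadowedDigests helper and the reduced types are hypothetical; the group ids stand for the second outer defer (2) and the defer nested inside it (3).

```typescript
interface DeliveryGroup {
  id: number;
  parent: DeliveryGroup | null;
}

interface FieldDigest {
  responseKey: string;
  deliveryGroup: DeliveryGroup;
}

// True when `ancestor` appears somewhere in the parent chain of `group`.
function isDescendantOf(group: DeliveryGroup, ancestor: DeliveryGroup): boolean {
  for (let current = group.parent; current !== null; current = current.parent) {
    if (current === ancestor) return true;
  }
  return false;
}

// Drop digests whose response key is already collected by an ancestor
// delivery group; the ancestor delivers that field first anyway.
function dropShadowedDigests(digests: FieldDigest[]): FieldDigest[] {
  return digests.filter(
    (digest) =>
      !digests.some(
        (other) =>
          other.responseKey === digest.responseKey &&
          isDescendantOf(digest.deliveryGroup, other.deliveryGroup)
      )
  );
}

const rootGroup: DeliveryGroup = { id: 0, parent: null };
const outerDefer: DeliveryGroup = { id: 2, parent: rootGroup };
const innerDefer: DeliveryGroup = { id: 3, parent: outerDefer };

// Subfields of `hero` as collected for the second deferred fragment.
const heroSubfields: FieldDigest[] = [
  { responseKey: "name", deliveryGroup: outerDefer },
  { responseKey: "shouldBeWithNameDespiteAdditionalDefer", deliveryGroup: outerDefer },
  { responseKey: "shouldBeWithNameDespiteAdditionalDefer", deliveryGroup: innerDefer },
];

const kept = dropShadowedDigests(heroSubfields);
const remainingGroupIds = Array.from(new Set(kept.map((d) => d.deliveryGroup.id)));
console.log(kept.map((d) => `${d.responseKey}@${d.deliveryGroup.id}`), remainingGroupIds);
```

In this sketch the nested group ends up with no digests of its own, which corresponds to the deferred-defer evaporating.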

yaacovCR commented 6 months ago

Note this may not actually be the case in the current algorithm because it may have bugs, but this is the intent.

In my spec and TS implementation, we handle this by having each DeferUsage save its parent DeferUsage if it exists, and then performing some filtering downstream.

I have the sense that your current algorithm does not correctly handle this case — but I am hoping that it does, because if it does, it manages to do so without that tracking, which I would want to emulate if possible.

graphql / graphql-spec

[RFC] Implementation of June 2023 incremental delivery format with `@defer` #1074