"Global dependencies" - Githubissues

benjie commented 9 months ago

In Grafast a step can depend on a number of previous steps; the result of these previous steps is then fed into the "execute" method.

When a Grafast plan starts, the batch size is 1; when we pass lists (or nullable boundaries, or polymorphism, or...), the batch size changes based on the number of results that now need processing.

If a step depends on a value in a higher layer (e.g. a field might depend on the list item it is coming from plus a value from context()) then these values are automatically "multiplied up" to the batch size.

It's common for steps to be dependent on "globals" for a request: variables, arguments passed to fields, context() derivatives, etc. These will always be the same value, even if the step is processing 10,000 values. So effectively if we specify first: 5 to a field, we'll get 10,000 copies of the number 5. This is super annoying for steps to deal with - either they need a backend that doesn't mind specifying the limit in a variable manner, or they need to go through all the input values and batch based on these kind of values (which normally results in a batch size of one, but a lot of effort to confirm it).

I propose that we add a new type of dependency - a global dependency. Global dependencies will only be able to be created on "bucket 0" steps (and we'll need to handle context().get(...) and similar specially to make sure they stay in bucket 0) and will mean that the values are fed through via a separate tuple to execute(); but it will mean that this first: 5 can be read as exactly that, rather than being "multiplied up".

One major drawback of this, is I was planning to allow Grafast steps to batch across multiple GraphQL requests; so where I said the batch size is 1 at the start this may not always be the case. I'll need to carefully think about the API we use for this so that we don't block future optimizations.

Ooo! If we do this, it should mean that we can remove $step.eval* from a few more places in the PostGraphile plans - anything using eval (other than skip/include) should be using global dependencies instead. With careful thought, and perhaps integration into the graphile/crystal#2013 mechanism, perhaps skip and include can also use this mechanism...

benjie commented 4 months ago

Without re-reading the above... Writing down this thought after discussion with @jemgillam before I lose track of it...

this.addDependency($foo) - as before
this.addGlobalDependency($foo) - adds a "global" dependency on foo
isGlobalStep($foo) - returns true if $foo can be a "global" dependency
$step.execute(count, values, { globals }) - addDependency() goes into values as before, addGlobalDependency goes into globals. Globals may contain extra things that you didn't request, you access by key rather than index. (Likely key: `_` + $step.id.)

In development, if you addDependency($foo) and isGlobalStep($foo) then you get a warning about the performance ramifications of doing so.

All builtin steps will have to be upgraded to use a global dependency where possible to avoid this warning.

benjie commented 4 months ago

Need to think about the interaction of global dependencies with mutations; for example if a mutation does

const $userId = context().get("claims").get("userId");

// ...

const $claims = context().get("claims");
sideEffect($claims, claims => {claims.userId = null});

Then later an access to context().get("claims").get("userId") cannot be a global since it may have changed.

Currently I'm thinking that the only globals would be input values (arguments/input objects) and the context() itself - derivatives would not be. I guess derivatives could be globals, but only prior to any side effects taking place.

benjie commented 4 months ago

Consider the query:

{
  allPeopleList {          # [People!]
    name                   # String
    petsList(first: 3) {   # [Pet!]
      name                 # String
    }
  }
}

Consider that the resolver for Person.petsList with some arbitrary SQL backend:

const plans = {
  Person: {
    petsList($person, { $first }) {
      const $pets = getRecords('pets');
      const $personId = $person.get('id');
      $pets.whereEq('owner_id', $personId);
      $pets.first($first);
      return $pets;
    }
  }
}

Here's the beginnings of our GetRecordsStep class:

class GetRecordsStep extends ExecutableStep {
  constructor(private tableName: string) {
    super();
  }
  whereEq(col: string, $step: ExecutableStep) {
    const depId = this.addDependency($step);
    this.eqConditions.push([col, depId]);
  }
  first($first: ExecutableStep) {
    this.firstDepId = this.addGlobalDependency($first);
  }

  async execute(count: number, values: GrafastValuesList, extra: ExecutionExtra): GrafastResultsList {
    const first = WHAT_GOES_HERE;

    const sql = `\
      select v._index, t.*
      from (select json_array_elements($1::json)) v
      inner join lateral (
        select t.*
        from ${this.tableName} t
        where (
          ${this.eqConditions
            .map(([col, depId]) => `t.${col} = v->>'${col}'`)
            .join(") and (")}
        )
        limit ${first}  -- 👈👈👈 Inline value of 'first' directly
      ) t on (true)
      `;
    const json = [];
    for (let i = 0; i < count; i++) {
      const obj = this.eqConditions.reduce(
        (memo, [col, depId]) => {
          memo[col] = values[depId][i];
          return memo;
        },
        { _index: i },
      );
      json.push(obj);
    }
    const dbResults = await databaseQuery(sql, [JSON.stringify(json)]);
    const results = [];
    for (let i = 0; i < count; i++) {
      results[i] = dbResults.filter((r) => r._index === i);
    }
    return results;
  }
}

Because $first is a global dependency, there will be only one value for it during execute so we can inline it directly into the SQL used for the entire batch.

But where do we read it from? I see two options:

we add it to values like regular dependencies are, but instead of it being a list of values it's just the explicit value itself
we add it to a new property inside extra

I think that option 1 is too likely to cause typing issues/etc, so I'm going to go with option 2.

benjie commented 4 months ago

Of course the isue with option 2 is that the depId 2 could be read from values or extra.globals and it would have different meanings - the step class author will need to be careful.

benjie commented 4 months ago

Okay... Scratch all that; that was a deadend of pain and suffering, primarily caused by the fact that mutations exist and they can mess with things in context which you may still want to depend on as a "global"... but we can't track that easily when we're three levels deep in lists and creating the steps in the root layer plan would be unsafe because they might happen after a mutation, and trying to hoist the dependency is way too much work, and tracking these things separately is a nightmare and....

Instead, steps themselves can either be unary (handle a single value) or batch (handle a batch of values). The top-level steps: context, rootValue, variableValues, etc are all unary. You can add dependencies as you normally would - you don't need to addGlobalDependency, but you could do addBatchDependency or addUnaryDependency if you wanted to add an assertion. Internally, we'll track unary and batch steps in the same dependencies list we always have, but when we write them to the store the unary steps won't be "multiplied up". We'll have a new execute method called something else... executeV2 or whatever... and that will be fed a tuple of your dependencies, as before, except all unary values will be replaced with null. Inside extra.unaries (which was extra.globals) will be the reverse - all unaries will have their values, but all batches will have null. We'll add a default executeV2 to ExecutableStep that'll backfill the old behavior:

executeV2(count: number, values: Array<any[] | null>, extra: ExecutionExtra) {
  const backfilledValues = values.map((v, i) => v === null ? arrayOfLength(count, extra.unaries[i]) : v);
  return this.execute(count, backfilledValues, extra);
}

this way it doesn't have to be a breaking change.

I'm not sure I like the terms "unary" and "batch" but that's a minor detail, and I do like that they both have the same number of letters.

benjie commented 4 months ago

unbatchedExecute would also continue to work exactly as it does currently. And unbatchedExecute would be preferred (required?) for all unary steps.

benjie commented 4 months ago

A step is unary if:

all of its dependencies are unary
it implements unbatchedExecute

If a step is used as a unary dependency of another step then it must not become non-unary (e.g. any additional dependencies added must be unary)

graphile / crystal-pre-merge

"Global dependencies" #505