Closed benjie closed 4 months ago
Without re-reading the above... Writing down this thought after discussion with @jemgillam before I lose track of it...

- `this.addDependency($foo)` - as before
- `this.addGlobalDependency($foo)` - adds a "global" dependency on foo
- `isGlobalStep($foo)` - returns true if $foo can be a "global" dependency
- `$step.execute(count, values, { globals })` - `addDependency()` goes into `values` as before, `addGlobalDependency` goes into `globals`. Globals may contain extra things that you didn't request, you access by key rather than index. (Likely key: `_` + `$step.id`.)

In development, if you `addDependency($foo)` and `isGlobalStep($foo)` then you get a warning about the performance ramifications of doing so.

All builtin steps will have to be upgraded to use a global dependency where possible to avoid this warning.
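To make the shape of this idea concrete, here's a rough toy model of the three calls listed above. `ToyStep`, `canBeGlobal`, and the `warnings` array are hypothetical names invented for this sketch; none of this is Grafast's real implementation, it only mirrors the behaviour described (index for regular dependencies, `_` + id key for globals, a warning when a global-capable step is added as a regular dependency).

```typescript
// Toy model only: ToyStep and its fields are assumptions, not Grafast classes.
let nextStepId = 0;

class ToyStep {
  readonly id: number = nextStepId++;
  readonly canBeGlobal: boolean;
  readonly dependencies: ToyStep[] = [];
  readonly globalDependencies: ToyStep[] = [];
  readonly warnings: string[] = [];

  constructor(canBeGlobal: boolean = false) {
    this.canBeGlobal = canBeGlobal;
  }

  // As before: returns the index at which the dependency appears in `values`.
  addDependency($dep: ToyStep): number {
    if (isGlobalStep($dep)) {
      // Stand-in for the development-only warning.
      this.warnings.push(
        `Step ${$dep.id} could be a global dependency; adding it as a regular dependency may hurt performance.`,
      );
    }
    return this.dependencies.push($dep) - 1;
  }

  // Returns the key under which the value would appear in `globals`.
  addGlobalDependency($dep: ToyStep): string {
    if (!isGlobalStep($dep)) {
      throw new Error(`Step ${$dep.id} cannot be a global dependency`);
    }
    this.globalDependencies.push($dep);
    return `_${$dep.id}`;
  }
}

function isGlobalStep($step: ToyStep): boolean {
  return $step.canBeGlobal;
}

const $first = new ToyStep(true); // e.g. a field argument
const $person = new ToyStep(false); // e.g. a list item
const $pets = new ToyStep();

const personDepId = $pets.addDependency($person); // index into `values`
const firstKey = $pets.addGlobalDependency($first); // key into `globals`
$pets.addDependency($first); // allowed, but records a warning
console.log(personDepId, firstKey, $pets.warnings.length);
```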
Need to think about the interaction of global dependencies with mutations; for example if a mutation does

```ts
const $userId = context().get("claims").get("userId");
// ...
const $claims = context().get("claims");
sideEffect($claims, (claims) => {
  claims.userId = null;
});
```

Then later an access to `context().get("claims").get("userId")` cannot be a global since it may have changed.
Currently I'm thinking that the only globals would be input values (arguments/input objects) and the `context()` itself - derivatives would not be. I guess derivatives could be globals, but only prior to any side effects taking place.
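A tiny plain-TypeScript illustration of why a context derivative is risky as a global, with no Grafast involved. `RequestContext`, `readUserId`, and `logOut` are made-up names for this sketch: the context object itself stays the same object for the whole request, but a value derived from it before a side effect can go stale.

```typescript
// Hypothetical request context; the names here are assumptions for illustration.
interface Claims {
  userId: number | null;
}
interface RequestContext {
  claims: Claims;
}

// A "derivative" of context: reads a nested value at the time it is called.
function readUserId(ctx: RequestContext): number | null {
  return ctx.claims.userId;
}

// Stand-in for a mutation's side effect that changes the claims in place.
function logOut(ctx: RequestContext): void {
  ctx.claims.userId = null;
}

const ctx: RequestContext = { claims: { userId: 42 } };

const cachedAsGlobal = readUserId(ctx); // captured once, before the side effect
logOut(ctx);
const readAfterSideEffect = readUserId(ctx); // what a later step should see

console.log(cachedAsGlobal, readAfterSideEffect); // prints: 42 null
```

If the derivative had been evaluated once and shared as a global, the later read would have wrongly seen `42`.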
Consider the query:

```graphql
{
  allPeopleList { # [People!]
    name # String
    petsList(first: 3) { # [Pet!]
      name # String
    }
  }
}
```

Consider the plan resolver for `Person.petsList` with some arbitrary SQL backend:

```ts
const plans = {
  Person: {
    petsList($person, { $first }) {
      const $pets = getRecords('pets');
      const $personId = $person.get('id');
      $pets.whereEq('owner_id', $personId);
      $pets.first($first);
      return $pets;
    },
  },
};
```

Here's the beginnings of our GetRecordsStep class:

```ts
class GetRecordsStep extends ExecutableStep {
  // Pairs of [column name, dependency index] to match on.
  private eqConditions: Array<[col: string, depId: number]> = [];
  private firstDepId: number | null = null;

  constructor(private tableName: string) {
    super();
  }

  whereEq(col: string, $step: ExecutableStep) {
    const depId = this.addDependency($step);
    this.eqConditions.push([col, depId]);
  }

  first($first: ExecutableStep) {
    this.firstDepId = this.addGlobalDependency($first);
  }

  async execute(
    count: number,
    values: GrafastValuesList,
    extra: ExecutionExtra,
  ): Promise<GrafastResultsList> {
    const first = WHAT_GOES_HERE;
    const sql = `\
      select (v.obj->>'_index')::int as _index, t.*
      from json_array_elements($1::json) as v(obj)
      inner join lateral (
        select t.*
        from ${this.tableName} t
        where (
          ${this.eqConditions
            .map(([col]) => `t.${col} = v.obj->>'${col}'`)
            .join(") and (")}
        )
        limit ${first} -- Inline value of 'first' directly
      ) t on (true)
    `;
    // Build one JSON object per batch entry, tagged with its index.
    const json = [];
    for (let i = 0; i < count; i++) {
      const obj = this.eqConditions.reduce(
        (memo, [col, depId]) => {
          memo[col] = values[depId][i];
          return memo;
        },
        { _index: i },
      );
      json.push(obj);
    }
    const dbResults = await databaseQuery(sql, [JSON.stringify(json)]);
    // Split the rows back out into one list per batch entry.
    const results = [];
    for (let i = 0; i < count; i++) {
      results[i] = dbResults.filter((r) => r._index === i);
    }
    return results;
  }
}
```

Because `$first` is a global dependency, there will be only one value for it during `execute` so we can inline it directly into the SQL used for the entire batch.
But where do we read it from? I see two options:

1. `values` - like regular dependencies are, but instead of it being a list of values it's just the explicit value itself
2. `extra`

I think that option 1 is too likely to cause typing issues/etc, so I'm going to go with option 2.
Of course the issue with option 2 is that the same depId (say `2`) could be read from `values` or from `extra.globals` and it would have different meanings - the step class author will need to be careful.
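To illustrate the ambiguity, and one possible way a step author could guard against it, here's a small standalone sketch. Everything in it is hypothetical: the branded `BatchDepId`/`GlobalDepId` types and the `readBatch`/`readGlobal` helpers are not part of any proposed API, they just show that the same number means different things depending on which tuple you index.

```typescript
// Branded ids so the type checker stops us mixing the two kinds up.
// These types and helpers are assumptions for illustration only.
type BatchDepId = number & { readonly kind: "batch" };
type GlobalDepId = number & { readonly kind: "global" };

// `values`: one list per regular dependency, one entry per batch item.
const values: unknown[][] = [[1, 2, 3]];
// `globals`: one plain value per global dependency.
const globals: unknown[] = [5];

function readBatch(
  batchValues: ReadonlyArray<ReadonlyArray<unknown>>,
  id: BatchDepId,
  index: number,
): unknown {
  return batchValues[id]?.[index];
}

function readGlobal(globalValues: ReadonlyArray<unknown>, id: GlobalDepId): unknown {
  return globalValues[id];
}

// Both ids are the number 0, but they refer to entirely different things.
const ownerIdDep = 0 as BatchDepId;
const firstDep = 0 as GlobalDepId;

const ownerIdForSecondItem = readBatch(values, ownerIdDep, 1);
const first = readGlobal(globals, firstDep);
console.log(ownerIdForSecondItem, first); // prints: 2 5

// readGlobal(globals, ownerIdDep) would be a compile-time error.
```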
Okay... Scratch all that; that was a dead end of pain and suffering, primarily caused by the fact that mutations exist and they can mess with things in `context` which you may still want to depend on as a "global"... but we can't track that easily when we're three levels deep in lists, and creating the steps in the root layer plan would be unsafe because they might happen after a mutation, and trying to hoist the dependency is way too much work, and tracking these things separately is a nightmare and....
Instead, steps themselves can either be `unary` (handle a single value) or `batch` (handle a batch of values). The top-level steps: context, rootValue, variableValues, etc are all `unary`. You can add dependencies as you normally would - you don't need to `addGlobalDependency`, but you could do `addBatchDependency` or `addUnaryDependency` if you wanted to add an assertion. Internally, we'll track unary and batch steps in the same `dependencies` list we always have, but when we write them to the `store` the unary steps won't be "multiplied up". We'll have a new `execute` method called something else... `executeV2` or whatever... and that will be fed a tuple of your dependencies, as before, except all unary values will be replaced with `null`. Inside `extra.unaries` (which was `extra.globals`) will be the reverse - all unaries will have their values, but all batches will have `null`. We'll add a default `executeV2` to `ExecutableStep` that'll backfill the old behavior:

```ts
executeV2(count: number, values: Array<any[] | null>, extra: ExecutionExtra) {
  const backfilledValues = values.map((v, i) =>
    v === null ? arrayOfLength(count, extra.unaries[i]) : v,
  );
  return this.execute(count, backfilledValues, extra);
}
```

This way it doesn't have to be a breaking change.
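For anyone who wants to poke at the backfill idea in isolation, here's a standalone sketch of it. `arrayOfLength` is written out here as an assumption of what that helper would do, and `ToyExtra` is a stand-in for the real `ExecutionExtra`; only the `null`-in-`values` / value-in-`extra.unaries` convention comes from the description above.

```typescript
// Assumed helper: an array of `length` copies of `fill`.
function arrayOfLength<T>(length: number, fill: T): T[] {
  return Array.from({ length }, () => fill);
}

// Stand-in for ExecutionExtra; only the `unaries` tuple matters here.
interface ToyExtra {
  unaries: ReadonlyArray<unknown>;
}

// "Multiply up" each unary dependency so an old-style execute() can consume it.
function backfill(
  count: number,
  values: ReadonlyArray<ReadonlyArray<unknown> | null>,
  extra: ToyExtra,
): unknown[][] {
  return values.map((v, i) =>
    v === null ? arrayOfLength(count, extra.unaries[i]) : [...v],
  );
}

// Dependency 0 is a batch of three owner ids; dependency 1 is the unary `first`.
const values = [[10, 20, 30], null];
const extra: ToyExtra = { unaries: [null, 5] };

const backfilled = backfill(3, values, extra);
console.log(JSON.stringify(backfilled)); // prints: [[10,20,30],[5,5,5]]
```

A step that implements the new method itself could skip this and read `extra.unaries[1]` directly as the single value `5`.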
I'm not sure I like the terms "unary" and "batch" but that's a minor detail, and I do like that they both have the same number of letters.
`unbatchedExecute` would also continue to work exactly as it does currently. And `unbatchedExecute` would be preferred (required?) for all unary steps.
A step is unary if:
unbatchedExecute
If a step is used as a unary dependency of another step then it must not become non-unary (e.g. any additional dependencies added must be unary)
In Grafast a step can depend on a number of previous steps; the result of these previous steps is then fed into the "execute" method.
When a Grafast plan starts, the batch size is 1; when we pass lists (or nullable boundaries, or polymorphism, or...), the batch size changes based on the number of results that now need processing.
If a step depends on a value in a higher layer (e.g. a field might depend on the list item it is coming from plus a value from `context()`) then these values are automatically "multiplied up" to the batch size.

It's common for steps to be dependent on "globals" for a request: variables, arguments passed to fields, `context()` derivatives, etc. These will always be the same value, even if the step is processing 10,000 values. So effectively if we specify `first: 5` to a field, we'll get 10,000 copies of the number 5. This is super annoying for steps to deal with - either they need a backend that doesn't mind specifying the limit in a variable manner, or they need to go through all the input values and batch based on these kinds of values (which normally results in a batch size of one, but a lot of effort to confirm it).

I propose that we add a new type of dependency - a global dependency. Global dependencies will only be able to be created on "bucket 0" steps (and we'll need to handle `context().get(...)` and similar specially to make sure they stay in bucket 0) and will mean that the values are fed through via a separate tuple to `execute()`; but it will mean that this `first: 5` can be read as exactly that, rather than being "multiplied up".

One major drawback of this is that I was planning to allow Grafast steps to batch across multiple GraphQL requests; so where I said the batch size is 1 at the start, this may not always be the case. I'll need to carefully think about the API we use for this so that we don't block future optimizations.
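To show what "go through all the input values and batch based on these kinds of values" costs today, here's a small standalone sketch. `groupIndexesByValue` is a hypothetical helper, not something from Grafast: it does the full scan a step currently needs just to confirm that every "multiplied up" copy of `first` is the same.

```typescript
// Hypothetical helper: group batch indexes by the value seen at each index.
function groupIndexesByValue<T>(values: ReadonlyArray<T>): Map<T, number[]> {
  const groups = new Map<T, number[]>();
  values.forEach((value, index) => {
    const group = groups.get(value);
    if (group) {
      group.push(index);
    } else {
      groups.set(value, [index]);
    }
  });
  return groups;
}

// `first: 5` "multiplied up" to a batch of 10,000.
const multipliedUpFirst: number[] = Array.from({ length: 10000 }, () => 5);

// The step has to scan all 10,000 entries to learn there is only one group...
const groups = groupIndexesByValue(multipliedUpFirst);
console.log(groups.size); // prints: 1

// ...whereas a global dependency would hand over the single value directly.
const firstAsGlobal = 5;
console.log(groups.has(firstAsGlobal)); // prints: true
```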
Ooo! If we do this, it should mean that we can remove `$step.eval*` from a few more places in the PostGraphile plans - anything using eval (other than skip/include) should be using global dependencies instead. With careful thought, and perhaps integration into the graphile/crystal#2013 mechanism, perhaps skip and include can also use this mechanism...