GraphQL 4: Autobuilding

unclecheese commented 4 years ago

In an effort to make the new schema building requirement as low impact as possible, we should explore how we could do this automatically.

Use case to think about is something as simple as adding a new block type to elemental. Not obvious that you need to run a graphql schema build!

Table stakes: Deploys will always require schema builds, and they may often be slow. We don't need to worry about autobuilding in non-dev environments.

Idea A:

Add a signature to the schema (currently all types, fields, etc have signatures for the build.. this would be an aggregate)
On graphql request:
- If dev mode:
- Load schema config
- If schema signature matches, load generated schema, and execute query.
- If schema signature does not match, do a build. Will add latency to the request.
- If not dev mode:
- Load generated schema and execute query

This would make every graphql request in dev mode slower, because bare minimum, you need to boot the schema config, which is ~500ms.

Idea B:

Create a module that provides an endpoint for checking if the schema is stale
Any components that rely on graphql in the admin could make a request to this endpoint, check staleness, and if stale, present a loading indicator while the schema builds.

The advantage here is that you can provide some level of feedback rather than just having an unusually long graphql request.

Nothing saying we can't do both, either.

chillu commented 4 years ago

I think it comes down to allowing devs to build up a mental model, effectively a decision tree - similar to the one they'll have in their head for YAML config, template caches and database schema changes.

When adding new YAML config, I need to flush
When this YAML config changes the database schema, I also need to run dev/build
When this YAML config is only changing GraphQL definitions, I don't need to flush
Idea A: When I'm on dev, things will be a bit slower but are always up to date with my code/config (with the exceptions above)
Idea B: When I'm on dev, I need to pre-generate the GraphQL schema on every change. With some additional CMS smarts, the schema is auto-building, but not in my own GraphQL-powered UIs
When I'm on prod, I need to pre-generate the GraphQL schema. A dev/build can't handle this for me (due to multi server). But it means every GraphQL request will be as fast as possible

This would make every graphql request in dev mode slower, because bare minimum, you need to boot the schema config, which is ~500ms.

So that's assuming flushless? As we roll out more UI using GraphQL, this will multiply - even with GraphQLs query nesting, I'd expect >5 queries to build the CMS UI to be common. And those 500ms delays will add up, as you work in the CMS in dev mode (e.g. to add new fields, rather than doing anything around GraphQL).

I wonder if filemtime() would be sufficient? This fstat metadata is cached within PHP. Would be quite different on SSD vs. spinning discs, but I think it's safe to assume that dev mode will pretty much always have an SSD. On production hosts, spinning discs are still more common.

Pseudocode:

md5(json_encode(
  array_map(function ($file) {
    return filemtime($file);
  },
  glob('**/_graphql/*.yml')
));

unclecheese commented 4 years ago

First thing is that I would throw out the 500ms to build the schema. In practice, it's much slower, and this is due to all the plugins. Adding sorting/filtering to everything takes time. So we're going to need to do this based on filemtime.

Yeah, I was thinking we could use symfony-finder's comparison API for this:

$files = $finder->files(['*.php','*.yml'])->in($watchedDirs)->date('> ' . $lastTimestamp);
$cache->set('lastTimestamp', time());
if ($files->hasResults() { // build }
else { return $cached; }

chillu commented 4 years ago

Could we make *.php opt-in if you are writing PHP-defined types? My understanding is that this is only necessary for custom models which should be pretty rare? PHP-based resolver logic isn't cached, so presumably doesn't need to be watched for changes. That's assuming that we'd be watching dozens of YAML files, but potentially thousands of PHP files. And that it might be the difference between dozens and hundreds of milliseconds execution. Maybe those assumptions are wrong.

unclecheese commented 4 years ago

You need to build if any DataObject has changed, basically, so by doing *.php, you're casting a wide enough net that you can't really sneak anything in past the cache, if that makes sense.

fields: '*' # needs to be reevaluated whenever a dataobject changes

That might be the only case where it matters, now that I think about it. Because it used to be that custom getters were the biggest issue. Now those need to be explicitly typed anyway.

fields:
  myCustomGetter: String # Changing the PHP code has no influence on this

unclecheese commented 4 years ago

If it really is just model-level changes to the database, we can probably do Config SHA + md5(composer.lock)... maybe???

chillu commented 4 years ago

**/_graphql/*.yml and **/*.php, but excluding vendor/ since that's captured by md5(composer.lock)? As you say, the fields: * use case means we need to take all non-vendor PHP files into account, because we can't predict which ones are DataObject subclasses, and/or would cause schema changes.

That feels realistic in terms of fstat amounts. On a moderately sized sample project here, that's less than a hundred files. I would prefer fstat over content hashing because it's likely faster. But it might also lead to more issues with badly configured host NFS mounts in virtual machines. In the worst case, we could make this configurable, it's a one-liner.

And we'd ignore the use case where you develop on forks directly in vendor/ without influencing composer.lock. The exclude paths should be configurable for the small set of users interested in auto-building for this use case.

unclecheese commented 4 years ago

I wonder if we just abstract this a way in to SchemaInvalidator and we ship a single implementation for stable, see how we get on, and maybe write a second one if there are edge cases that need a different strategy.

chillu commented 4 years ago

Once this is done, don't forget to update the docs, it notes "TODO, once we figure out where it will go" in 3_building_the_schema.md

silverstripe / silverstripe-graphql

GraphQL 4: Autobuilding #272

Idea A:

Idea B: