Closed gmcgibbon closed 2 weeks ago
Hey, thanks for the detailed write-up. I'm definitely open to exploring options for loading the schema as-needed.
Currently, any GraphQL query loads the entire schema, because, as you noticed, query(...)
and mutation(...)
traverse their given types and add everything to the schema's registry of types. (subscription(...)
does the same thing.) Those methods also resolve any types given as Procs -- the Procs change the order that types are loaded in (Proc-based types are loaded last), but they're all still loaded during the mutation(...)
or query(...)
call.
I think making a GraphQL::Schema
load types as-needed would be a good bit of work, something like:
query(...)
/mutation(...)
/subscription(...)
to store their inputs but not traverse them immediately Schema.add_type_and_traverse
) to only load code as-needed (or not at all? and instead add codepaths which populate the schema's type registry as they run...) GraphQL::Schema.types
, which returns the entire type registry; instead make sure they only ever ask for the minimum required types.If we did all that, we'd probably want to keep the old code too, so that in Production, the whole schema could be loaded upfront -- before the first request comes in, to reduce latency, and before any forking, to improve memory sharing in copy-on-write situations.
So, it's all possible -- but I think it will be a bit more than accepting procs for resolver:
😅
Thanks for mapping out the process @rmosolgo. I'll try to roughly implement this and let you know if I get stuck. Once we have something working I imagine it will be easier to accept a patch.
GraphQL types have solved this problem already
Looks like I'm wrong about this. As soon as the schema is loaded everything is loaded (which also explains my particularly flamegraphs). Proc typed fields just defer loading when you load them in isolation (eg. MyGraphqlSchema => loads all proc types
but MyGraphqlObject => doesn't load proc types
.) It might be easier to work on lazily evaluating types first, since we're already halfway there. It looks like there's a ton of work behind lazy resolvers.
👋 I'm gearing up to take a try at this. You mention something that's halfway there -- have you had any luck in your approaches to this change?
I'm starting on this in #4974. So far I've done the bare minimum: delay calls to add_type_and_traverse
until the result of that traversal is actually needed. This will put off the GraphQL-related work of loading type files until the first query or so. (But Rails/Ruby will still parse and load those files, following constant references, so it's not really a big time-saver yet.)
I plan to keep working on:
Sorry I was on vacation. Before I left, I tried making schema additions lazy, but I'm not super happy with the implementation yet. I tried to make the type loading happen when we execute a query (so we load what we use in the query, essentially). It doesn't account for eager-loading yet, though.
I think we can save a bunch of time if we simply lazy-load the root types, which seems like an easier thing to implement so I'm looking into that next. In our case specifically, 99% of the time we're parsing queries and not mutations, but it takes a long time to load the root mutation when we parse a query.
Also in our case, we use procs over strings to get some semblance of type coherence in our code, so I'm not sure if strings will work. Though, this makes getting the name of the type very difficult, and AFAIK field objects take issue with not knowing the type upfront because of the extension API.
I'll take a closer look at your PR this week (I'm sure its much more coherent than my work), thanks so much for looking into this!
What do you think of accepting procs as types as a good next step?
It seems like any future in this area will need a way to load some_type.rb
without making direct reference to the Ruby constants that define SomeType
's field's types. I can see that happening with -> { OtherType }
-style procs (not beautiful, but practical), or perhaps by field :other { type(OtherType) }
, where the field do ... end
block is evaluated later instead of right away.
I think that work could be done as a precursor, what do you think? (And a lot of it was done in your lazy_types
branch already -- but we could get it into master as an incremental step in the right direction.)
My latest attempt is to build a replacement for Warden
which doesn't make as much heavy use of the top-level, schema-wide hashes of types, etc: https://github.com/rmosolgo/graphql-ruby/pull/4998
I think this approach will let me build a new implementation of type lookup while maintaining parity (or conscious difference) with the current implementation. Also, they can be swapped at runtime (and maybe even run at the same time, so you could get log messages when they diverge...).
It's basically working now, but for it to really be useful, I'll have to tune it to do as little load_all_types
as possible. (As well as the rest of the support for defining and handling lazily-defined types, yeah ...)
Awesome! I got lazy type evaluation to work on the schema I was working on, but it is a little hacky. I did it by deferring load hook execution:
class MySchema < GraphQL::Schema
unless Rails.configuration.eager_load
def self.root_type_for_operation(operation)
case operation
when "query"
query.run_load_hooks_once
own_types.delete(query.graphql_name) # delete and re-add type (and nested types)
add_type_and_traverse(query, root: true)
when "mutation"
mutation.run_load_hooks_once
own_types.delete(mutation.graphql_name) # delete and re-add type (and nested types)
add_type_and_traverse(mutation, root: true)
end
super
end
end
end
end
The load hooks are what makes loading my schema so slow, and were initially executed as soon as the root type is loaded. With this patch, the warden problem and the schema constant loading problem don't matter. However, this is a band-aid and I don't think it is correct long-term. I think if we unwrap proc-wrapped root types in root_type_for_operation
and fix the warden to not reference root types, I can revert my hack.
That's great, I'm glad you found something that works for now :+1:
What does .run_load_hooks_once
do? What kind of things run in those hooks?
What kind of things run in those hooks?
It calls something like this:
@called ||= begin
ActiveSupport.run_load_hooks(:query_root, self)
true
end
That way you can write lines like this in initializers:
ActiveSupport.on_load(:query_root) do
# field definitions or module includes here...
end
This basically just lets us control when callbacks are invoked to sidestep the issue. I'll be able to take a second look at the upstream implementation of this in ~2 weeks.
Thanks for sharing, very cool. No hurry on #4998, I'm going to keep chipping away at it over there.
I've got it nearly full parity with Warden, but I think I'm going to identify the parts of the new code which require loading all types, and see if it can be made to not do that -- even if it breaks parity. I think with good documentation and runtime warnings, a migration would be possible (and would probably require implementing visible?
in some cases where the warden was automatically hiding things).
Update here:
The new Schema::Subset
class isn't documented yet, but it can be enabled by self.use_schema_subset = true
in your schema. (I will change that to something ... nicer.) When it's enabled, you can:
query { Types::Query }
, and those blocks will be called when the first query comes in (or when other code calls MySchema.query
) #5055 field ... { ... }
blocks, for example, field :thing do type(Types::Thing) end
#5054 Both of those changes included Rubocop rules to enforce their use when desired.
I have a few more improvements to make to Schema::Subset
before I finally document it, but I think it's coming along well :+1:
I added docs for this in 4625e59cd. It's available in 2.3.x latest and I'll announce it on the mailing list after I release 2.4.0.
Is your feature request related to a problem? Please describe.
Right now, resolver classes are defined like this:
In Rails applications with lots of resolvers, this can trigger a lot of autoloads, and contribute to loading a lot of files we don't actually need to execute a query in development mode.
Describe the solution you'd like
GraphQL types have solved this problem already by wrapping type defs in procs like this:
I think it would be great if we could have this syntax to delay loading resolvers until actually used:
But it raises an error:
Describe alternatives you've considered
I considered opening an issue for a feature that lazily loads the root
query
ormutation
, but that seems a little more difficult from a public API perspective, and doesn't really solve the problem as granularly as I'd like. It would defer loading all mutation code until the first mutation, and all query code until the first query. Lazy resolvers should theoretically only load the code and types needed to execute a query/mutation while keeping the root types intact.I also thought about creating a schema for queries and a different one for mutations to split loading the root query and mutation. This would work, but it seems like a hack that doesn't really address the root concern in the GraphQL gem.
Additional context
I noticed this problem when profiling my application and seeing that
AppNameSchema.execute(...)
would load both query and mutation roots (and subsequent resolvers). My application's schema is really big and takes ~3 seconds to load the mutation root and ~2 seconds to load the query root, so this feature would really help the time to first query in autoloaded environments.cc @swalkinshaw