Type information lost during parsing.

jimsynz commented 4 years ago

A coworker and I have been trying to understand how Juniper parses and validates inbound queries. It seems that the document parser has a reference to the schema and knows the types of all fields at parse time (which makes sense, since otherwise they could not be correctly parsed and validated). However, that type information is not stored and has to be regenerated in the executor as it resolves the fields.

We've been investigating porting our pine.js query compiler to Rust and replacing the OData interface with GraphQL. Ideally we'd like to access the executor's lookahead, but have it contain the downcast concrete types of our schema rather than a simple JSON-ish representation. Our investigations have led us to believe that we may be able to reconstruct this information in our root resolvers by combining lookahead with the executor's current_type and recursively generating sub-executors -- but this seems far from ideal.

My questions are as follows:

1. Is there a reason that type information is not carried through from parsing into the executor?
2. Provided the answer to 1 is "no", would you accept a PR from us to rectify this?

We are planning on investing a non-trivial amount of engineering time into this project and if we can help make Juniper better as a side effect then all the better.

theduke commented 4 years ago

After a quick read of the README, it seems like pine would take in some static description of both the database schema and the full API schema, and generate everything accordingly, without the user writing any resolver code manually.

Correct? This would mean you implement either a proc macro or a build.rs script to generate all the code.

This is partially similar to what juniper-from-schema does, except you also know how to generate all resolver code because you know the database schema. wundergraph is actually doing something very similar: providing a way to map a DB schema to a GraphQL API.

You could look at those two projects for inspiration on how to implement things.

In that scenario, especially if using a single description of the whole schema like juniper-from-schema, you wouldn't need juniper to provide any additional type information, because you have a full graph of the schema at compilation time.

The only limitation is that you would need to put your custom validation code into the resolvers, because there is no hook for validation at the moment. This could be changed. See the end of the post.

For reference, I'll provide a bit of an explanation of how juniper works.

Background

In a certain sense, the execution phase is fully typed. There is no type erasure or use of trait objects, and all execution is statically dispatched via the GraphQLType trait. This begins with the RootNode type, which contains the query and mutation types, and execution flows down from there.

There is not, however, a full generic tree of types that would statically describe the entire schema. As in, when the resolving methods on the GraphQLType trait are called, the corresponding type does not have any static information about the children in the tree.

This is mostly handled via compile-time code generation instead. Simply stated, to resolve an object, there is a generated match statement that matches on the field name and then calls a method on the corresponding nested type for that field name to resolve the output for that field.
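As a rough illustration of the match-based dispatch described above, here is a minimal, self-contained sketch. It does not use juniper's real API: `Value`, `Selection`, `Resolve`, `Post` and `Author` are all hypothetical stand-ins, and the `impl` blocks are written by hand where a proc macro would normally generate them.

```rust
// Hypothetical stand-ins for illustration; these are not juniper's real types.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    Object(Vec<(String, Value)>),
}

// One requested field plus its sub-selection (empty for leaf fields).
pub struct Selection {
    pub name: &'static str,
    pub children: Vec<Selection>,
}

pub trait Resolve {
    // The part that would be generated per object type.
    fn resolve_field(&self, field: &Selection) -> Result<Value, String>;

    // Resolve every requested field, stopping at the first error.
    fn resolve(&self, selection: &[Selection]) -> Result<Value, String> {
        let mut out = Vec::new();
        for field in selection {
            out.push((field.name.to_string(), self.resolve_field(field)?));
        }
        Ok(Value::Object(out))
    }
}

pub struct Author {
    pub name: String,
}

pub struct Post {
    pub id: i64,
    pub title: String,
    pub author: Author,
}

impl Resolve for Author {
    fn resolve_field(&self, field: &Selection) -> Result<Value, String> {
        match field.name {
            "name" => Ok(Value::Str(self.name.clone())),
            other => Err(format!("unknown field `{}` on Author", other)),
        }
    }
}

impl Resolve for Post {
    fn resolve_field(&self, field: &Selection) -> Result<Value, String> {
        match field.name {
            "id" => Ok(Value::Int(self.id)),
            "title" => Ok(Value::Str(self.title.clone())),
            // Statically dispatched call into the nested type; `Post` only
            // knows about `Author` because this arm names it.
            "author" => self.author.resolve(&field.children),
            other => Err(format!("unknown field `{}` on Post", other)),
        }
    }
}
```

Each type knows its own fields only through the arms of its `match`, which is why no generic tree describing the whole schema is needed for execution.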

This theoretically could have been designed differently, in a way similar to how diesel (an ORM) works, with heavy use of generics to fully represent the type system statically, which would allow a way to declare the schema without using so many proc macros. But this would have been pretty awkward to implement without hlists, as the diesel code base shows. It also doesn't provide much benefit in the context of a server, where the structure is fixed.

More importantly, it's (almost) impossible to have a statically typed representation of the query, because those are dynamic and can have a different tree structure every time.

Validation

Validation right now works differently. It does not use the same system of bubbling down via the GraphQLType trait, and just uses a generic schema description instead. It was done this way because, without custom validation hooks, there wasn't really a need to generate a whole lot of code just for validation when a generic schema provides all the information.

Note you can still do custom validation in your resolver code, and return an error on failure. So you could implement pine regardless. The only downside to this is that you can't abort early in the validation phase and prevent redundant queries.
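A minimal sketch of validating inside a resolver and returning an error on failure. The `FieldError` struct here is a local stand-in defined for this example, and the `users(limit)` field and its 1 to 100 bound are made up for illustration.

```rust
// Local stand-in error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct FieldError(pub String);

pub struct Query;

impl Query {
    // Hypothetical resolver for a `users(limit: Int)` field.
    pub fn users(&self, limit: i32) -> Result<Vec<String>, FieldError> {
        // Custom validation happens here, during execution. By this point
        // sibling or parent resolvers may already have done work.
        if !(1..=100).contains(&limit) {
            return Err(FieldError(format!(
                "limit must be between 1 and 100, got {}",
                limit
            )));
        }
        let all = ["ann", "bob", "cy"];
        Ok(all
            .iter()
            .take(limit as usize)
            .map(|s| s.to_string())
            .collect())
    }
}
```

Because the check runs as part of resolving the field, it cannot reject the whole query before execution starts, which is the downside mentioned above.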

This is something that could be remedied eventually though. In theory the validation could work exactly like the resolver code, with a GraphQLType::validate() method that bubbles down the validation.
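To make the idea concrete, here is one way such a bubbling validation could look. This is purely a sketch: juniper has no such hook according to the post, and the `Validate` trait, `Selection` struct and the two object types are hypothetical names chosen for the example.

```rust
// One requested field plus its sub-selection (hypothetical representation).
pub struct Selection {
    pub name: &'static str,
    pub children: Vec<Selection>,
}

// Hypothetical counterpart to a `validate()` method on the type trait.
// It is an associated function: validation needs the type, not a value.
pub trait Validate {
    fn validate(selection: &[Selection]) -> Result<(), String>;
}

pub struct Author;
pub struct Post;

impl Validate for Author {
    fn validate(selection: &[Selection]) -> Result<(), String> {
        for field in selection {
            match field.name {
                "name" => {}
                other => return Err(format!("unknown field `{}` on Author", other)),
            }
        }
        Ok(())
    }
}

impl Validate for Post {
    fn validate(selection: &[Selection]) -> Result<(), String> {
        for field in selection {
            match field.name {
                "id" | "title" => {}
                // Bubble down into the nested type, mirroring how
                // resolution descends through the tree.
                "author" => Author::validate(&field.children)?,
                other => return Err(format!("unknown field `{}` on Post", other)),
            }
        }
        Ok(())
    }
}
```

Custom per-type checks could sit in the same `match` arms, so a failing query would be rejected before any resolver runs.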

I'd also actually like to implement statically typed lookaheads, by generating a lookahead type for each object type and filling it from the query input. Note that this would only be marginally safer though, because it would have to use Box<dyn std::any::Any> due to the dynamic nature of queries.

If you know the full schema, you can generate equivalent code already.
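A sketch of what generated, typed lookaheads might look like when the full schema is known. Everything here is an assumption for illustration: the `Selection` representation, the `PostLookahead` and `AuthorLookahead` structs, and the `lookahead_for` entry point are hypothetical and not part of juniper.

```rust
use std::any::Any;

// One requested field plus its sub-selection (hypothetical representation).
pub struct Selection {
    pub name: &'static str,
    pub children: Vec<Selection>,
}

// One lookahead struct per object type, as a code generator might emit.
#[derive(Debug, Default, PartialEq)]
pub struct AuthorLookahead {
    pub name: bool,
}

#[derive(Debug, Default, PartialEq)]
pub struct PostLookahead {
    pub id: bool,
    pub title: bool,
    pub author: Option<AuthorLookahead>,
}

impl AuthorLookahead {
    // Unknown fields are ignored; this assumes validation already ran.
    pub fn from_selection(selection: &[Selection]) -> Self {
        let mut out = Self::default();
        for field in selection {
            if field.name == "name" {
                out.name = true;
            }
        }
        out
    }
}

impl PostLookahead {
    pub fn from_selection(selection: &[Selection]) -> Self {
        let mut out = Self::default();
        for field in selection {
            match field.name {
                "id" => out.id = true,
                "title" => out.title = true,
                "author" => {
                    out.author = Some(AuthorLookahead::from_selection(&field.children))
                }
                _ => {}
            }
        }
        out
    }
}

// A generic executor only knows the type name at run time, so the typed
// lookahead has to be handed over type-erased and downcast by the resolver.
pub fn lookahead_for(type_name: &str, selection: &[Selection]) -> Option<Box<dyn Any>> {
    match type_name {
        "Post" => Some(Box::new(PostLookahead::from_selection(selection))),
        "Author" => Some(Box::new(AuthorLookahead::from_selection(selection))),
        _ => None,
    }
}
```

A resolver would then call `downcast_ref::<PostLookahead>()` on the boxed value. The downcast can still fail at run time, which is why the gain in safety is only marginal.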

jimsynz commented 4 years ago

Thanks for the reply @theduke, I'll talk it over with the other folks at work and may have follow up questions.

theduke commented 4 years ago

Closing this since it's not actionable, but feel free to ask here.

graphql-rust / juniper

Type information lost during parsing. #446
