[Warden] Types referenced from invisible fields still show up.

eapache commented 6 years ago

@rmosolgo @xuorig

Given a simple schema like this:

type A {
  b: B
}

enum B {
  VALUE
}

If the type A is hidden then we would expect B to be hidden as well, but this is not the case. This is because when checking whether B is referenced, we only look to the immediate visibility setting of the field b rather than doing the full check as to whether b itself is implicitly hidden (which it is, because it's on A which is hidden).

Instead of checking visible? in https://github.com/rmosolgo/graphql-ruby/blob/eff0a6c952734225bc899ec712d20c5a4e1b9be9/lib/graphql/schema/warden.rb#L152-L155 I believe we can just check visible_field? instead which does the right thing?

Caveats to that suggestion:

I understand references_to can return arguments as well, so we'll need something like if field; visible_field?(x); elseif argument; visible_field?(x.field)
This potentially results in an infinite loop if a type references itself? I guess we'll need to keep a set of all of the "actively visited" types so we can avoid that.

rmosolgo commented 6 years ago

Yeah, I agree it should work the way you describe! Thanks for those details.

eapache commented 6 years ago

I took a crack at fixing this and I think I missed something key in my initial analysis: visible_field?(b) doesn't check the visibility of A, it checks the visibility of B. In fact, I don't see a way to get from the GraphQL::Field for b to the type A in the first place; is this even possible?

eapache commented 6 years ago

GraphQL::Schema::Field has an owner which would work, is updating the warden to use the class-based classes a possibility?

eapache commented 6 years ago

This problem effectively devolves into https://en.wikipedia.org/wiki/Reachability from the root for a cyclic digraph. Eagerly walking from the root would work but would require preprocessing the entire schema on every new warden (aka every request). Even lazy path-climbing back to the root (assuming we can get from a field to its owner) isn't going to be very efficient depending on the schema.

I appreciate the current warden implementation as in fact a very good sort of "best-effort" to do as much reference-checking as possible without actually doing any recursion, however I'm starting to wonder if it's more effort than it's worth. Now that I've grokked the problem I can point out a few other potential surprises here that aren't fixable without a full reachability analysis.

I'll give it some more thought to see if there's a clever (efficient) algorithm for reachability I can construct given some of the specific constraints of the GraphQL schema graph, but if not I wonder if we shouldn't be going the other way: simplify the warden to do only a minimal, well-defined set of reference checks and document it explicitly as the schema author's problem to make sure their filter is self-consistent.

rmosolgo commented 6 years ago

It sounds like you've had an ... interesting ... afternoon 😆

has an owner

I think it would be ok to use that _when it's available, something like:

(owner_type = field.metadata[:type_class].type.unwrap) && visible?(owner_type.graphql_definition))

(with a lot of hand-waving to maintain compatibility).

Alternatively, I certainly couldn't say no to a rewrite of some kind! I wonder if it would be possible to do a shiny new take on schema filtering with class-based schemas, while keeping the old one for compatibility for a little while. If you want to open a dedicated issue or an exploratory PR, I'm definitely game.

eapache commented 6 years ago

Another schema that my brain cooked up to complicate matters:

type Query {
  a: A
  P: P
}

# First chain

type P { x: Q }
type Q { x: R }
type R { x: String }

# Second chain

type A { x: B }
type B { x: C }
type C { x: D }
type D { x: P }

If I filter out type R then the entire schema should end up empty, but that’s incredibly difficult to compute without multiple passes.

eapache commented 6 years ago

TLDR: I do not believe this problem is practically solvable. I recommend simplifying the warden to avoid surprises, at the cost of requiring schemas to be more thorough in what they filter.

edit: maybe it is after all, see https://github.com/rmosolgo/graphql-ruby/issues/1333#issuecomment-397817135

Recommendation

The goals for the warden system are (in no particular order):

efficient to execute
reasonable to implement
easy to understand behaviour
schema authors can write simple and easy filter functions

The current warden falls down on the third point since it tries to solve some of the complexity of this problem (see Appendix 1) but can't solve all of it, leading to many surprising edge cases.

Given this, I believe:

Types (objects, input objects, interfaces, unions, enums, scalars) should have their visibility determined only by the filter function and not by any consideration of reachability or validity. This is much easier to reason about than the existing implementation, should be less surprising when properly documented, and hopefully does not overly burden schema authors.
Fields, arguments, and enum values should continue to be implicitly hidden if they belong to a hidden type. This behaviour is "free" since these can't exist on their own anyway.
Fields and arguments should continue to respect the visibility of their return and input types. This check is efficient to implement, not terribly hard to reason about, and allows filter functions to restrict their logic to operate solely on types while still doing "the right thing" with fields and arguments.

If this is unacceptable for whatever reason, the most likely candidate for a "full" solution is "Precomputing distance to root" in Appendix 2.

Appendix I: Problem Definition

Summary

General notes:

We can ignore the set of orphan types, it's trivial.
We can ignore wrapping types (lists and non-nulls) they have no impact. Treat all fields and arguments as returning or accepting the unwrapped type.

To correctly determine if an element is visible, we need to check three things:

Whether the element itself passes the filter function.
Whether the element can be reached from the schema root via visible elements.
Whether the element is semantically valid when the visibility of its children is considered (e.g. whether an object type has at least one visible field).

Graph(QL) Theory

Given a schema, we can transform it into a directed graph G = (V,E).

Let V (the vertices of G) be all elements of the schema: the set of all types, enum values, fields, and arguments.

Let E (the edges of G) be determined as follows:

all objects, input objects, and interfaces have an edge to each of their fields
all fields have an edge to each of their arguments
all fields have an edge to their return type
all arguments have an edge to their input type
all unions have an edge to each type in the union
all enums have an edge to each of their values

Let r be the vertex in G representing the QueryRoot (or MutationRoot or SubscriptionRoot or...).

Let T \subset V be the set of all "terminal" vertices in the graph: the set of all scalar types and enum values.

Let f(x) be the filter function which accepts a vertex and returns true or false depending on the context of the request.

Let G' be the subgraph of G constructed by removing all vertices that are arguments (yes this is weird, its use will become apparent).

Let F(G) be the subgraph of G constructed by removing all invisible vertices, accounting for all visibility constraints (filter function, reachability, and validity).

Formalization

Given the above definitions, we can formalize our problem as: given a vertex x \in G, determine if x \in F(G).

In practice this means we need to determine that all three of the following hold:

f(x) must be true (the element passes the filter function)
there is a path from r to x in F(G) (the element is reachable from the root, considering only the visible subschema)
there must exist some terminal element t \in T such that there is a path from x to t in F(G') (this non-obviously performs all other validations e.g. that objects have at least one field)

Slightly Plainer English

For a given schema element we have to check the filter function (which is trivial and I will ignore from here on) as well as solve two symmetric reachability problems: one "backwards" (to the root) and one "forwards" (to a terminal scalar or enum value).

The backwards one is hopefully obvious; an element not reachable from the root should not show up in the schema.

The forwards one requires a bit more explanation. We've constructed the graph such that non-terminal nodes must have an outgoing edge in order to be valid:

objects, input objects, and interfaces must have at least one field
fields and arguments must have a type to return/accept
unions must have at least one concrete type
enums must have at least one value

The exception to this is fields and arguments: a field may have no arguments, and an outgoing edge to an argument is not sufficient for a field to be valid. This is why we define and operate on G' for the third requirement of our formal definition.

Appendix 2: Explored Solutions

Unfortunately for us, graphs derived from GraphQL schemas in this way are not guaranteed to have any of the following properties: planar, acyclic, biconnected, simple, symmetric. This prevents us from using a couple of existing published algorithms:

Thorup's Algorithm (requires the graph to be planar)
Halftermeyer's Algorithm (requires the graph to be 3-connected)

I also considered some ad-hoc schemes based on the principle that while F(G) cannot be precomputed for a given request, some knowledge can be precomputed based on G or G':

Precomputing possible paths

If we precomputed and stored the set of paths from r to x in G we could do various nice things with that data to determine if any paths were still present in F(G). Unfortunately that set of paths can take O(2^n) space for some graphs, e.g. ones looking like:

  A   D   G   J  
 / \ / \ / \ / \ 
R   C   F   I   L
 \ / \ / \ / \ /
  B   E   H   K

Precomputing distance to root

If we performed a breadth-first traversal of G, then stored the minimum distance from r for each vertex, we could efficiently find the shortest path back from x to r by using a min-queue. However, this approach would be slow for the common schema structure where a large chunk of interconnected schema is flagged as invisible by a single "bottleneck" field or type, as it would try all possible paths through the interconnected portion even though they all lead to the bottleneck.

That said, with appropriate memoization I do believe this approach is the one most likely to be practical, and it has the nice property of inverting very easily to solve the symmetric "forwards" reachability problem.

For posterity, a slightly more formal explanation.

Initialization: Perform breadth-first traversal of G starting at r. For each newly visited vertex, store the depth of the traversal at that point (the minimum distance from r to the vertex).
Calculation: For a vertex x, put all the predecessors of x into a min-priority-queue by their precomputed distance to r. Pop, check, and recurse, stopping when you pop r or run out of valid vertices. Memoize the result for intermediate vertices.

Appendix 3: References

https://en.wikipedia.org/wiki/Reachability https://en.wikipedia.org/wiki/St-connectivity https://tel.archives-ouvertes.fr/tel-01110316/document

xuorig commented 6 years ago

Amazing analyzis @eapache 👌👌👌

Given some recent experiences with the warden, I agree with the solution you propose here.

Less magic in warden, and nore explicit checks in schema masks sounds like the way forward to me too.

eapache commented 6 years ago

TLDR: I completely missed the fact that except for introspection queries, we don't have to operate vertex-at-a-time, we can operate on the entire query at once treating it as a subgraph of G. This opens up a much more efficient (though still complex) option for the common case.

I've been unfortunately unable to stop myself from thinking about this, and I've come to the following (additional) corollary. Given a well-formed graphql query Q = (V', E') \subgraph G that does not use any introspection vertices, then (f(q) \forall q \in V') \implies (Q \subgraph F(G)). That is, given a well-formed query (with no introspection) whose nodes all individually pass the filter function, then we don't have to worry about either reachability constraint because they are implied by the query being well-formed. (This also means Q-without-arguments is a subgraph of F(G') but I'm running out of notations to formalize with.)

Since fully-visible well-formed queries with no introspection (excluding __typename which doesn't matter for this corollary) form the vast majority of common cases, and since we have to validate the query anyway, that means we can blindly check the filter function for all queried vertices and if they all pass then we're done. This actually ends up being more efficient than my proposal for the common case since fields and arguments no longer have to double-check the type they expose, it's implicit in the graph. However, it does exclude two cases:

queries with introspection would have to fall back to the slower precomputed-distance algorithm
queries where one queried vertex fails its filter check would have to fall back as well in order to determine the correct error message (note that all such queries would be invalid in some way though)

The above isn't really a formal proof, but I believe the result is an algorithm that is sufficiently efficient under real production workloads, falling back to the slower (but still not terrible) approach only for introspection queries and queries which are already invalid.

The only question is if it's sufficiently simple to bother implementing. My original recommendation of a simplified warden may still be the better choice.

jturkel commented 4 years ago

@rmosolgo / @eapache - I took a crack at removing types from introspection results that become unreachable due to visibility checks in #2596 which seems to solve the core of this issue. Any feedback would be appreciated!

rmosolgo commented 4 years ago

Fixed by #2596

eapache commented 4 years ago

Joel's PR addressed my original example case, which turned out to be half of the total problem (see https://github.com/rmosolgo/graphql-ruby/issues/1333#issuecomment-397740253 for the other half). I'm happy to open a new condensed ticket for the remaining issue, or we should just reopen this one.

rmosolgo commented 4 years ago

a new condensed ticket

Yes, could you please? 😅

rmosolgo / graphql-ruby