Add GraphQL versioning to `graph-node`

graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL

https://thegraph.com

Apache License 2.0

2.9k stars 969 forks source link

Add GraphQL versioning to `graph-node` #3024

Closed azf20 closed 2 years ago

azf20 commented 2 years ago

Currently there is no versioning of Graph Node's graphQL API for subgraphs - there is only a latest version. This introduces some challenges with query determinism, as the same query could result in different responses, depending on the Graph Node version an indexer is running. Some examples:

Schema introspection would be different if (for example) new filter options are added
Queries for new filters would result in errors for older versions, and successful queries for new versions
If functionality is deprecated, older versions would respond, while newer versions would result in errors

Potential implementation:

Introducing versioning on the graphQL endpoint (graphqlVersion) would allow clients to specify the version for their query, which could then make the above queries deterministic.
Any changes that impact the behaviour of graphQL queries (features, bugfixes) would incur a version bump
This should use semver
Graph Node will support a range of graphqlVersions. In some cases, though, extensive backwards compatibility will not be feasible. This will be more fully articulated in a versioning policy.
Graph Node would need to expose the list of available versions
In the absence of a specified version, the latest graphqlVersion will be used
In development mode, graph-node will support ranged versions (e.g. ^1.0.0)

Example URL:

https://gateway.thegraph.com/api/[api-key]/subgraphs/id/[identifier]?api-version=[semver]

Questions:

When do we want to introduce version 0.1.0, and this functionality?
How would we handle deprecating older graphqlVersions?

Forum thread

That3Percent commented 2 years ago

When do we want to introduce version 0.0.1, and this functionality?

We can start by adding the url parsing code in the first PR and passing version information through. For now, the only thing graph-node needs to do is to return an unattestable error if the version is unrecognized. (With the only recognized version being "0"). The URL parsing code should pass "0" down as the default. This is necessary to remain backward compatible with the contracts, which use "0" in the version for the EIP-712 domain struct.

Once the means are in place, we can go to "0.1.0" on the first additive change we want to merge. It's as-needed.

How would we handle deprecating older graphqlVersions?

We may not need to unless the complexity for handling versioning information is getting out of hand (this is unlikely in the short term, and even then there are mitigations like importing multiple versions of a library and cleaning up old code). If you ever do need to remove support for old versions, the correct behavior would be to return an unattestable error response (same as unsupported/unrecognized/invalid versions from the future, as noted above)

We will need a similar PRs in the Gateway and Fisherman @Theodus. We can talk about that separately. We also need to support in the contracts to pass version information to the contracts on a dispute @abarmat.

lutter commented 2 years ago

I like where this is going.

For introspection queries, I wonder if we should just declare any query that involves introspection as not attestable. I think the possible variations with even minor additions to the API will be unmanageable.

Just to make sure we are all in agreement, with a semver MAJOR.MINOR.PATCH we'd expect the following for 'normal' queries when querying two graph-node instances with different GraphQL versions:

if the versions only differ in PATCH, the responses will be exactly the same (which begs the question why we'd even bother with PATCH at that level)
if the versions have the same MAJOR, but different MINOR, responses will be exactly the same as long as a query only uses features that are available in both MINOR versions. If a query leads to a successful response, that query will result in exactly that response for all later MINOR versions
if the versions differ in MAJOR, all bets are off and the two graph-node's are incomparable in that regard

For the network (and to keep graph-node from having to carry lots of different GraphQL impls) we'd need an official deprecation policy where we could disallow an older MAJOR pretty quickly after the new one is available; though that might be hard since it will require that all clients/dApps update to the new MAJOR

I also think it might be better to make the version part of the URL path rather than passing it as a query variable; for the graph-node implementation it doesn't matter too much though.

That3Percent commented 2 years ago

For introspection queries, I wonder if we should just declare any query that involves introspection as not attestable. I think the possible variations with even minor additions to the API will be unmanageable.

As long as the response is correct for that API version (features are added and removed from the schema depending on whether they are supported by that version), which is desirable for obvious reasons, then there is no issue with attesting to queries with introspection. The semver is fed into the EIP-712 domain hash struct, removing any potential for conflicts. Schemas was a motivating use-case for this change. See the forum for more details.

if the versions only differ in PATCH, the responses will be exactly the same

Ideally yes, but if there is a bugfix then maybe not. I know that the arbitration charter gives some flexibility around bugfixes, but that's a hassle to avoid. We could probably have a long conversation here since there are merits to either. Having a spot at least would give us flexibility to use it if we need to, and we don't need to resolve this now without a motivating case. One example I could imagine using patch for that came up recently would be when we moved from ordering fields in the response alphabetically to the order they were specified in the query to become GraphQL spec compliant. That seems more like a patch than a new feature, yet breaks attestations.

If a query leads to a successful response, that query will result in exactly that response for all later MINOR versions

Yes, except with the caveat about bugfixes from above.

official deprecation policy

I don't think we need that necessarily. An Indexer can always refuse to serve a query for any reason, including that the version is not supported by their software. (Ideally the Indexer signals upfront that they would reject the query, like how we do with indexing status and cost models). There need not be a reason to prevent Indexers from running legacy versions, or from maintaining a branch with more complexity to support old versions, or to even prevent them from not supporting old features immediately before the "official" policy. Consumers (eg: an Edge & Node Gateway) can route queries to the appropriate Indexers for the version they need.

In any case, I think the market will demand supporting old versions for backward compatibility in clients even though that's a pain to maintain for us core devs. So, we should prepare for the worst and mitigate the complexity by (for example) importing old builds of crates related to query and routing the request to an appropriate library version in graph-node. In that way code can still be deleted/cleaned in the latest branch, while supporting previous versions.

azf20 commented 2 years ago

Some open questions:

Do we need to make minor and patch versions user-facing? It would be preferable to support minor and patch versions seamlessly for end-users (i.e. with the client only specifying the major version), but any API change breaks query determinism (for affected queries), which makes the job of the fisherman more complicated (for example). Perhaps the Arbitration Charter allows for documented minor and patch changes within a major version? This would be preferable from a user perspective.

How would this be implemented in Graph Node? How will the code be maintained within Graph Node? (assuming that we do intend to maintain backwards compatibility) How will the client specify the version? (see this post for some food for thought)

That3Percent commented 2 years ago

Do we need to make minor and patch versions user-facing?

Yes, in some sense. One important property is that you can never trust the Indexer to supply the version. But, there are options to lessen the burden on the user.

For example, if the user trusts a Gateway, the user could supply "1.1" and the Gateway could replace that with any version that is compatible using the latest version for the indexer. Eg: it could supply one Indexer with "1.2.1" or another with "1.1.0". The Gateway is already playing a similar role by finding trusted block hashes, it instead would find "trusted versions".

You can do similar with libraries if we maintain a query library by having different semver in the library know the major version it requires, and new versions of the library always silently bumps the query version for the user to the latest known good versions.

That3Percent commented 2 years ago

How would this be implemented in Graph Node?

I leave this to the graph-node team, but I do have a recommendation based on experience writing versioned code.

Split the responsibility for parsing the version and determining it's feature set from the responsibility of acting on selected features.

For example, don't do this: if semver >= 1.1 { supportX() }. Instead do this: if supportedFeatures.includesX() { supportX() }.

This reduces the likelihood of some bugs. For example, if featureX is deprecated in 2.0, you would need to find all the places for featureX and write if semver >= 1.1 & semver < 2. But, since the version checking code isn't searchable it's easy to miss a spot.

If feature detection is done when parsing the version you can have one easy spot to blacklist versions, describe ranges for feature inclusion, and so on.

azf20 commented 2 years ago

But, there are options to lessen the burden on the user.

Those sound like good optimisations 👌

Split the responsibility for parsing the version and determining its feature set from the responsibility of acting on selected features.

This also sounds like a good pattern, interested in @dotansimha's general view on approaches here, and @lutter's specific thoughts on how this would apply to Graph Node.

parsing the version and determining its feature set

The logic here feels like it could also be applied to the version selection problem (on the Gateway, or in Graph Node for local or hosted deployments)

kamilkisiela commented 2 years ago

About semver. I think we should follow MAJOR.MINOR.PATCH even if we won't use PATCH at all, it can be always 0. Maybe at some point PATCH will save us :)

I would split the problem into two pieces, from GraphQL point of view.

changes in GraphQL Schema
changes in execution or validation

Any change in the schema means the introspection query result will change as well. Basically every time we change something, we will have to do at least MINOR++. We could create tests to ensure no leaks in schema between versions.

Changes in GraphQL Schema are easy to detect (GraphQL Inspector). The problem is with the execution/validation part, this would require PR reviewers to detect if the version should be bumped.

I'm not sure if we should exclude the introspection query result from query determinism logic. A change in the introspection query result means the schema is different that it was before. Which basically means that the query could be executed on X without any issues but on Y it would fail. The problem remains.

About deprecation, we could use/implement @deprecated(reason: String) as something to put on fields and objects. GraphiQL picks it up and shows the deprecation node. This way it could be communicated to the end user.

About feature / version flags. In short term it should work but we will hit a wall at some point. They are always tricky and we would need a proper set of integration tests.

azf20 commented 2 years ago

Considerations:

Will the default (no apiVersion specified) be the "genesis" schema?
Do we want to use version numbering, or the Github-specified feature-flagged approach (link). Feature flags could then be rolled into the code, and there could be groupings of feature flags (for larger / bundled features). Capability-driven.
Handling leakage between versions
What testing will we need in place (E2E testing) - testing across versions or feature flags

Outside graph-node: different endpoints on the gateway, with different "generations" of features. Features in graph-client. Surfacing the features used on the indexer API.

Can we track graphQL usage, on graph-node? Previously been implemented.

That3Percent commented 2 years ago

Will the default (no apiVersion specified) be the "genesis" schema?

Yes, we have to do this to remain backward compatible with existing clients.

Do we want to use version numbering, or the Github-specified feature-flagged approach

Version numbering. Feature flags are tedious for bugfixes, we already effectively have feature flags in the query itself, version numbers are shorter & have less developer cognitive load, and feature flags create more unique protocol versions (combinatorics is bad for cross-checking and sanity).

Handling leakage between versions

What does this mean?

dotansimha commented 2 years ago

Version numbering. Feature flags are tedious for bugfixes, we already effectively have feature flags in the query itself, version numbers are shorter & have less developer cognitive load, and feature flags create more unique protocol versions (combinatorics is bad for cross-checking and sanity).

I think bug fixes (in the actual code) are communicated as graph-node version, no? Since they affect the implementation and not the GraphQL schema.

I think we are not going to introduce a lot of core GraphQL schema changes (metadata/global types)? this is the reason why we were thinking about feature flags as a way to "activate" specific GraphQL features (for example: specifying relay-pagination will add those specific types to the schema, and will generate the type Query in a different way).

Handling leakage between versions

graph-node will need to know how to serve different GraphQL schemas from a single running service. A user that will ask for v1 or v2 will get different GraphQL schema, while the code behind the scenes might be the same, since this kind of versioning only affect how we'll build the GraphQL Schema.

Given API versioning (or, feature flags, doesn't matter for this example), we need to make sure that v1.0.0 will give you genesis schema only, while v1.1.0 will return the genesis schema AND "new feature". This needs to be translated into some specific Rust function calls (to build the schema with different types/approaches), and the goal of preventing leakage is to make sure we are producing the correct schema per each version, without "leaking"/affecting features in older versions, while introducing changes/new features to new versions (Since v1 and v1.1.0 are going to be served from the same graph-node)

That3Percent commented 2 years ago

We had a call about this and came to consensus around versioning vs feature flags.

There are some advantages to using feature flags in the implementation of graph-node (insofar as it reduces complexity). But, we should not be exposing feature flags to the user via the query API or any protocol API. Instead, the user-facing APIs will all use version numbers.

That3Percent commented 2 years ago

GIP detailing graph-node requirements, and the global strategy for deprecating support for query versions without breaking clients: https://forum.thegraph.com/t/gip-0024-query-versioning/3145