mercurius-js / mercurius

Implement GraphQL servers and gateways with Fastify
https://mercurius.dev/
MIT License

Proposal: Implement cache-control & last-modified headers/directives #225

Open asciidisco opened 3 years ago

asciidisco commented 3 years ago

This issue is more about picking your brains than "just" requesting a feature. I have a prototype of this running, but would like to get your input before I open a PR. Generally speaking, it adds features to fastify-gql that enable plain old HTTP caching mechanisms (namely Cache-Control and Last-Modified headers) to be generated, and that adhere to the extensions and directive format created and implemented by Apollo.

Cache-Control

Apollo implements an interface (driven by directives or a programmatic API) to describe cache expiration (max-age) and scope. From these hints it exposes fine-grained cache data via the extensions response property and generates Cache-Control headers. The GraphQL specification describes the extensions property as follows:

The response map may also contain an entry with key extensions. This entry, if set, must have a map as its value. This entry is reserved for implementors to extend the protocol however they see fit, and hence there are no additional restrictions on its contents.

Response Format

Slightly modified, originally taken from the Apollo docs:

Apollo Cache Control exposes cache control hints for an individual request under a cacheControl key in extensions:

Note: Compatibility of graphql-jit with the extensions property is yet to be verified.

{
  "data": {},
  "errors": null,
  "extensions": {
    "cacheControl": {
      "version": 1,
      "hints": [
        {
          "path": ["a", "b"],
          "maxAge": "<seconds>",
          "scope": "<PUBLIC or PRIVATE>"
        }
      ]
    }
  }
}

The path of each hint uses the format that the GraphQL specification defines for the path of an error:

This field should be a list of path segments starting at the root of the response and ending with the field associated with the error. Path segments that represent fields should be strings, and path segments that represent list indices should be 0‐indexed integers. If the error happens in an aliased field, the path to the error should use the aliased name, since it represents a path in the response, not in the query.

type Post @cacheControl(maxAge: 30) {
  id: Int!
  votes: Int @cacheControl(maxAge: 240)
  readByCurrentUser: Boolean! @cacheControl(scope: PRIVATE)
}

"cacheControl": {
  "version": 1,
  "hints": [
    {
      "path": [
        "post"
      ],
      "maxAge": 30
    },
    {
      "path": [
        "post",
        "votes"
      ],
      "maxAge": 240
    },
    {
      "path": [
        "post",
        "readByCurrentUser"
      ],
      "scope": "PRIVATE"
    }
  ]
}

This would also attach a Cache-Control: max-age=30 header, indicating that the whole response can be cached for 30 seconds. A client could use the extension data as well: knowing that votes stays fresh for 240 seconds, it could leave the votes field out of a repeated request issued within that window.

The @cacheControl directive can be added to an individual field or to a type.

Hints on a field describe the cache policy for that field itself. Given the above example, Post.votes can be cached for 240 seconds.

Hints on a type apply to all fields that return objects of that type (possibly wrapped in lists and non-null specifiers). For example, the hint @cacheControl(maxAge: 30) on Post applies to the field Comment.post, and the hint @cacheControl(maxAge:1000) on Comment applies to the field Post.comments in the example below:

type Post @cacheControl(maxAge: 30) {
  id: Int!
  title: String
  author: Author
  votes: Int @cacheControl(maxAge: 240)
  comments: [Comment]
  readByCurrentUser: Boolean! @cacheControl(scope: PRIVATE)
}

type Comment @cacheControl(maxAge: 1000) {
  post: Post!
}

type Query {
  latestPost: Post @cacheControl(maxAge: 10)
}

Hints on fields override hints specified on the target type. For example, the hint @cacheControl(maxAge: 10) on Query.latestPost takes precedence over the hint @cacheControl(maxAge: 30) on Post.
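The precedence rule can be sketched as a small merge function. This is only an illustration: effectiveHint and the plain-object hint shape are assumptions made for this example, not an existing fastify-gql or Apollo API.

```javascript
// Hypothetical helper: merge a type-level hint with a field-level hint.
// Properties set on the field win; missing ones fall back to the type.
function effectiveHint(typeHint = {}, fieldHint = {}) {
  return {
    maxAge: fieldHint.maxAge !== undefined ? fieldHint.maxAge : typeHint.maxAge,
    scope: fieldHint.scope !== undefined ? fieldHint.scope : typeHint.scope
  }
}

// Query.latestPost (maxAge: 10) returning a Post (maxAge: 30)
const latestPostHint = effectiveHint({ maxAge: 30 }, { maxAge: 10 })
console.log(latestPostHint.maxAge) // 10
```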

Request Format

For Apollo compatibility reasons, I'd also implement the following strategy, for clients:

Clients can include cache control instructions in a request. The only specified field is noCache, which forces the proxy never to return a cached response, but always fetch the query from the origin.

"extensions": {
  "cacheControl": {
    "version": 1,
    "noCache": true
  }
}
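
On the server side, detecting this instruction could look roughly like the following. wantsNoCache is a hypothetical helper name, and the sketch assumes the extensions object arrives as part of the parsed request body.

```javascript
// Hypothetical helper: true when the client asked to skip cached responses.
function wantsNoCache(body) {
  const cacheControl = body && body.extensions && body.extensions.cacheControl
  // Only a strict boolean true counts as an opt-out.
  return Boolean(cacheControl) && cacheControl.noCache === true
}

console.log(wantsNoCache({ extensions: { cacheControl: { version: 1, noCache: true } } })) // true
console.log(wantsNoCache({ query: '{ post { id } }' })) // false
```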

Programmatic API

It can be used within resolvers using the info.cacheControl.setCacheHint API programmatically, either in addition to existing cacheControl directives, or standalone.

const resolvers = {
  Query: {
    // Arrow functions cannot repeat a parameter name, so each one is named.
    post: (root, { id }, context, info) => {
      info.cacheControl.setCacheHint({ maxAge: 30 })
      // info.cacheControl.setCacheHint({ maxAge: 30, scope: 'PRIVATE' })
      return post
    }
  }
}
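
How the hints set this way could be collected is sketched below. createCacheControl and the shared hints array are assumptions for illustration only; the prototype may store hints differently.

```javascript
// Hypothetical collector behind info.cacheControl for one resolved field.
function createCacheControl(path, hints) {
  return {
    setCacheHint(hint) {
      // Later calls for the same path refine the earlier hint.
      const key = path.join('.')
      const existing = hints.find(h => h.path.join('.') === key)
      if (existing) Object.assign(existing, hint)
      else hints.push({ path, ...hint })
    }
  }
}

const hints = []
const info = { cacheControl: createCacheControl(['post'], hints) }
info.cacheControl.setCacheHint({ maxAge: 30 })
info.cacheControl.setCacheHint({ scope: 'PRIVATE' })
console.log(JSON.stringify(hints))
// [{"path":["post"],"maxAge":30,"scope":"PRIVATE"}]
```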

Setting a default maxAge

By default, root fields (i.e., fields on Query and Mutation) and fields returning object and interface types are considered to have a maxAge of 0 (i.e., uncacheable) if they don't have a static or dynamic cache hint. (Non-root scalar fields inherit their cacheability from their parent, so that in the common case of an object type with a bunch of strings and numbers which all have the same cacheability, you just need to declare the hint on the object type.)

const config = {
  // ...
  cacheControl: {
    defaultMaxAge: 5,
  },
  // ...
}
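
The defaulting rule described above can be written down as a short function. resolveMaxAge and its parameter names are hypothetical; the sketch only restates the rule and is not taken from an existing implementation.

```javascript
// Hypothetical helper following the defaulting rule for maxAge.
function resolveMaxAge({ hintMaxAge, isRootField, returnsObjectOrInterface, parentMaxAge, defaultMaxAge = 0 }) {
  // A static or dynamic hint always wins.
  if (hintMaxAge !== undefined) return hintMaxAge
  // Root fields and fields returning object/interface types get the default.
  if (isRootField || returnsObjectOrInterface) return defaultMaxAge
  // Non-root scalar fields inherit from their parent.
  return parentMaxAge
}

console.log(resolveMaxAge({ isRootField: true, defaultMaxAge: 5 })) // 5
console.log(resolveMaxAge({ isRootField: false, returnsObjectOrInterface: false, parentMaxAge: 30 })) // 30
```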

The overall cache policy

If the overall cache policy has a non-zero maxAge, its scope is PRIVATE if any hints have scope PRIVATE, and PUBLIC otherwise.
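
A minimal sketch of this calculation, assuming the overall maxAge is the lowest maxAge among the hints (as in Apollo) and that hints without a maxAge do not lower it. overallCachePolicy and toCacheControlHeader are hypothetical names.

```javascript
// Hypothetical helper: reduce per-field hints to one overall policy.
function overallCachePolicy(hints) {
  let maxAge
  let scope = 'PUBLIC'
  for (const hint of hints) {
    if (typeof hint.maxAge === 'number') {
      maxAge = maxAge === undefined ? hint.maxAge : Math.min(maxAge, hint.maxAge)
    }
    if (hint.scope === 'PRIVATE') scope = 'PRIVATE'
  }
  // No hints, or a maxAge of 0, means the response is not cacheable.
  if (!maxAge) return null
  return { maxAge, scope }
}

// Hypothetical helper: render the policy as a Cache-Control header value.
function toCacheControlHeader(policy) {
  return policy ? `max-age=${policy.maxAge}, ${policy.scope.toLowerCase()}` : null
}

const policy = overallCachePolicy([
  { path: ['post'], maxAge: 30 },
  { path: ['post', 'votes'], maxAge: 240 },
  { path: ['post', 'readByCurrentUser'], scope: 'PRIVATE' }
])
console.log(toCacheControlHeader(policy)) // max-age=30, private
```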

Behaviour within a GraphQL Federation

The federation gateway checks the response from each downstream service for a Cache-Control header, takes the lowest max-age value given, and applies it to its own aggregated response. If one of the downstream responses doesn't contain a Cache-Control header, none will be sent by the federation gateway.
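
The gateway rule could be sketched like this. mergeDownstreamCacheControl is a hypothetical helper that receives the raw Cache-Control header value of each downstream response (undefined when absent); scope handling is left out to keep the example short.

```javascript
// Hypothetical helper: combine Cache-Control headers from downstream services.
function mergeDownstreamCacheControl(headers) {
  let lowest = Infinity
  for (const header of headers) {
    const match = typeof header === 'string' && /(?:^|[\s,])max-age=(\d+)/i.exec(header)
    // One service without a usable header disables the gateway header.
    if (!match) return null
    lowest = Math.min(lowest, Number(match[1]))
  }
  return Number.isFinite(lowest) ? `max-age=${lowest}` : null
}

console.log(mergeDownstreamCacheControl(['max-age=30', 'max-age=10, private'])) // max-age=10
console.log(mergeDownstreamCacheControl(['max-age=30', undefined])) // null
```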

The contents of the extensions property will be merged & included in the federated response as well.

Note: Not part of this proposal, but if a cache with a TTL were in place at the federation gateway, the gateway would not need to make the request to the downstream service at all, and could instead reliably serve the data from its own cache.

Last-Modified

Support for Last-Modified headers is not implemented in Apollo, but I believe it could be beneficial to implement it with a strategy similar to the one for Cache-Control.

Response Format

The response format would not use the extensions field. Its sole purpose would be to generate a header for the original response and to answer subsequent requests with an empty response body and status code 304 if the criteria of the request headers are met.

Note: this is only valid when issuing a GET request & can't be used with POST requests
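
The 304 decision could be sketched as follows, assuming the client sends an If-Modified-Since request header. isNotModified is a hypothetical helper, and Date.parse is more lenient than the strict HTTP date format, so a real implementation may want stricter validation.

```javascript
// Hypothetical helper: decide whether a request can be answered with 304.
function isNotModified(method, ifModifiedSince, lastModified) {
  // Only GET requests qualify, as noted above.
  if (method !== 'GET') return false
  const since = Date.parse(ifModifiedSince)
  const modified = Date.parse(lastModified)
  if (Number.isNaN(since) || Number.isNaN(modified)) return false
  // Unchanged if the resource was not modified after the client's copy.
  return modified <= since
}

const stamp = 'Wed, 21 Oct 2015 07:28:00 GMT'
console.log(isNotModified('GET', stamp, stamp)) // true
console.log(isNotModified('POST', stamp, stamp)) // false
```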

Request Format

No special request format in the form of extensions or the like is needed, as the client can control the behaviour through request headers.

The lastModified directive can only be applied to type declarations in the schema.

type Post @lastModified(field: "updatedAt") @requires(fields: "updatedAt") {
  id: Int!
  votes: Int
  updatedAt: String!
}

type Query {
  latestPost: Post @cacheControl(maxAge: 10)
}

query {
  latestPost {
    id
    votes
    updatedAt
  }
}

{
  "data": {
    "latestPost": {
      "id": 1,
      "votes": 217,
      "updatedAt": "Wed, 21 Oct 2015 07:28:00 GMT"
    }
  },
  "errors": null
}

In this example, the Last-Modified header would look like the following: Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT

If there are conflicting values, the most recent date will be chosen. If one of the given dates does not adhere to the header's date format, no header will be sent with the response.

type User {
  name: String!
}

type Comment @lastModified(field: "createdAt") @requires(fields: "createdAt") {
  text: String!
  createdAt: String!
}

type Post @lastModified(field: "updatedAt") @requires(fields: "updatedAt") {
  id: Int!
  votes: Int
  updatedAt: String!
  comments: [Comment]
  author: User
}

type Query {
  latestPost: Post
}

So, given the following query:

query {
  latestPost {
    id
    updatedAt
    comments {
      text
      createdAt
    }
  }
}

with the response

{
  "data": {
    "latestPost": {
      "id": 1,
      "updatedAt": "Wed, 21 Oct 2015 07:28:00 GMT",
      "comments": [
        { "text": "A comment", "createdAt": "Thu, 11 Oct 2018 08:58:00 GMT" },
        { "text": "Another comment", "createdAt": "Sun, 22 Sep 2019 09:33:00 GMT" }
      ]
    }
  },
  "errors": null
}

the Last-Modified header would look like the following: Last-Modified: Sun, 22 Sep 2019 09:33:00 GMT

If the author were requested in the same query

query {
  latestPost {
    id
    updatedAt
    author {
      name
    }
    comments {
      text
      createdAt
    }
  }
}

no header would be generated, as the User type hasn't been annotated with a lastModified directive (assuming no value has been set programmatically).
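
The selection rules from the examples above can be condensed into one function. computeLastModified is a hypothetical helper; it assumes one collected value per object in the response, with undefined standing in for a type that has no lastModified source. The pattern checks the shape of the date only, not whether the weekday matches the date.

```javascript
// Shape of an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
const HTTP_DATE = /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$/

// Hypothetical helper: pick the most recent date, or null for "send no header".
function computeLastModified(values) {
  let latest = -Infinity
  for (const value of values) {
    // A missing or malformed value suppresses the header entirely.
    if (typeof value !== 'string' || !HTTP_DATE.test(value)) return null
    const time = Date.parse(value)
    if (Number.isNaN(time)) return null
    latest = Math.max(latest, time)
  }
  return Number.isFinite(latest) ? new Date(latest).toUTCString() : null
}

console.log(computeLastModified([
  'Wed, 21 Oct 2015 07:28:00 GMT',
  'Sun, 22 Sep 2019 09:33:00 GMT'
])) // Sun, 22 Sep 2019 09:33:00 GMT
console.log(computeLastModified(['Wed, 21 Oct 2015 07:28:00 GMT', undefined])) // null
```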

Programmatic API

It can be used within resolvers using the info.cacheControl.setLastModified API programmatically, either in addition to existing lastModified directives, or standalone.

const resolvers = {
  Query: {
    post: (root, { id }, context, info) => {
      info.cacheControl.setCacheHint({ lastModified: "Wed, 21 Oct 2015 07:28:00 GMT" })
      return post
    }
  }
}

Behaviour within a GraphQL Federation

The federation gateway checks the response from each downstream service for a Last-Modified header, takes the most recent date, and applies it to its own aggregated response. If one of the downstream responses doesn't contain a Last-Modified header, none will be sent by the federation gateway.

Note: Not part of this proposal, but if a cache were in place at the federation gateway, downstream servers could also send 304 responses with an empty body, so that the gateway takes the response from its own cache rather than having it transmitted over the wire.

Proposed configuration

I propose the following configuration:

// just an example, not the proposed default values
{
  cacheControl: {
    defaultMaxAge: 0,
    extensions: false,
    cacheControlHeader: true,
    lastModifiedHeader: true,
    cacheControlDirective: true,
    lastModifiedDirective: true,
  },
}

cacheControl Configuration root, set to false by default. Any value aside from an object with a fitting sub-configuration or true, will be treated as false. Caching is not enabled by default. If set to true, it will use the defaults of the sub-configuration listed below.

cacheControl.defaultMaxAge Defines an upper limit, in seconds, for the max-age value of the Cache-Control header and the extensions. For example, if your schema defines @cacheControl(maxAge: 1000) and the configuration option is set to 30, the response will only contain a header (and extensions property) with a max-age of 30.
The default value is 0, which is treated as "no limit". Any value other than a positive integer is treated as "no limit" as well.

cacheControl.extensions Takes a boolean. If set to true, the extensions property will be included in the response; if set to anything other than a strict boolean true, no extensions property will be included. Set to false by default.

cacheControl.cacheControlHeader Takes a boolean. If set to false, no Cache-Control header will be attached to the response. (The extensions property will still be sent, if configured.) true by default.

cacheControl.lastModifiedHeader Takes a boolean, if set to false, no Last-Modified header will be attached to the response. true by default.

cacheControl.cacheControlDirective Takes either a boolean value or a string; all other types default to false. In order to use the cacheControl directive, a definition of it (and the corresponding enum for the scope) must be inserted into the schema. If you only use the programmatic API, you can set this option to false, in which case the directive will NOT be inserted into the schema automatically. To avoid conflicts with existing directives or types, a string can be supplied (e.g. httpCacheControl); the directive and the enum will then be available under this name. The enum will always be postfixed with Scope. Defaults to true.

enum CacheControlScope { PUBLIC PRIVATE }
directive @cacheControl(maxAge: Int, scope: CacheControlScope) on OBJECT | FIELD_DEFINITION
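
How the option could translate into schema definitions is sketched below. cacheControlTypeDefs is a hypothetical helper, and capitalizing the first letter of a custom name for the enum is an assumption; the proposal only states that the enum is postfixed with Scope. The enum values are written in upper case to match the scope: PRIVATE usage in the schema examples.

```javascript
// Hypothetical helper: build the SDL for the cacheControl directive.
function cacheControlTypeDefs(option) {
  // Anything but true or a non-empty string means: insert nothing.
  if (option !== true && (typeof option !== 'string' || option === '')) return ''
  const name = option === true ? 'cacheControl' : option
  // Assumption: the enum name is the capitalized directive name plus "Scope".
  const enumName = name.charAt(0).toUpperCase() + name.slice(1) + 'Scope'
  return [
    `enum ${enumName} { PUBLIC PRIVATE }`,
    `directive @${name}(maxAge: Int, scope: ${enumName}) on OBJECT | FIELD_DEFINITION`
  ].join('\n')
}

console.log(cacheControlTypeDefs('httpCacheControl'))
// enum HttpCacheControlScope { PUBLIC PRIVATE }
// directive @httpCacheControl(maxAge: Int, scope: HttpCacheControlScope) on OBJECT | FIELD_DEFINITION
```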

cacheControl.lastModifiedDirective Takes either a boolean value or a string; all other types default to false. In order to use the lastModified directive, a definition of it must be inserted into the schema. If you only use the programmatic API, you can set this option to false, in which case the directive will NOT be inserted into the schema automatically. To avoid conflicts with existing directives or types, a string can be supplied (e.g. httpLastModified); the directive will then be available under this name. Defaults to true.

directive @lastModified(field: String) on OBJECT
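
Taken together, the option rules could be normalized as sketched here. normalizeCacheControl is a hypothetical helper; treating a non-boolean cacheControlHeader or lastModifiedHeader as true is an assumption, since the proposal only describes the false case.

```javascript
const DEFAULTS = {
  defaultMaxAge: 0,
  extensions: false,
  cacheControlHeader: true,
  lastModifiedHeader: true,
  cacheControlDirective: true,
  lastModifiedDirective: true
}

// Directive options accept a boolean or a string; other types become false.
const directiveOption = (value, fallback) => {
  if (value === undefined) return fallback
  return typeof value === 'boolean' || typeof value === 'string' ? value : false
}

// Hypothetical helper: returns null when caching is disabled.
function normalizeCacheControl(option) {
  if (option === true) return { ...DEFAULTS }
  if (option === null || typeof option !== 'object' || Array.isArray(option)) return null
  const { defaultMaxAge } = option
  return {
    // Only positive integers act as a limit; everything else means "no limit".
    defaultMaxAge: Number.isInteger(defaultMaxAge) && defaultMaxAge > 0 ? defaultMaxAge : 0,
    extensions: option.extensions === true,
    cacheControlHeader: option.cacheControlHeader !== false,
    lastModifiedHeader: option.lastModifiedHeader !== false,
    cacheControlDirective: directiveOption(option.cacheControlDirective, DEFAULTS.cacheControlDirective),
    lastModifiedDirective: directiveOption(option.lastModifiedDirective, DEFAULTS.lastModifiedDirective)
  }
}

console.log(normalizeCacheControl(false)) // null
console.log(normalizeCacheControl({ defaultMaxAge: 30 }).defaultMaxAge) // 30
```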

What about ETags

ETags are deliberately left out of this proposal, as you can already use them today and they take the whole response object into consideration. There is no point in adding special directives or similar to the system for them.

Feedback

Please let me know your thoughts, and also whether you have spent some time and effort implementing something in this direction as well. I'm happy about anyone who wants to join the effort or give constructive feedback.

mcollina commented 3 years ago

This would be amazing to add!

kuznetsov-online commented 3 years ago

@asciidisco any news, please? It's a very needed feature.

asciidisco commented 3 years ago

I've just worked on it yesterday (had a very busy quarter at work that left me no space for any side quests) as I'm on vacation right now. I wanted to finish the Cache Control feature before we celebrate New Year's. There are just a couple of unit tests left to be written.

kuznetsov-online commented 3 years ago

wow! good news!

kuznetsov-online commented 3 years ago

@asciidisco do you have progress with it?

kuznetsov-online commented 3 years ago

@asciidisco any news, please?