mercurius-cache

Adds an in-process caching layer to Mercurius. Federation is fully supported.

Based on preliminary testing, it is possible to achieve a significant throughput improvement at the expense of the freshness of the data. Setting the ttl appropriately and/or adopting a good invalidation strategy is therefore of critical importance.

Under the covers, it uses async-cache-dedupe, which also deduplicates the calls.

Install

npm i fastify mercurius mercurius-cache graphql

Quickstart

'use strict'

const fastify = require('fastify')
const mercurius = require('mercurius')
const cache = require('mercurius-cache')

const app = fastify({ logger: true })

const schema = `
  type Query {
    add(x: Int, y: Int): Int
    hello: String
  }
`

const resolvers = {
  Query: {
    async add (_, { x, y }, { reply }) {
      reply.log.info('add called')
      for (let i = 0; i < 10000000; i++) {} // something that takes time
      return x + y
    }
  }
}

app.register(mercurius, {
  schema,
  resolvers
})

// cache query "add" responses for 10 seconds
app.register(cache, {
  ttl: 10,
  policy: {
    Query: {
      add: true
      // note: this caches "add" but not "hello"
    }
  }
})

app.listen({ port: 3000 })

// Use the following to test
// curl -X POST -H 'content-type: application/json' -d '{ "query": "{ add(x: 2, y: 2) }" }' localhost:3000/graphql

Options

ttl

a number, or a function that returns a number, giving the maximum time in seconds that a cache entry can live; default is 0, which means that the cache is disabled. The ttl function receives the result of the original function as its first argument.

Example(s)

  ttl: 10
  ttl: (result) => !!result.importantProp ? 10 : 0

stale

the time in seconds, after the ttl has expired, during which stale data is served while the cached values are re-validated. Has no effect if ttl is not configured.

Example

  stale: 5
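
To make the interaction between ttl and stale concrete, the following sketch classifies a cache entry by its age. It only illustrates the timeline described above: entryState is a hypothetical helper, not part of mercurius-cache, and the library's own bookkeeping (including what happens exactly at the boundaries) may differ.

```javascript
// Hypothetical helper: classify an entry by its age (all values in seconds).
// Illustrates the ttl/stale timeline only; not the library's implementation.
function entryState (age, ttl, stale = 0) {
  if (ttl <= 0) return 'disabled' // ttl 0 means the cache is disabled
  if (age < ttl) return 'fresh' // served from the cache
  if (age < ttl + stale) return 'stale' // served from the cache while re-validated
  return 'expired' // the resolver is called again
}

console.log(entryState(3, 10, 5)) // fresh
console.log(entryState(12, 10, 5)) // stale
console.log(entryState(16, 10, 5)) // expired
```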

all

use the cache in all resolvers; default is false. Use either policy or all, but not both.

Example

  all: true

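storage

the default cache is in memory, but a redis storage can be used for a larger, shared cache.
Storage options, as used in the examples in this document, are type (memory or redis) and options, which depend on the storage type (for example client and invalidation for redis).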

See https://github.com/mercurius-js/mercurius-cache-example for a complete complex use case.

policy

specify the queries to cache; default is empty.
Set a field to true to cache it using the main ttl and stale, if configured.

Example

  policy: {
    Query: {
      add: true
    }
  }

policy.ttl

use a specific ttl for the policy, instead of the main one.
Example

  ttl: 10,
  policy: {
    Query: {
      welcome: {
        ttl: 5 // Query "welcome" will be cached for 5 seconds
      },
      bye: true, // Query "bye" will be cached for 10 seconds
      hello: (result) => result.shouldCache ? 15 : 0 // function that determines the ttl for how long the item should be cached
    }
  }

policy.stale

use a specific stale value for the policy, instead of the main one.
Example

  ttl: 10,
  stale: 10,
  policy: {
    Query: {
      welcome: {
        ttl: 5, // Query "welcome" will be cached for 5 seconds
        stale: 5 // Query "welcome" will be available for 5 seconds after the ttl has expired
      },
      bye: true // Query "bye" will be cached for 10 seconds and available for 10 seconds after the ttl is expired
    }
  }

policy.storage

use a specific storage for the policy, instead of the main one.
This can be useful, for example, to keep a small data set in in-memory storage alongside the redis storage.
See https://github.com/mercurius-js/mercurius-cache-example for a complete complex use case.
Example

  storage: {
    type: 'redis',
    options: { client: new Redis() }
  },
  policy: {
    Query: {
      countries: {
        ttl: 86400, // Query "countries" will be cached for 1 day
        storage: { type: 'memory' }
      }
    }
  }

policy.skip

skip the cache for a specific condition; onSkip will be triggered.
Example

  skip (self, arg, ctx, info) {
    if (ctx.reply.request.headers.authorization) {
      return true
    }
    return false
  }

policy.key

to improve performance, we can define a custom key serializer.

Example

  const schema = `
  type Query {
    getUser (id: ID!): User
  }`

  // ...

  policy: {
    Query: {
      getUser: { key ({ self, arg, info, ctx, fields }) { return `${arg.id}` } }
    }
  }

Please note that the key function must return a string, otherwise the result will be stringified, losing the performance advantage of custom serialization.
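
For a resolver with more than one argument, a custom key can join the arguments with a separator. A minimal sketch, assuming the add(x, y) query from the Quickstart; addKey is a hypothetical name, and the separator is there so that different argument pairs never produce the same string.

```javascript
// Hypothetical key function for Query.add(x, y); it returns a string, as required.
// The ':' separator keeps (1, 23) and (12, 3) from both serializing to "123".
const addKey = ({ arg }) => `${arg.x}:${arg.y}`

// It would be used as: policy: { Query: { add: { key: addKey } } }
console.log(addKey({ arg: { x: 1, y: 23 } })) // 1:23
console.log(addKey({ arg: { x: 12, y: 3 } })) // 12:3
```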

policy.extendKey

extend the key to cache responses separately for different requests, for example to enable a per-user cache.
See examples/cache-per-user.js.

Example

  policy: {
    Query: {
      welcome: {
        extendKey: function (source, args, context, info) {
          return context.userId ? `user:${context.userId}` : undefined
        }
      }
    }
  }
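
The extendKey function above can be exercised on its own with hand-made context objects. A small sketch: the context shapes are assumptions made for illustration, and the function returns undefined for requests without a userId, exactly as in the policy above.

```javascript
// Same function as in the policy above, extracted so it can be called directly.
function extendKey (source, args, context, info) {
  return context.userId ? `user:${context.userId}` : undefined
}

// Hand-made contexts (hypothetical shapes, for illustration only).
console.log(extendKey(null, {}, { userId: 42 })) // user:42
console.log(extendKey(null, {}, {})) // undefined
```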

policy.references

function to set the references for the query; see Invalidation for how to use references, and https://github.com/mercurius-js/mercurius-cache-example for a complete use case.
Example

  policy: {
    Query: {
      user: {
        references: ({source, args, context, info}, key, result) => {
          if(!result) { return }
          return [`user:${result.id}`]
        }
      },
      users: {
        references: ({source, args, context, info}, key, result) => {
          if(!result) { return }
          const references = result.map(user => (`user:${user.id}`))
          references.push('users')
          return references
        }
      }
    }
  }
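
Calling the two references functions above with sample results shows which references each cache entry would carry. A sketch with made-up data; only the result argument is used here, so the other arguments are passed as placeholders.

```javascript
// The two functions from the policy above, extracted so they can be called directly.
const userReferences = (_, key, result) => {
  if (!result) { return }
  return [`user:${result.id}`]
}
const usersReferences = (_, key, result) => {
  if (!result) { return }
  const references = result.map(user => `user:${user.id}`)
  references.push('users')
  return references
}

console.log(userReferences({}, 'k1', { id: 1, name: 'Alice' })) // [ 'user:1' ]
console.log(usersReferences({}, 'k2', [{ id: 1 }, { id: 2 }])) // [ 'user:1', 'user:2', 'users' ]
console.log(userReferences({}, 'k3', null)) // undefined
```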

policy.invalidate

function that returns the references to invalidate for the query; see Invalidation for how to use references, and https://github.com/mercurius-js/mercurius-cache-example for a complete use case.
The invalidate function can be sync or async.

Example

  policy: {
    Mutation: {
      addUser: {
        invalidate: (self, arg, ctx, info, result) => ['users']
      }
    }
  }

policy.__options

should be used in case of conflicts between nested fields and policy options that share the same name (ttl, skip, storage, ...).
Example

  policy: {
    Query: {
      welcome: {
        // no __options key is present, so the policy options are used as they are
        ttl: 6
      },
      hello: {
        // the "hello" query has a nested field named ttl, so its policy options go under __options
        __options: {
          ttl: 6
        },
        ttl: {
          // here we can use either __options or the policy options directly
          skip: () => { /* .. */ }
        }
      }
    }
  }

skip

skip the cache for a specific condition; onSkip will be triggered.
Example

  skip (self, arg, ctx, info) {
    if (ctx.reply.request.headers.authorization) {
      return true
    }
    return false
  }

onDedupe

called when a request is deduped. When multiple requests arrive at the same time, the dedupe system calls the resolver only once and serves all the requests with the result of the first one, after which the result is cached.
Example

  onDedupe (type, fieldName) {
    console.log(`dedupe ${type} ${fieldName}`) 
  }
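
The idea behind deduplication can be sketched in a few lines: while a call for a given key is still in flight, later callers receive the same promise. This is an illustration of the concept only, with hypothetical names (dedupe, slowAdd); it is not how async-cache-dedupe is implemented.

```javascript
// Wrap an async function so that concurrent calls with the same key share one promise.
function dedupe (fn, keyOf) {
  const inFlight = new Map() // key -> pending promise
  return (...args) => {
    const key = keyOf(...args)
    if (inFlight.has(key)) return inFlight.get(key) // deduped call
    const promise = Promise.resolve(fn(...args))
    const cleanup = () => inFlight.delete(key)
    promise.then(cleanup, cleanup) // forget the key once the call has settled
    inFlight.set(key, promise)
    return promise
  }
}

let calls = 0
const slowAdd = async (x, y) => { calls++; return x + y }
const add = dedupe(slowAdd, (x, y) => `${x}:${y}`)

const first = add(2, 2)
const second = add(2, 2) // same key while the first call is still in flight
console.log(first === second) // true
console.log(calls) // 1
```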

onHit

called when a cached value is returned.
Example

  onHit (type, fieldName) {
    console.log(`hit ${type} ${fieldName}`) 
  }

onMiss

called when there is no value in the cache; it is not called if a resolver is skipped.
Example

  onMiss (type, fieldName) {
    console.log(`miss ${type} ${fieldName}`)
  }

onSkip

called when the resolver is skipped, either by skip or by policy.skip.

Example

  onSkip (type, fieldName) {
    console.log(`skip ${type} ${fieldName}`)
  }

onError

called when an error occurs during a caching operation.

Example

  onError (type, fieldName, error) {
    console.error(`error on ${type} ${fieldName}`, error)
  }

logInterval

enables a periodic cache report with hit/miss/dedupe/skip counts for all queries specified in the policy; default is disabled. The value of the interval is in seconds.

Example

  logInterval: 3

logReport

custom function for logging cache hits/misses. It is called every logInterval seconds, when the cache report is logged.

Example

  logReport (report) {
    console.log('Periodic cache report')
    console.table(report)
  }

// console table output

┌───────────────┬─────────┬──────┬────────┬───────┐
│     (index)   │ dedupes │ hits │ misses │ skips │
├───────────────┼─────────┼──────┼────────┼───────┤
│   Query.add   │    0    │  8   │   1    │   0   │
│   Query.sub   │    0    │  2   │   6    │   0   │
└───────────────┴─────────┴──────┴────────┴───────┘

// report format
{
  "Query.add": {
    "dedupes": 0,
    "hits": 8,
    "misses": 1,
    "skips": 0
  },
  "Query.sub": {
    "dedupes": 0,
    "hits": 2,
    "misses": 6,
    "skips": 0
  }
}
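
Because the report is a plain object, a custom logReport can derive further figures from it. The sketch below computes a hit ratio per field, using the report shown above as input; hitRatios is a hypothetical helper, and counting only hits and misses in the denominator is a choice made here, not something the library prescribes.

```javascript
// Hypothetical helper: hit ratio per field, computed as hits / (hits + misses).
function hitRatios (report) {
  const ratios = {}
  for (const [field, { hits, misses }] of Object.entries(report)) {
    const total = hits + misses
    ratios[field] = total === 0 ? 0 : hits / total
  }
  return ratios
}

const report = {
  'Query.add': { dedupes: 0, hits: 8, misses: 1, skips: 0 },
  'Query.sub': { dedupes: 0, hits: 2, misses: 6, skips: 0 }
}
const ratios = hitRatios(report)
console.log(ratios['Query.sub']) // 0.25
```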

Methods

cache.invalidate(references, [storage])

cache.invalidate performs invalidation over the whole storage.
To restrict invalidation to a specific storage, pass the name of a policy as storage, for example Query.getUser.
Note that invalidation must be enabled on the storage.

references can be a single reference, an array of references, or a reference with a wildcard, as the example below shows.

Example

const app = fastify()

await app.register(cache, {
  ttl: 60,
  storage: {
    type: 'redis',
    options: { client: redisClient, invalidation: true }
  },
  policy: { 
    Query: {
      getUser: {
        references: (args, key, result) => result ? [`user:${result.id}`] : null
      }
    }
  }
})

// ...

// invalidate all users
await app.graphql.cache.invalidate('user:*')

// invalidate user 1
await app.graphql.cache.invalidate('user:1')

// invalidate user 1 and user 2
await app.graphql.cache.invalidate(['user:1', 'user:2'])

See example for a complete example.

The clear method allows you to programmatically clear the cache entries, for example:

const app = fastify()

await app.register(cache, {
  ttl: 60,
  policy: { 
    // ...
  }
})

// ...

await app.graphql.cache.clear()

Invalidation

Along with time to live invalidation of the cache entries, we can use invalidation by keys.
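The concept behind invalidation by keys is that entries have an auxiliary key set that explicitly links requests to their results. These auxiliary keys are called references here.
The use case is common. Let's say we have an entry user {id: 1, name: "Alice"}: it may change often or rarely, so the ttl system alone is not accurate. If the entry changes before the ttl expires, the old value is served until expiration; if it has not changed when the ttl expires, it is reloaded even though nothing changed.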

To solve this common problem, we can use references.
We can say that the result of query getUser(id: 1) has reference user~1, and the result of query findUsers, containing {id: 1, name: "Alice"},{id: 2, name: "Bob"} has references [user~1,user~2]. So we can find the results in the cache by their references, independently of the request that generated them, and we can invalidate by references.

When the mutation updateUser involves user {id: 1} we can remove all the entries in the cache that have references to user~1, so the result of getUser(id: 1) and findUsers, and they will be reloaded at the next request with the new data - but not the result of getUser(id: 2).
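
The mechanism described above can be sketched with two maps: one from cache keys to values, and one from references to the keys that carry them. This is a conceptual, in-memory illustration with hypothetical names (createRefCache, set, has, invalidate); the actual storages are provided by async-cache-dedupe and work differently in detail.

```javascript
function createRefCache () {
  const entries = new Map() // cache key -> cached result
  const byReference = new Map() // reference -> Set of cache keys

  return {
    set (key, value, references = []) {
      entries.set(key, value)
      for (const ref of references) {
        if (!byReference.has(ref)) byReference.set(ref, new Set())
        byReference.get(ref).add(key)
      }
    },
    has: (key) => entries.has(key),
    // Remove every entry that carries the given reference.
    invalidate (ref) {
      for (const key of byReference.get(ref) || []) entries.delete(key)
      byReference.delete(ref)
    }
  }
}

const cache = createRefCache()
cache.set('getUser(id: 1)', { id: 1, name: 'Alice' }, ['user~1'])
cache.set('getUser(id: 2)', { id: 2, name: 'Bob' }, ['user~2'])
cache.set('findUsers', [{ id: 1 }, { id: 2 }], ['user~1', 'user~2'])

cache.invalidate('user~1') // for example, after updateUser on user 1
console.log(cache.has('getUser(id: 1)')) // false
console.log(cache.has('findUsers')) // false
console.log(cache.has('getUser(id: 2)')) // true
```

Note that in this sketch user~2 still points at the removed findUsers key; leftover references of this kind are what the garbage collection described in the Redis section cleans up.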

However, the operations required for invalidation by references can be expensive and not worth it; for example, it is not advisable to cache frequently updated data behind find queries that use pagination, filtering or sorting.

Explicit invalidation is disabled by default; you have to enable it in the storage settings.

See mercurius-cache-example for a complete example.

Redis

Using a redis storage is the best choice for a cache shared by a cluster of service instances.
However, the invalidation system needs to keep references updated and to remove the expired ones: while expired references do not compromise the integrity of the cache, they slow down I/O operations.

So, redis storage has the gc function, to perform garbage collection.

See this example in mercurius-cache-example/plugins/cache.js about how to run gc on a single instance service.

Another example:

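const Redis = require('ioredis') // assuming the ioredis client
const { createStorage } = require('async-cache-dedupe')

// connection (redis connection settings) and log (a logger) are provided by the application
const client = new Redis(connection)

const storage = createStorage('redis', { log, client, invalidation: true })

// run in lazy mode: iterates over the whole db, but does not guarantee a full clean up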
let cursor = 0
do {
  const report = await storage.gc('lazy', { lazy: { chunk: 200, cursor } })
  cursor = report.cursor
} while (cursor !== 0)

// run in strict mode
const report = await storage.gc('strict', { chunk: 250 })

In lazy mode, only options.max references are scanned each time, picking the keys to check at random; this operation is lighter, but it does not ensure a full clean up of references.

In strict mode, all references and keys are checked and cleaned; this operation scans the whole db and is slow, but it ensures a full clean up of references.

gc options, as used in the example above, are chunk and, for lazy mode, lazy.chunk and lazy.cursor.

The storage.gc function returns a report of the job, such as:

"report":{
  "references":{
      "scanned":["r:user:8", "r:group:11", "r:group:16"],
      "removed":["r:user:8", "r:group:16"]
  },
  "keys":{
      "scanned":["users~1"],
      "removed":["users~1"]
  },
  "loops":4,
  "cursor":0,
  "error":null
}

An effective strategy is to run lazy cleans often and a strict clean occasionally.
The report contains useful information about the gc cycle; use it to adjust the parameters of the gc utility, since the right settings depend on the size and the mutability of the cached data.
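
One way to express "lazy often, strict occasionally" is a small scheduling rule that picks the mode for each run. A sketch only: gcModeForRun and its strictEvery parameter are hypothetical, and the right ratio depends on the size and mutability of the cached data.

```javascript
// Pick the gc mode for the n-th run (1-based): every strictEvery-th run is strict.
function gcModeForRun (run, strictEvery = 10) {
  return run % strictEvery === 0 ? 'strict' : 'lazy'
}

const modes = []
for (let run = 1; run <= 10; run++) modes.push(gcModeForRun(run, 5))
console.log(modes.join(' '))
// lazy lazy lazy lazy strict lazy lazy lazy lazy strict

// The chosen mode would then be passed to storage.gc(mode, options).
```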

One way to run gc is programmatically, as in https://github.com/mercurius-js/mercurius-cache-example; another is to set up cron jobs as described in examples/redis-gc, which is useful when there are many instances of the mercurius server.
See async-cache-dedupe#redis-garbage-collector for details.

Breaking Changes

Benchmarks

We have experienced up to 10x performance improvements in real-world scenarios. This repository also includes a benchmark of a gateway and two federated services, which shows that adding a cache improves throughput by roughly 4x, from an average of about 2,900 requests per second without a cache to between about 12,200 and 12,800 with one:

$ sh bench.sh
===============================
= Gateway Mode (not cache)    =
===============================
Running 10s test @ http://localhost:3000/graphql
100 connections

┌─────────┬───────┬───────┬───────┬───────┬──────────┬─────────┬────────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max    │
├─────────┼───────┼───────┼───────┼───────┼──────────┼─────────┼────────┤
│ Latency │ 28 ms │ 31 ms │ 57 ms │ 86 ms │ 33.47 ms │ 12.2 ms │ 238 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴─────────┴────────┘
┌───────────┬────────┬────────┬─────────┬─────────┬─────────┬────────┬────────┐
│ Stat      │ 1%     │ 2.5%   │ 50%     │ 97.5%   │ Avg     │ Stdev  │ Min    │
├───────────┼────────┼────────┼─────────┼─────────┼─────────┼────────┼────────┤
│ Req/Sec   │ 1291   │ 1291   │ 3201    │ 3347    │ 2942.1  │ 559.51 │ 1291   │
├───────────┼────────┼────────┼─────────┼─────────┼─────────┼────────┼────────┤
│ Bytes/Sec │ 452 kB │ 452 kB │ 1.12 MB │ 1.17 MB │ 1.03 MB │ 196 kB │ 452 kB │
└───────────┴────────┴────────┴─────────┴─────────┴─────────┴────────┴────────┘

Req/Bytes counts sampled once per second.

32k requests in 11.03s, 11.3 MB read

===============================
= Gateway Mode (0s TTL)       =
===============================
Running 10s test @ http://localhost:3000/graphql
100 connections

┌─────────┬──────┬──────┬───────┬───────┬─────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%   │ Avg     │ Stdev   │ Max    │
├─────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼────────┤
│ Latency │ 6 ms │ 7 ms │ 12 ms │ 17 ms │ 7.29 ms │ 3.32 ms │ 125 ms │
└─────────┴──────┴──────┴───────┴───────┴─────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 7403    │ 7403    │ 13359   │ 13751   │ 12759   │ 1831.94 │ 7400    │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 2.59 MB │ 2.59 MB │ 4.68 MB │ 4.81 MB │ 4.47 MB │ 642 kB  │ 2.59 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.

128k requests in 10.03s, 44.7 MB read

===============================
= Gateway Mode (1s TTL)       =
===============================
Running 10s test @ http://localhost:3000/graphql
100 connections

┌─────────┬──────┬──────┬───────┬───────┬─────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%   │ Avg     │ Stdev   │ Max    │
├─────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼────────┤
│ Latency │ 7 ms │ 7 ms │ 13 ms │ 19 ms │ 7.68 ms │ 4.01 ms │ 149 ms │
└─────────┴──────┴──────┴───────┴───────┴─────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 6735    │ 6735    │ 12879   │ 12951   │ 12173   │ 1828.86 │ 6735    │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 2.36 MB │ 2.36 MB │ 4.51 MB │ 4.53 MB │ 4.26 MB │ 640 kB  │ 2.36 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.

122k requests in 10.03s, 42.6 MB read

===============================
= Gateway Mode (10s TTL)      =
===============================
Running 10s test @ http://localhost:3000/graphql
100 connections

┌─────────┬──────┬──────┬───────┬───────┬─────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%   │ Avg     │ Stdev   │ Max    │
├─────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼────────┤
│ Latency │ 7 ms │ 7 ms │ 13 ms │ 18 ms │ 7.51 ms │ 3.22 ms │ 121 ms │
└─────────┴──────┴──────┴───────┴───────┴─────────┴─────────┴────────┘
┌───────────┬────────┬────────┬─────────┬─────────┬─────────┬─────────┬────────┐
│ Stat      │ 1%     │ 2.5%   │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min    │
├───────────┼────────┼────────┼─────────┼─────────┼─────────┼─────────┼────────┤
│ Req/Sec   │ 7147   │ 7147   │ 13231   │ 13303   │ 12498.2 │ 1807.01 │ 7144   │
├───────────┼────────┼────────┼─────────┼─────────┼─────────┼─────────┼────────┤
│ Bytes/Sec │ 2.5 MB │ 2.5 MB │ 4.63 MB │ 4.66 MB │ 4.37 MB │ 633 kB  │ 2.5 MB │
└───────────┴────────┴────────┴─────────┴─────────┴─────────┴─────────┴────────┘

Req/Bytes counts sampled once per second.

125k requests in 10.03s, 43.7 MB read

More info about how this plugin works

This plugin caches the result of the resolver, but if the resolver returns a type incompatible with the schema return type, the plugin will cache the invalid return value. When you call the resolver again, the plugin will return the cached value, thereby caching the validation error.

This issue may be exacerbated in a federation setup when you don't have full control over the implementation of federated schema and resolvers.

Here you can find an example of the problem.

'use strict'

const fastify = require('fastify')
const mercurius = require('mercurius')
const cache = require('mercurius-cache')

const app = fastify({ logger: true })

const schema = `
  type Query {
    getNumber: Int
  }
`

const resolvers = {
  Query: {
    async getNumber (_, __, { reply }) {
      return 'hello' // a String, but the schema declares Int
    }
  }
}

app.register(mercurius, {
  schema,
  resolvers
})

app.register(cache, {
  ttl: 10,
  policy: {
    Query: {
      getNumber: true
    }
  }
})

If you come across this problem, you will first need to fix your code. Then you have two options:

  1. If you are using an in-memory cache, it will be cleared at the next start of the application, so the impact of this issue will be limited
  2. If you are using the Redis cache, you will need to manually invalidate the cache in Redis or wait for the TTL to expire
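
One way to limit the impact of this problem, building on the ttl function described in Options, is to return a ttl of 0 for results that do not match the expected type. A sketch, assuming that a ttl of 0 keeps the value out of the cache, as the ttl examples in the Options section suggest; intTtl is a hypothetical name.

```javascript
// Hypothetical ttl function: 10 seconds for integer results, 0 for anything else.
const intTtl = (result) => Number.isInteger(result) ? 10 : 0

// It would be used as: policy: { Query: { getNumber: { ttl: intTtl } } }
console.log(intTtl(42)) // 10
console.log(intTtl('hello')) // 0
```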

License

MIT