mcollina / async-cache-dedupe

Async cache with dedupe support
MIT License

async-cache-dedupe

async-cache-dedupe is a cache for asynchronous fetching of resources with full deduplication, i.e. the same resource is only asked once at any given time.
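To make the idea of deduplication concrete, here is a minimal sketch in plain JavaScript. It is not the library's implementation, and the names `dedupe` and `inFlight` are hypothetical; it only illustrates how concurrent requests for the same key can share a single pending promise.

```javascript
// Minimal sketch of request deduplication (NOT the library's implementation).
// `dedupe` and `inFlight` are hypothetical names used only for this illustration.
function dedupe (fn) {
  const inFlight = new Map() // key -> pending promise

  return function (arg) {
    const key = JSON.stringify(arg)
    if (inFlight.has(key)) {
      // A call for the same key is already running: reuse its promise.
      return inFlight.get(key)
    }
    const promise = Promise.resolve()
      .then(() => fn(arg))
      .finally(() => inFlight.delete(key)) // allow a fresh call once settled
    inFlight.set(key, promise)
    return promise
  }
}

// Usage: three concurrent calls, two distinct keys, so the wrapped function runs twice.
let calls = 0
const fetchSomething = dedupe(async (k) => {
  calls++
  return { k }
})

const demo = Promise.all([
  fetchSomething(42),
  fetchSomething(24),
  fetchSomething(42)
]).then((res) => {
  console.log(res, 'calls:', calls)
  return { res, calls }
})
```

This sketch only covers in-flight deduplication; it has no ttl, stale handling, or storage, which the library adds on top.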

Install

npm i async-cache-dedupe

Example

import { createCache } from 'async-cache-dedupe'

const cache = createCache({
  ttl: 5, // seconds
  stale: 5, // number of seconds to return data after ttl has expired
  storage: { type: 'memory' },
})

cache.define('fetchSomething', async (k) => {
  console.log('query', k)
  // query 42
  // query 24

  return { k }
})

const p1 = cache.fetchSomething(42)
const p2 = cache.fetchSomething(24)
const p3 = cache.fetchSomething(42)

const res = await Promise.all([p1, p2, p3])

console.log(res)
// [
//   { k: 42 },
//   { k: 24 },
//   { k: 42 }
// ]

CommonJS (require) is also supported.

API

createCache(opts)

Creates a new cache.

Options:

cache.define(name[, opts], original(arg, cacheKey))

Defines a new cached function with the given name.

The define method adds a cache[name] function that will call the original function if the result is not present in the cache. The cache key for arg is computed using safe-stable-stringify and it is passed as the cacheKey argument to the original function.
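The reason a stable stringifier matters for cache keys can be shown with a small sketch. The `stableKey` helper below is a hypothetical stand-in written for this illustration, not the safe-stable-stringify API; it assumes plain JSON-like arguments and simply sorts object keys before serializing.

```javascript
// Stand-in for a stable stringifier: object keys are sorted before serializing.
// `stableKey` is a hypothetical helper, not the safe-stable-stringify API.
function stableKey (value) {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value)
  }
  if (Array.isArray(value)) {
    return '[' + value.map(stableKey).join(',') + ']'
  }
  const body = Object.keys(value)
    .sort()
    .map((k) => JSON.stringify(k) + ':' + stableKey(value[k]))
    .join(',')
  return '{' + body + '}'
}

const keyA = stableKey({ id: 1, role: 'admin' })
const keyB = stableKey({ role: 'admin', id: 1 })

// Same key regardless of property order, so both calls would hit the same entry.
console.log(keyA === keyB) // true

// Plain JSON.stringify depends on insertion order and would produce two keys.
const plainA = JSON.stringify({ id: 1, role: 'admin' })
const plainB = JSON.stringify({ role: 'admin', id: 1 })
console.log(plainA === plainB) // false
```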

Options:

cache.clear([name], [arg])

Clear the cache. If name is specified, all the cache entries from the function defined with that name are cleared. If arg is specified, only the elements cached with the given name and arg are cleared.

cache.invalidateAll(references, [storage])

cache.invalidateAll performs invalidation over the whole storage. The storage argument is the name of the defined function whose storage should be used; if it is not specified, invalidation is performed on the default storage.

references can be:

Example

const cache = createCache({ ttl: 60 })

cache.define('fetchUser', {
  references: (args, key, result) => result ? [`user:${result.id}`] : null
}, (id) => database.find({ table: 'users', where: { id }}))

cache.define('fetchCountries', {
  storage: { type: 'memory', size: 256 },
  references: (args, key, result) => [`countries`]
}, (id) => database.find({ table: 'countries' }))

// ...

// invalidate all users from default storage
cache.invalidateAll('user:*')

// invalidate user 1 from default storage
cache.invalidateAll('user:1')

// invalidate user 1 and user 2 from default storage
cache.invalidateAll(['user:1', 'user:2'])

// note "fetchCountries" uses a different storage
cache.invalidateAll('countries', 'fetchCountries')

See below how invalidation and references work.

Invalidation

Along with time-to-live (ttl) invalidation of cache entries, we can use invalidation by keys.
The concept behind invalidation by keys is that entries have an auxiliary key set that explicitly links requests to their own results. These auxiliary keys are called references here.
Consider a scenario: we have an entry user {id: 1, name: "Alice"}. It may change often or rarely, so the ttl system is not accurate:

To solve this common problem, we can use references.
We can say that the result of defined function getUser(id: 1) has reference user~1, and the result of defined function findUsers, containing {id: 1, name: "Alice"},{id: 2, name: "Bob"} has references [user~1,user~2]. So we can find the results in the cache by their references, independently of the request that generated them, and we can invalidate by references.

So, when a write event involving user {id: 1} happens (usually an update), we can remove all the entries in the cache that have references to user~1, that is, the results of getUser(id: 1) and findUsers. They will be reloaded with the new data at the next request. The result of getUser(id: 2) is not affected.
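The scenario above can be sketched with two plain maps. This is a conceptual illustration only, not the library's storage code; `entries`, `refs`, `set` and `invalidate` are hypothetical names.

```javascript
// Conceptual sketch of invalidation by references (NOT the library's storage code).
// `entries`, `refs`, `set` and `invalidate` are hypothetical names.
const entries = new Map() // cache key -> cached result
const refs = new Map() // reference -> Set of cache keys

function set (key, result, references) {
  entries.set(key, result)
  for (const ref of references) {
    if (!refs.has(ref)) refs.set(ref, new Set())
    refs.get(ref).add(key)
  }
}

function invalidate (ref) {
  // Remove every cached entry linked to this reference, then the reference itself.
  for (const key of refs.get(ref) ?? []) {
    entries.delete(key)
  }
  refs.delete(ref)
}

set('getUser~1', { id: 1, name: 'Alice' }, ['user~1'])
set('getUser~2', { id: 2, name: 'Bob' }, ['user~2'])
set('findUsers', [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }], ['user~1', 'user~2'])

// User 1 was updated: drop every entry that references user~1.
invalidate('user~1')

console.log([...entries.keys()]) // [ 'getUser~2' ]
```

Note that in this sketch the reference user~2 still points at the removed findUsers key. Leftover pointers of this kind are harmless for correctness but accumulate over time.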

Explicit invalidation is disabled by default; you have to enable it in the storage settings.

See mercurius-cache-example for a complete example.

Redis

Using a redis storage is the best choice for a shared and/or large cache.
All reference entries in redis have a referencesTTL, so they are all eventually cleaned up. The referencesTTL value should be set to the maximum of all the ttls, so that references stay available for every cache entry while still expiring at some point, which avoids leaking data.
Still, references should be kept up to date to make writes and invalidation more efficient. The garbage collector function does this by pruning expired references: while expired references do not compromise cache integrity, they slow down I/O operations.
The memory storage doesn't have a gc.
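As a rough sketch of the sizing rule above, referencesTTL can be derived from the ttls used across the application. The ttl values below are hypothetical, and the option shape `invalidation: { referencesTTL }` is an assumption about the redis storage settings; check it against the options list before relying on it.

```javascript
// Hypothetical per-function ttls (in seconds) used across the application.
const ttls = { fetchUser: 60, fetchCountries: 3600, fetchSomething: 5 }

// referencesTTL should cover the longest-lived cache entry.
const referencesTTL = Math.max(...Object.values(ttls))

// Assumed option shape: `invalidation: { referencesTTL }` on the redis storage.
// A real configuration would also pass `client`, an existing Redis client instance.
const storageSettings = {
  type: 'redis',
  options: { invalidation: { referencesTTL } }
}

console.log(storageSettings.options.invalidation) // { referencesTTL: 3600 }
```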

Redis garbage collector

As mentioned, the garbage collector is optional, but it is highly recommended in order to keep references up to date and to improve performance when setting and invalidating cache entries.

storage.gc([mode], [options])

Options:

Returns a report of the gc job, as follows:

"report":{
  "references":{
      "scanned":["r:user:8", "r:group:11", "r:group:16"],
      "removed":["r:user:8", "r:group:16"]
  },
  "keys":{
      "scanned":["users~1"],
      "removed":["users~1"]
  },
  "loops":4,
  "cursor":0,
  "error":null
}

Example

import { createCache, createStorage } from 'async-cache-dedupe'

const cache = createCache({
  ttl: 5,
  storage: { type: 'redis', options: { client: redisClient, invalidation: true } },
})
// ... cache.define('fetchSomething'

const storage = createStorage('redis', { client: redisClient, invalidation: true })

let cursor
setInterval(async () => {
  // lazy mode: pass the cursor returned by the previous run
  const report = await storage.gc('lazy', { lazy: { cursor } })
  if (report.error) {
    console.error('error on redis gc', report.error)
    return
  }
  console.log('gc report (lazy)', report)
  cursor = report.cursor
}, 60e3).unref()

setInterval(async () => {
  const report = await storage.gc('strict', { chunk: 128 })
  if (report.error) {
    console.error('error on redis gc', report.error)
    return
  }
  console.log('gc report (strict)', report)
}, 10 * 60e3).unref()

TypeScript

This module provides a basic type definition for TypeScript.
As the library does some meta-programming behind the scenes, your compiler may complain when you call functions that were added through the define method.
To avoid this, chain all defined functions in a single invocation:

import { createCache, Cache } from "async-cache-dedupe";

const fetchSomething = async (k: any) => {
  console.log("query", k);
  return { k };
};

const cache = createCache({
  ttl: 5, // seconds
  storage: { type: "memory" },
});

const cacheInstance = cache
  .define("fetchSomething", fetchSomething)
  .define("fetchSomethingElse", fetchSomething);

const p1 = cacheInstance.fetchSomething(42); // <--- TypeScript doesn't argue anymore here!
const p2 = cacheInstance.fetchSomethingElse(42); // <--- TypeScript doesn't argue anymore here!

Browser

All the major browsers are supported. Only the memory storage type is available; redis storage can't be used in a browser environment.

This is a very simple example of how to use this module in a browser environment:

<script src="https://unpkg.com/async-cache-dedupe"></script>

<script>
  const cache = asyncCacheDedupe.createCache({
    ttl: 5, // seconds
    storage: { type: 'memory' },
  })

  cache.define('fetchSomething', async (k) => {
    console.log('query', k)
    return { k }
  })

  const p1 = cache.fetchSomething(42)
  const p2 = cache.fetchSomething(42)
  const p3 = cache.fetchSomething(42)

  Promise.all([p1, p2, p3]).then((values) => {
    console.log(values)
  })
</script>

You can also use the module with a bundler. The supported bundlers are webpack, rollup, esbuild and browserify.


Maintainers


Breaking Changes

License

MIT