vendure-ecommerce / vendure


Introduce a CacheStrategy for multi-instance data caching #3043

Closed michaelbromley closed 4 weeks ago

michaelbromley commented 2 months ago

The Problem

There are several places where Vendure core makes use of caching in order to significantly improve performance:

The issue with those solutions is that they are in-memory-only, and therefore local to the specific server/worker instance.

Being in-memory has two major downsides:

  1. Duplication: each instance of the server & worker maintains its own cache. On a typical deployment with 2x server & 2x worker processes, this means 4x the work needs to be done to fill up 4 caches. Load-balancing means that they will be filled unevenly, which is inefficient.
  2. Invalidation: we cannot implement an invalidation method which works across multiple instances. E.g. the TaxRate cache: we would like to keep it cached forever until a TaxRate is changed/added/removed. We could naively set up an EventBus subscriber to clear the cache when we detect a change. However, the change event will only be published on one instance. Any other instances will not know about that event, and their cache is now stale.
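To make the invalidation problem concrete, here is a minimal simulation of two instances that each hold a private in-memory cache. It is only a sketch: `ServerInstance`, `database` and the tax rate values are hypothetical stand-ins, not Vendure code, and the "event" is modelled as a direct cache reset on the instance that handled the change.

```typescript
// Hypothetical names throughout; nothing here is Vendure API.
type TaxRates = Record<string, number>;

// Stand-in for the database shared by all instances.
const database: { taxRates: TaxRates } = { taxRates: { standard: 0.2 } };

class ServerInstance {
  // Each process keeps its own private, in-memory copy.
  private cached: TaxRates | undefined;

  getTaxRates(): TaxRates {
    if (!this.cached) {
      this.cached = { ...database.taxRates }; // one database read per instance
    }
    return this.cached;
  }

  // Simulates a mutation handled by this instance. The change event is only
  // seen here, so only this instance's cache is cleared.
  updateTaxRate(name: string, value: number): void {
    database.taxRates[name] = value;
    this.cached = undefined;
  }
}

const serverA = new ServerInstance();
const serverB = new ServerInstance();
serverA.getTaxRates(); // both instances fill their caches separately
serverB.getTaxRates();

serverA.updateTaxRate("standard", 0.25);

const fromA = serverA.getTaxRates().standard; // refetched after local invalidation
const fromB = serverB.getTaxRates().standard; // stale: B never saw the change
```

After the update, `fromA` reflects the new rate while `fromB` still holds the old one, which is exactly the stale-cache situation described in point 2.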

This second point is the reason we use TTLs with relatively short times, which again leads to more work being done on the database.

It is also the reason we do not use caching in other scenarios that could radically improve performance: we cannot reliably invalidate a cache that is not shared by all instances.

Example

This issue was originally motivated by an investigation I am conducting into the performance of the order-related mutations. Using a prototype of this caching approach, I was able to speed up my benchmark by ~2.5x and cut the p(95) response time from 6.98s to 3s.

Proposed Solution

I propose introducing a shared caching mechanism into the core: CacheStrategy. This will be strategy-based allowing you to decide whether you want to store that cache in:

The CacheStrategy would replace all existing caching mechanisms mentioned above, and would unlock the opportunity to make huge performance gains in currently slow areas like:

Because the cache is shared, as soon as one instance has cached a value, it is available to all instances.

Design

At its most basic, the CacheStrategy will implement the typical cache methods: get(), add(), delete().

It should also support key eviction via a TTL that is configurable per key.

It should be able to store JSON-like data, i.e. any serializable JS data structure, just like we already support with the job queue.

Here's a sketch of how it would look:

export interface CacheStrategy extends InjectableStrategy {
  // Resolves to undefined on a cache miss.
  get<T>(key: string): Promise<T | undefined>;

  add<T>(key: string, value: JsonCompatible<T>, options?: { ttl?: number }): Promise<void>;

  delete(key: string): Promise<boolean>;

  // We could also include convenience methods to replicate the
  // functionality of the SelfRefreshingCache interface.
}
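As a rough illustration of the per-key TTL behaviour, here is a minimal in-memory sketch. It uses a simplified, standalone interface (`SimpleCacheStrategy`) rather than the real one, because `InjectableStrategy` and `JsonCompatible` live in Vendure. The names `InMemoryCache` and `CacheEntry`, the millisecond unit for `ttl`, returning `undefined` on a miss, and the injectable clock are all assumptions made for the example.

```typescript
// Simplified stand-in for the sketched interface so the example runs on its own.
interface SimpleCacheStrategy {
  get<T>(key: string): Promise<T | undefined>;
  add<T>(key: string, value: T, options?: { ttl?: number }): Promise<void>;
  delete(key: string): Promise<boolean>;
}

interface CacheEntry {
  json: string;                  // stored serialized, as a shared store would require
  expiresAt: number | undefined; // epoch milliseconds; undefined means no expiry
}

class InMemoryCache implements SimpleCacheStrategy {
  private entries = new Map<string, CacheEntry>();

  // The clock is injectable so expiry can be demonstrated without waiting.
  constructor(private now: () => number = () => Date.now()) {}

  async get<T>(key: string): Promise<T | undefined> {
    const entry = this.entries.get(key);
    if (!entry) {
      return undefined;
    }
    if (entry.expiresAt !== undefined && entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return JSON.parse(entry.json) as T;
  }

  async add<T>(key: string, value: T, options: { ttl?: number } = {}): Promise<void> {
    // Assumption: ttl is given in milliseconds.
    const expiresAt = options.ttl !== undefined ? this.now() + options.ttl : undefined;
    this.entries.set(key, { json: JSON.stringify(value), expiresAt });
  }

  async delete(key: string): Promise<boolean> {
    return this.entries.delete(key);
  }
}
```

Eviction here is lazy (checked on read). A real implementation might also sweep expired keys periodically or, with Redis, delegate expiry to the store itself.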

Backward Compatibility

The implementation of CacheStrategy needs to be done in a backward-compatible way, so no changes are needed by the user when upgrading.

Summary

This proposal has the following benefits:

dlhck commented 2 months ago

We should add support for cache tags. In many scenarios you want to delete all cached items in a certain namespace, e.g. all cached values for a product or a zone. Pimcore has a neat implementation that we can take some inspiration from: https://pimcore.com/docs/platform/Pimcore/Development_Tools_and_Details/Cache/#overview-of-functionalities

I would also recommend that we take a look at the caching architecture of Symfony as it is a really sophisticated one: https://symfony.com/doc/current/components/cache.html#generic-caching-psr-6

michaelbromley commented 2 months ago

Cache Tags

Tags are a mechanism for grouping cache items so that all items sharing a tag can be invalidated together.

Prior Art

Symfony

https://symfony.com/doc/current/components/cache/cache_invalidation.html#using-cache-tags

// invalidate all items related to `tag_1` or `tag_3`
$cache->invalidateTags(['tag_1', 'tag_3']);

// if you know the cache key, you can also delete the item directly
$cache->delete('cache_key');

In the Symfony (& PSR-6 in general) implementation, cache items are wrapped in a CacheItem class, which also allows tags to be set on the item:

// add one or more tags
$item->tag('tag_1');
$item->tag(['tag_2', 'tag_3']);

Laravel

https://laravel.com/docs/11.x/cache

Laravel had a tags implementation but it was recently removed (at least from the documentation).

It looks like their use of tags was badly designed: you can only invalidate by tags when the array of tags exactly matches.

Drupal

https://www.drupal.org/docs/drupal-apis/cache-api/cache-tags

Any cache backend should implement CacheBackendInterface, so when you set a cache item with the ::set() method, provide third and fourth arguments, e.g.:

$cache_backend->set(
  $cid, $data, Cache::PERMANENT, ['node:5', 'user:7']
);

This stores a cache item with ID $cid permanently (i.e., stored indefinitely), but makes it susceptible to invalidation through either the node:5 or user:7 cache tags.

Redis-tag-cache

A package from Max Stoiber that implements a very simple (1 file) Redis cache with tags. We can use this as inspiration for our Redis version.

https://www.npmjs.com/package/redis-tag-cache

This implements the solution given in this SO answer, which keeps a separate set of keys for each tag and reads it back with SMEMBERS: https://stackoverflow.com/a/40649819/772859

Implementation

The consensus design is that any cache item can be tagged with one or more string tags. Later you can invalidate by tag and all entries that have that tag will be invalidated.

We need to have 3 concrete implementations:

The common structure for tags will be a separate data structure that maps each tag to the list of keys that carry it.

For the in-memory store, this can be a Map<string, Set<string>> - a map with the tag as the key, and a set of corresponding cache keys as the value.
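A minimal sketch of that in-memory structure could look like the following. `TaggedInMemoryCache` and its method names are hypothetical. The reverse index (`tagsByKey`) is an addition of mine so that overwriting or deleting a key also removes its old tag links.

```typescript
// Hypothetical class for illustration; not part of Vendure.
class TaggedInMemoryCache {
  private values = new Map<string, unknown>();
  // tag -> set of cache keys carrying that tag
  private keysByTag = new Map<string, Set<string>>();
  // key -> tags it was stored with (reverse index used for cleanup)
  private tagsByKey = new Map<string, string[]>();

  set(key: string, value: unknown, tags: string[] = []): void {
    this.delete(key); // drop any previous value and its tag links
    this.values.set(key, value);
    this.tagsByKey.set(key, tags);
    tags.forEach(tag => {
      const keys = this.keysByTag.get(tag) ?? new Set<string>();
      keys.add(key);
      this.keysByTag.set(tag, keys);
    });
  }

  get(key: string): unknown {
    return this.values.get(key);
  }

  delete(key: string): boolean {
    (this.tagsByKey.get(key) ?? []).forEach(tag => {
      const keys = this.keysByTag.get(tag);
      if (keys) {
        keys.delete(key);
        if (keys.size === 0) {
          this.keysByTag.delete(tag);
        }
      }
    });
    this.tagsByKey.delete(key);
    return this.values.delete(key);
  }

  // Removes every entry that carries at least one of the given tags.
  invalidateTags(tags: string[]): void {
    tags.forEach(tag => {
      const keys = this.keysByTag.get(tag);
      if (!keys) {
        return;
      }
      // Copy first, because delete() mutates the set being walked.
      Array.from(keys).forEach(key => this.delete(key));
    });
  }
}
```

For example, after storing `product:1` with tags `['product', 'zone:eu']`, calling `invalidateTags(['product'])` removes it, while an entry tagged only with `zone:eu` is left in place.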

The redis implementation is similar, and can be seen in the Redis-tag-cache package above.
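To illustrate the set-per-tag idea without requiring a Redis server, here is a hedged sketch against a tiny `RedisLike` interface. The command names (`set`, `get`, `sadd`, `smembers`, `del`) mirror real Redis commands, but the interface itself, the `FakeRedis` stand-in, the `RedisTagCache` class and the `data:` / `tag:` key prefixes are assumptions made for this example and are not taken from the redis-tag-cache source.

```typescript
// Minimal subset of Redis commands used below (an assumed interface).
interface RedisLike {
  set(key: string, value: string): Promise<unknown>;
  get(key: string): Promise<string | null>;
  sadd(key: string, ...members: string[]): Promise<number>;
  smembers(key: string): Promise<string[]>;
  del(...keys: string[]): Promise<number>;
}

// In-memory fake standing in for Redis, for demonstration only.
class FakeRedis implements RedisLike {
  private strings = new Map<string, string>();
  private sets = new Map<string, Set<string>>();

  async set(key: string, value: string): Promise<string> {
    this.strings.set(key, value);
    return "OK";
  }
  async get(key: string): Promise<string | null> {
    const value = this.strings.get(key);
    return value === undefined ? null : value;
  }
  async sadd(key: string, ...members: string[]): Promise<number> {
    const set = this.sets.get(key) ?? new Set<string>();
    let added = 0;
    members.forEach(m => {
      if (!set.has(m)) {
        set.add(m);
        added++;
      }
    });
    this.sets.set(key, set);
    return added;
  }
  async smembers(key: string): Promise<string[]> {
    const set = this.sets.get(key);
    return set ? Array.from(set) : [];
  }
  async del(...keys: string[]): Promise<number> {
    let removed = 0;
    keys.forEach(k => {
      const hadString = this.strings.delete(k);
      const hadSet = this.sets.delete(k);
      if (hadString || hadSet) {
        removed++;
      }
    });
    return removed;
  }
}

class RedisTagCache {
  constructor(private redis: RedisLike) {}

  async set(key: string, value: unknown, tags: string[] = []): Promise<void> {
    await this.redis.set(`data:${key}`, JSON.stringify(value));
    // One Redis set per tag, holding the cache keys that carry the tag.
    for (const tag of tags) {
      await this.redis.sadd(`tag:${tag}`, key);
    }
  }

  async get<T>(key: string): Promise<T | undefined> {
    const raw = await this.redis.get(`data:${key}`);
    return raw === null ? undefined : (JSON.parse(raw) as T);
  }

  async invalidateTags(tags: string[]): Promise<void> {
    for (const tag of tags) {
      const keys = await this.redis.smembers(`tag:${tag}`);
      // Delete the tagged values and the tag's own set together.
      await this.redis.del(...keys.map(k => `data:${k}`), `tag:${tag}`);
    }
  }
}
```

This sketch does not remove a key from its other tag sets when it is invalidated or overwritten, so a production version would need to handle re-tagged keys, and would probably batch the commands in a pipeline or transaction.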

For the database store, we would need a separate table to store entries associating a tag with a single cache key:

CREATE TABLE cache_tags (
  id SERIAL PRIMARY KEY,
  tag VARCHAR(255) NOT NULL,  -- Tag name
  cache_key VARCHAR(255) NOT NULL,  -- Corresponding cache key
  FOREIGN KEY (cache_key) REFERENCES cache_items(cache_key) ON DELETE CASCADE
);