tc39 / proposal-policy-map-set

A TC39 proposal for Maps and Sets with cache replacement policies like LRU and LFU.

Custom policies and abstract classes #11

Open js-choi opened 2 years ago

js-choi commented 2 years ago

A problem with using enumerated string constants for policies (e.g., new CacheMap('lru', { maxSize: 256 }), as proposed in #8 and #10) is that it disallows custom policies that plug into the same engine-driven purging triggers (#7).

For example, if only 'lru' and 'lfu' are allowed as policy string parameters, then no userland implementation of alternative policies (like time-based expirations) can use #7’s engine-driven purging.

As an alternative, we could have generic CacheMap and CacheSet classes that are not limited to built-in policies specified by strings, and which developers can extend with inheritance or composition.

For example (with class inheritance):

// A userland class that wants to support the same engine-driven purging
// as the built-in CacheMap subclasses.
// This would not be possible if we had enumerated strings for policies.
class WeightCacheMap extends CacheMap {
  #getWeight;
  #weightMap;

  constructor ({ getWeight, ...otherOptions }) {
    super(otherOptions);
    if (!(getWeight instanceof Function)) { throw new TypeError; }
    this.#getWeight = getWeight;
    this.#weightMap = new Map;
  }

  set (key, value) {
    this.#weightMap.set(key, this.#getWeight(key, value));
    return super.set(key, value);
  }

  delete (key) {
    this.#weightMap.delete(key);
    return super.delete(key);
  }

  // CacheMap would use the nextToRemove method to determine
  // which of its entries will be removed.
  // The following naïve implementation of nextToRemove
  // returns the key with the smallest weight in this.#weightMap.
  nextToRemove () {
    return Array.from(super.keys())
      .reduce((prevKey, nextKey) =>
        this.#weightMap.get(prevKey) < this.#weightMap.get(nextKey) ?
          prevKey : nextKey);
  }
}

// The userland subclass supports the same options as CacheMap, like maxSize,
// plus a required getWeight option.
const cache = new WeightCacheMap({
  maxSize: 2,
  purgeable: true,
  getWeight (key, value) { return key * 5 + value * 7; },
});
cache.set(3, 3);
cache.set(1, 1);
// The following line causes the cache to reach its maximum size of 2.
// It would remove the [1, 1] entry, since it has the lowest weight (12).
cache.set(2, 2);

Alternatively (with class composition):

// A userland class that wants to support the same engine-driven purging
// as the built-in CacheMap subclasses.
// This would not be possible if we had enumerated strings for policies.
class WeightCacheMap {
  #cacheMap;
  #getWeight;
  #weightMap;

  constructor ({ getWeight, ...otherOptions }) {
    if (!(getWeight instanceof Function)) { throw new TypeError; }
    this.#getWeight = getWeight;

    const weightMap = new Map;
    this.#weightMap = weightMap;

    this.#cacheMap = new CacheMap({
      // The CacheMap constructor expects a nextToRemove function in its
      // options bag. The following naïve implementation of a nextToRemove
      // function returns the key with the smallest weight in weightMap.
      nextToRemove (cacheMap) {
        return Array.from(cacheMap.keys())
          .reduce((prevKey, nextKey) =>
            weightMap.get(prevKey) < weightMap.get(nextKey) ?
              prevKey : nextKey);
      },

      // The CacheMap constructor also needs a delete callback
      // that it calls when it purges any entry.
      delete (key) {
        weightMap.delete(key);
      },

      ...otherOptions,
    });
  }

  set (key, value) {
    this.#weightMap.set(key, this.#getWeight(key, value));
    return this.#cacheMap.set(key, value);
  }

  delete (key) {
    this.#weightMap.delete(key);
    return this.#cacheMap.delete(key);
  }

  keys () {
    return this.#cacheMap.keys();
  }

  values () {
    return this.#cacheMap.values();
  }

  entries () {
    return this.#cacheMap.entries();
  }

  [Symbol.iterator] () {
    return this.#cacheMap.entries();
  }

  forEach (...args) {
    return this.#cacheMap.forEach(...args);
  }
}

// The userland class supports the same options as CacheMap, like maxSize,
// plus a required getWeight option.
const cache = new WeightCacheMap({
  maxSize: 2,
  purgeable: true,
  getWeight (key, value) { return key * 5 + value * 7; },
});
cache.set(3, 3);
cache.set(1, 1);
// The following line causes the cache to reach its maximum size of 2.
// It would remove the [1, 1] entry, since it has the lowest weight (12).
cache.set(2, 2);

…Lastly, we could eschew separate classes completely and always use the original CacheMap class customized with callback options:

function createWeightCacheMap ({ getWeight, ...otherOptions }) {
  const weightMap = new Map;
  const contentMap = new Map;
  return new CacheMap({
    has (key) {
      return contentMap.has(key);
    },
    get (key) {
      return contentMap.get(key);
    },
    set (key, value) {
      weightMap.set(key, getWeight(key, value));
      return contentMap.set(key, value);
    },
    delete (key) {
      weightMap.delete(key);
      return contentMap.delete(key);
    },
    keys () {
      return contentMap.keys();
    },
    values () {
      return contentMap.values();
    },
    entries () {
      return contentMap.entries();
    },
    nextToRemove (cacheMap) {
      return Array.from(cacheMap.keys())
        .reduce((prevKey, nextKey) =>
          weightMap.get(prevKey) < weightMap.get(nextKey) ?
            prevKey : nextKey);
    },
    ...otherOptions,
  });
}

// The function supports the same options as CacheMap, like maxSize,
// plus a required getWeight option.
// It creates a CacheMap, not a subclass.
const cache = createWeightCacheMap({
  maxSize: 2,
  purgeable: true,
  getWeight (key, value) { return key * 5 + value * 7; },
});
cache.set(3, 3);
cache.set(1, 1);
// The following line causes the cache to reach its maximum size of 2.
// It would remove the [1, 1] entry, since it has the lowest weight (12).
cache.set(2, 2);

This string-enum-policies vs. custom-policies problem is orthogonal to #1 and #5.

In the preceding alternatives, CacheMap/CacheSet might or might not be subclasses of Map/Set, or they might even be Map/Set themselves – this is issue #1.

And built-in policies might be namespaced classes (CacheMap.LRU, CacheMap.LFU, etc. – or Map.LRU, Map.LFU, etc.) or non-namespaced classes (LRUMap, LFUMap, etc.) – this is issue #5.

theScottyJam commented 2 years ago

None of these options feel great.

Let's consider the scenario where someone makes a userland LRU cache. Later on, language implementors decide to add an emplace() function that lets you insert and/or update an entry with one call. What happens to a codebase that tries to use the new emplace() function with the existing, userland LRU cache map? (We're assuming the LRU cache map hasn't been updated since the addition of emplace().)

If the LRU cache was implemented via inheritance (option 1), then it will auto-inherit emplace(). Using this inherited emplace() function could appear to behave correctly on the surface, but in reality the LRU cache would fail to notice those updates and would fail to remove cache entries at the correct time. That would be an annoying source of bugs.

In the composition scenario (option 2), it'll simply be annoying to use these custom map classes that don't provide the emplace() function. It means you can't just pass this into any function that expects a cache map instance, as the function you want to pass it into might be hoping that emplace() exists, and it doesn't yet.

The third option looks like it's headed in the right direction, but most of it looks like it's just emulating inheritance with a different API, so it would suffer from the same issues as the first one.

If we take any of these routes, then adding new functions to a Map class will cause old cache-strategy implementations to not mix well with newer JavaScript features. What we really need is to provide a way for caching strategies to just hook into the life-cycle of a cache-map entry, letting the caching-strategy get notified when new entries are added or removed. Here's a concrete example of one way we could do this:

const timeoutStrategy = timeoutTime => ({
  // Called whenever something new is added to the map
  // The options bag contains the map being acted on,
  // the key/value pair that was added, and perhaps a convenient
  // ejectFromCache function which is the same as doing `map.delete(key)`.
  onEntryAdded({ map, key, value, ejectFromCache }) {
    const timeoutId = setTimeout(ejectFromCache, timeoutTime)
    // The optional returned function is auto-called when this cache-entry is removed.
    // Its primary purpose is for unsubscribing from event sources, but it can
    // be used for other reasons as well.
    return function onEntryRemoved() {
      clearTimeout(timeoutId)
    }
  },
})

// I think it would be cool if the CacheMap took a list of strategies instead of a single one,
// so you could compose multiple strategies together from different locations.
new CacheMap([
  timeoutStrategy(60_000),
  /* ...other strategies as desired... */
])

What's nice about the above API is that it's fairly simple. In userland, if I want to, for example, cause items to be ejected from the cache when custom events happen, I can trivially set up a one-time use cache-map that subscribes and unsubscribes from an event source. For example:

const userSessionCache = new CacheMap([{
  onEntryAdded({ value: user, ejectFromCache }) {
    user.onLogout.addListener(ejectFromCache)
    return function onEntryRemoved() {
      user.onLogout.removeListener(ejectFromCache)
    }
  }
}])

Think about how much more complicated it would be to do something like this via inheritance, or other techniques that require overriding all of the set/delete functions. Also note how it's resilient to new Map functions - if the JavaScript language receives a new emplace() function, it just needs to make sure the emplace() implementation for CacheMap calls the onEntryAdded() and its corresponding onEntryRemoved() lifecycle functions at the right times. No cache-map-strategy implementors will need to worry about making updates to their implementations to support new ECMAScript features.
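To make that concrete, here's a rough sketch of how a hook-driven CacheMap could work internally. None of this is meant as a final design - the shape just illustrates the idea:

class CacheMap extends Map {
  #strategies;
  #cleanups = new Map(); // key -> array of onEntryRemoved callbacks

  constructor(strategies = []) {
    super();
    this.#strategies = strategies;
  }

  set(key, value) {
    if (this.has(key)) this.delete(key); // re-run the entry life-cycle on update
    super.set(key, value);
    const ejectFromCache = () => this.delete(key);
    // Run every strategy's hook and keep any cleanup functions returned.
    this.#cleanups.set(key, this.#strategies
      .map(strategy => strategy.onEntryAdded?.({ map: this, key, value, ejectFromCache }))
      .filter(cleanup => typeof cleanup === 'function'));
    return this;
  }

  delete(key) {
    for (const cleanup of this.#cleanups.get(key) ?? []) cleanup();
    this.#cleanups.delete(key);
    return super.delete(key);
  }
}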

A couple other details to point out.

If you want to keep additional metadata about what's stored in the map (as is necessary for an LRU cache), you can just store that in local variables in your caching-strategy factory, or if you choose to implement your strategy as a class, just store it as private state. For example:

class LRUCacheStrategy {
  // Maps cache-map keys to metadata
  #metadata = new Map()

  constructor(maxSize) { ... }

  onEntryAdded({ key }) {
    this.#metadata.set(key, /* whatever you want to add */)
    ...
    return () => this.#metadata.delete(key)
  }
}

new CacheMap([ new LRUCacheStrategy(1_000) ]);

Also note that I'm only showcasing the onEntryAdded() hook, but there's certainly room for other, specialized hooks if needed. For example, for performance reasons we could choose to add an onClear() hook that, if provided, will get called instead of the individual entry-specific cleanup functions when yourMap.clear() is called, thus letting you clean up resources in a much more efficient manner than would otherwise be possible.
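To illustrate, here's a sketch of what a strategy using a hypothetical onClear() hook could look like (the hook name and semantics are just placeholders):

const timeoutIds = new Map();

const timeoutStrategy = {
  onEntryAdded({ key, ejectFromCache }) {
    timeoutIds.set(key, setTimeout(ejectFromCache, 60_000));
    return () => {
      clearTimeout(timeoutIds.get(key));
      timeoutIds.delete(key);
    };
  },
  // Hypothetical hook: called once by yourMap.clear() in place of the
  // per-entry cleanup functions, so clearing n entries costs one pass
  // over the metadata instead of n separate callback invocations.
  onClear() {
    for (const id of timeoutIds.values()) clearTimeout(id);
    timeoutIds.clear();
  },
};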

(update: I was interchangeably using CachClass and CachMap for some reason. I updated everything to just be CacheMap)

js-choi commented 2 years ago

@theScottyJam: I agree that the customization API must be robust towards additions to the base class’s methods, like adding emplace. And you are indeed correct that we can use more abstract callbacks like onEntryAdded, as in your examples.

But is there a particular reason why the other designs cannot work with adding emplace? The default implementation of CacheMap.prototype.emplace could call this.get and this.set, which would still work with the other designs’ overriding of get and set. (Though this of course depends on the precedent set by how Map.prototype.emplace actually works; see tc39/proposal-upsert#47.)
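For concreteness, such a default could look roughly like the following sketch, which assumes the insert/update handler shape from the upsert proposal and routes everything through the public get and set so that overrides still take effect:

CacheMap.prototype.emplace = function (key, handlers) {
  if (this.has(key)) {
    let value = this.get(key); // goes through any overridden get
    if (handlers.update) {
      value = handlers.update(value, key, this);
      this.set(key, value); // goes through any overridden set
    }
    return value;
  }
  const value = handlers.insert(key, this);
  this.set(key, value);
  return value;
};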

Another note: Your example omits the nextToRemove callback, which is crucial for automated cache eviction in response to memory pressure (#7). You propose allowing caches to have multiple policies, but I am not sure how their nextToRemove methods would compose. Do you have any ideas on how they may compose?

theScottyJam commented 2 years ago

The default implementation of CacheMap.prototype.emplace could call this.get and this.set

I guess this could work. But, it doesn't feel very nice.

Firstly, I often tell people to avoid inheriting from built-in classes, precisely because of the issues I listed previously - it's not uncommon for JavaScript to add new functions to a class that could be conceptually incompatible with your pre-existing subclass. And it would be a breaking change for the subclass to later on override the new method to behave differently. It would be a shame to have conflicting advice, where most of the time you're not supposed to subclass built-ins, but in the special case of CacheMap, the language encourages you to subclass it.

In general, I think subclassing gives the end-user way too much power. Do they really need to be able to customize every individual function implemented by CacheMap in a fine-grained way? No. And if we give them this power while promising to keep their subclasses "good", even as we add new methods, we're just going to paint ourselves into a corner. For example, I'd prefer not to give the end-user the power to customize the iteration order of a CacheMap - such power simply isn't necessary for making a custom cache-map. If we don't withhold this power from them, then we'd prevent ourselves from ever adding things like a double-ended iterator, as it's simply not possible to add that to CacheMaps without breaking subclasses.

Instead, providing a custom iteration order should be considered "extra functionality", and should be implemented as such. For example, going back to my example solution, instead of overriding .forEach() they can implement custom behaviors like this:

// The `LRUCacheStrategy` class implements static methods
// for custom behaviors.
LRUCacheStrategy.forEachInLastUsedOrder(
  aCacheMapThatUsesTheLRUCacheStrategy,
  element => { ... }
);

That's just one example way to do it (but it does require that the cacheMap passed in exposes the list of caching-strategies that were provided during construction, to allow the static method to find its own strategy instance and access private data). There's other options as well, and this should probably be something that's discussed in more detail, but what's important is that we don't have to ask them to override the existing .forEach() method to do these sorts of things.
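As a sketch of how such a static method could work - assuming, hypothetically, that a CacheMap exposes a strategies property listing the strategy objects it was constructed with:

class LRUCacheStrategy {
  #lastUsed = new Map(); // cache-map key -> last-used order marker
  #counter = 0;

  onEntryAdded({ key }) {
    this.#lastUsed.set(key, this.#counter++);
    return () => this.#lastUsed.delete(key);
  }

  static forEachInLastUsedOrder(cacheMap, callback) {
    // Find this strategy's own instance among the map's strategies, so that
    // this static method can read the instance's private metadata.
    const self = cacheMap.strategies.find(s => s instanceof LRUCacheStrategy);
    if (!self) throw new TypeError('map does not use LRUCacheStrategy');
    const keys = [...self.#lastUsed.keys()]
      .sort((keyA, keyB) => self.#lastUsed.get(keyA) - self.#lastUsed.get(keyB));
    for (const key of keys) callback(cacheMap.get(key), key, cacheMap);
  }
}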

There are also performance concerns. Imagine if, hypothetically, the CacheMap class started with just .keys() and .values(), and some subclasses were built to override those. Later on, we added .entries() and implemented it by calling the public .keys() in order to not break subclasses. Even later, we added Symbol.iterator and called .entries(), since that method has the closest behavior. That's quite the long chain of methods that now needs to be called when you use Symbol.iterator, and there's the unfortunate problem that Symbol.iterator isn't lazy like we'd like it to be, because it calls .entries(), which returns a list.

(I'll put together my thoughts and address the issue about nextToRemove in a separate comment)

js-choi commented 2 years ago

For what it’s worth, I am sympathetic to arguments that inheritance is brittle and overly powerful. I actually do like your “single class receiving policy objects with a minimal set of callbacks” approach (e.g., new CacheMap({ onEntryAdded () { … }, … })). Of course it will be tricky to figure out precisely what those callbacks should be. I’m a little less convinced about being able to compose the policy objects, but I’ll see what you have to say about nextToRemove.

theScottyJam commented 2 years ago

Of course it will be tricky to figure out precisely what those callbacks should be

Agreed, I think this will require some careful thought, as every callback/"hook" that we support is a callback that could potentially prevent us from adding certain features in the future.

I’m a little less convinced about being able to compose the policy objects

While I don't think strategy composition is essential (I care more about some of the other points already discussed), I do find it really nice to have, and I'm not sure there are really any great alternatives.

Your example omits the nextToRemove callback, which is crucial for automated cache eviction in response to memory pressure

The CacheMap class I'm envisioning would be a little dumber than what you proposed. It won't have constructor parameters like maxSize; instead, you would supply the maxSize parameter to the strategy's constructor. Strategies like an LRU strategy or a weighted-cache strategy would care about the max size, while strategies such as a timeout, or "eject when this event happens", wouldn't care.

I put together a little gist with an example, minimal implementation of the CacheMap class (it's not feature-complete, but it does show off the basic concept). Included are two caching strategies - LRUCacheStrategy and WeightCacheStrategy. That gist also shows how you can get nextToRemove()-like behavior.
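To give a rough idea (this is a paraphrased sketch, not the actual gist code), the weight strategy can get that behavior by evicting from inside onEntryAdded:

class WeightCacheStrategy {
  #maxSize;
  #getWeight;
  #weightMap = new Map();

  constructor({ maxSize, getWeight }) {
    this.#maxSize = maxSize;
    this.#getWeight = getWeight;
  }

  onEntryAdded({ map, key, value }) {
    this.#weightMap.set(key, this.#getWeight(key, value));
    // Strategy-driven eviction: once the map is over capacity,
    // remove whatever #nextToRemove() picks.
    while (map.size > this.#maxSize) {
      map.delete(this.#nextToRemove(map));
    }
    return () => this.#weightMap.delete(key);
  }

  // Naïvely picks the key with the smallest weight, as in the earlier examples.
  #nextToRemove(map) {
    return Array.from(map.keys())
      .reduce((prevKey, nextKey) =>
        this.#weightMap.get(prevKey) < this.#weightMap.get(nextKey) ?
          prevKey : nextKey);
  }
}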

It's true that, as you fear, not all strategies would be composable, and perhaps that's a ding against this composability idea. But, hopefully, it's also a bit intuitive when things don't go together. For example, if we tried to compose the two example strategies from that gist, we'd end up with something that looks like this:

const cache = new CacheMap([
  new WeightCacheStrategy({ maxSize: 2, getWeight: (...) => ... }),
  new LRUCacheStrategy({ maxSize: 2 })
]);

From that code snippet, it should be clear that both WeightCacheStrategy and LRUCacheStrategy are attempting to enforce a max-size, and that's nonsense. The strategy callbacks will be executed in order, so the first ones in the list will take precedence over later ones, meaning the WeightCacheStrategy will notice that there are too many items in the map, so it will pick an item to chuck, and chuck it. After this, the LRUCacheStrategy's callback will run. When it does, it will find that the map now contains the correct number of items, so it won't do anything. This is, at least, the current behavior in the gist, but I'm sure it'll change as we make decisions on when/how the strategy callbacks get called.

So, in short, yes, you're right, some strategies cannot be combined together. It's nonsense to even try, and you'll get nonsense behavior if you do try. This is unfortunate. However, despite this issue, I still feel like strategy composition is worth it, because, put simply, how else would you do it? I've written real code that would eject entries from a memoization cache when any of the following occurred:

- a certain amount of time had passed since the entry was added;
- the cache grew past a maximum size;
- a custom event fired that invalidated the entry.

It would be a shame if I had to hand-code a caching strategy capable of handling all three of these points when two of the three are going to be built-in strategies anyways. Plus, it's technically possible to write a strategy-composing function in userland anyways that would take multiple strategies and output a single composite strategy, so perhaps I would end up simply doing that instead of re-inventing built-in functionality. (I guess this also means that strategy composition could be a follow-up proposal, where we provide a built-in utility function that composes multiple strategies and produces a single one.)
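Such a composing function could be as simple as this rough sketch (hook names follow the onEntryAdded shape from earlier; it's illustrative, not a proposed API):

function composeStrategies(strategies) {
  return {
    onEntryAdded(info) {
      // Run every strategy's hook and collect any cleanup functions returned.
      const cleanups = strategies
        .map(strategy => strategy.onEntryAdded?.(info))
        .filter(cleanup => typeof cleanup === 'function');
      return function onEntryRemoved() {
        for (const cleanup of cleanups) cleanup();
      };
    },
  };
}

// Usage: the CacheMap sees only a single composite strategy.
// (timeoutStrategy is from the earlier example; otherStrategy is a placeholder.)
new CacheMap([composeStrategies([timeoutStrategy(60_000), otherStrategy])]);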

Update:

I'm realizing I didn't fully address the functionality from #7, which includes the ability to allow cache entries to be auto-purged when, for example, memory usage is high. I believe there are a handful of different ways to solve this, like allowing the CacheMap to receive other strategy fields that customize how it reacts to low memory.

One simple option would be to implement a global event emitter that gets triggered when a low-memory threshold is crossed. What that means can differ from platform to platform, but it's a way for the system to ask the program to use fewer resources.

Here's an updated example of WeightCacheStrategy from the gist, where it is using this new global event to control memory usage.

class WeightCacheStrategy {
  #maxSize;
  #getWeight;
  #weightMap = new Map();
  constructor({ maxSize, getWeight, purgeable }) {
    this.#maxSize = maxSize;
    this.#getWeight = getWeight;

    if (purgeable) {
      globalThis.lowMemoryThreshold.addEventListener('crossed', () => {
        this.#maxSize = Math.floor(maxSize * 2/3);
        // ...perhaps clear out some old content right now...
      });
      globalThis.lowMemoryThreshold.addEventListener('uncrossed', () => {
        this.#maxSize = maxSize;
      });
      // Yes... there's a memory leak here because I didn't bother to unsubscribe from these event listeners.
    }
  }

  onEntryAdded({ map, key, value }) {
    // ...same as in the gist...
  }

  #nextToRemove(map) {
    // ...same as in the gist...
  }
}

We could alternatively find a way to allow a strategy to register a "maxSize" value and a "nextToRemove()" function with the CacheMap, which would be closer to the original proposed idea from #7 of how the memory-management could be done.
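For example (the field names here are just assumptions), a strategy could expose the pieces the CacheMap needs, and the CacheMap or the engine would drive the purging:

class WeightCacheStrategy {
  // Read by the CacheMap when deciding whether entries must be purged;
  // the engine could lower this (or act on it) under memory pressure.
  maxSize;

  constructor({ maxSize }) {
    this.maxSize = maxSize;
  }

  // Called by the CacheMap when it decides an entry must be evicted.
  nextToRemove(map) {
    // ...return the lowest-weighted key, as before...
  }

  onEntryAdded({ key, value }) {
    // ...track weights, as before...
  }
}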

theScottyJam commented 2 years ago

Hmm, I will admit that this:

const cache = new CacheMap([
  new LRUCacheStrategy({ maxSize: 2 })
]);

is certainly much uglier and harder to remember than these options:

const cache = new LRUCache({ maxSize: 2 });
const cache = new CacheMap('lru', { maxSize: 2 });

And this issue will exist, whether or not strategies are composable.

Perhaps it's worth additionally adding static factory methods to the CacheMap class, to provide convenient ways to build cache instances using one of the built-in strategies. Like this:

const cache = CacheMap.lru({ maxSize: 2 });