isaacs / node-lru-cache

A fast cache that automatically deletes the least recently used items
http://isaacs.github.io/node-lru-cache/
ISC License
5.38k stars 353 forks source link

Add a getOrSet() convenience method to delegate memoization / lazy initialization to the cache #335

Closed 0x24a537r9 closed 5 months ago

0x24a537r9 commented 7 months ago

When using this package for memoization / lazy initialization, there's a pretty common pattern for getting while also setting only if a value is not already set:

function getValue(key) {
  const cachedValue = cache.get(key);
  if (cachedValue !== undefined) return cachedValue; // Should this actually be using .has(key)? That would then require another .get() call if so...

  const newValue = someExpensiveFunction();
  cache.set(key, newValue);
  return newValue;
}

It seems to me that it's common enough (and subtle enough when it comes to checking null vs. undefined vs. using .has()) that it would be worthwhile to define a function like getOrSet:

function getValue(key) {
  return cache.getOrSet(key, () => expensiveFunction());
}

Apart from the convenience factor, this would put the burden of correctness when it comes to existence checking (which I suspect might actually incorrect above in the case of cached values of undefined) on this library's maintainers (who are much more qualified than the average user of this this package to know the safest behavior), and also might open an opportunity for slight optimization if the correct behavior is .has(), since without access to the internal state of the cache one would have to also add a .get() after the .has(), adding more function invocations and another lookup.

If this were deemed something worthwhile to add, the question is whether to have it be an independent function or an option in the options for get. E.g. it could be a new getOrSet or getWithDefault or memoize method, or a getDefault: () => T option for get. I think a new method avoids a) potentially dangerously unexpected/misunderstood behavior where it ends up performing a write operation when .get() sounds like it would only be a read operation, and B) it being buried in options documentation, leading to less use.

isaacs commented 5 months ago

So, this is pretty much exactly what cache.fetch() does. You provide a fetchMethod in the options, and it resolves to the value. If you try to fetch() a key that's in the cache, or currently being fetched, then it won't call the fetchMethod a second time.

But! that's only for async expensive functions. Usually expensive functions are expensive because they do IO, and it's usually best to make anything expensive async if you can, because it means you're not blocking the event loop. But, I can see how that leaves out the possibility of doing a similar thing with sync operations.

I don't like having it be an option to get(), because as you say, that shouldn't do set() operations. And it feels a bit weird to have a "default" option, when you can already just do const value = cache.get(key) ?? getDefaultValue(key) yourself pretty easily.

(For what it's worth, your getValue method above is perfectly fine, it's just inflexible. For example, you might want to set a TTL based on an http response header, or only return stale if some condition is set, who knows.)

So, API options (these don't have to be exclusive, but it is maybe a little bit extraneous to have both).

Option 1: memoMethod

Cons: feels kinda heavy, but very flexible and powerful. Pros: Can basically do anything you'd need, work out of the box more or less by default, but still infinitely configurable.

Option 2: function wrapper

Cons: Feels somewhat divorced from the cache object. Easy enough to just do yourself with Option 1. Pros: A bit more intuitive in a way, doesn't require putting anything on the constructor options.

I think I'll take a stab at Option 1, and then maybe that's enough. So then your code above becomes:

const cache = new LRUCache({
  memoMethod: (key, oldValue, { options, context }) => {
    // can set any cache.set() options on options, but since the get is already done, there's no way to change that,
    // but you can at least look at the object to see what options where used in the get.
    // context is whatever you pass in as `cache.memo(key, { context })`, if that matters to you.
    return someExpensiveFunction()
  }
})
function getValue(key) {
  return cache.memo(key)
}
0x24a537r9 commented 4 months ago

Thanks! Certainly wasn't expecting such a quick turnaround!