isaacs / node-lru-cache

A fast cache that automatically deletes the least recently used items
http://isaacs.github.io/node-lru-cache/
ISC License

`max` option is not strictly followed, throwing `RangeError: Map maximum size exceeded` when adding lots of items #331

Closed faheel closed 6 months ago

faheel commented 7 months ago

Even after specifying the max option, the cache keeps adding more items to the internal map, and (most likely) when the V8 map size limit of 2^24 is exceeded the following error is thrown:

RangeError: Map maximum size exceeded
    at Map.set (<anonymous>)
    at LRUCache.set (/myworkdir/node_modules/lru-cache/dist/commonjs/index.js:862:26)
    <snip>

Sample script

const { LRUCache } = require('lru-cache');

const elements = 10_000_000;
const iterations = 5;

const cache = new LRUCache({
  max: elements,
});

function getRandomCacheKey() {
  return Math.random().toString(36).substring(2, 8);  // simulate low hit-rate caching
}

function getRandomFloatValue() {
  return Math.random() * 100000;
}

let cacheHits = 0;
for (let i = 1; i <= iterations; i++) {
  for (let e = 1; e <= elements; e++) {
    try {
      cache.set(getRandomCacheKey(), getRandomFloatValue());
    } catch (error) {
      console.error(error);
      console.log('Iteration', i, '\tElement', e, '\tCache hits', cacheHits);
      process.exit(1);
    }
    if (cache.get(getRandomCacheKey())) {
      cacheHits++;
    }
  }
}
touhidurrr commented 7 months ago

In my case, I am observing some kind of memory leak. See: https://github.com/isaacs/node-lru-cache/issues/333#issuecomment-2057922457

I am using both max and maxSize, but I'm not sure if either of them is working.

I am currently observing the behavior of lru-cache. I will update if I find anything new to share.

touhidurrr commented 7 months ago

Update so far:

(screenshot of logging output showing cache size and memory usage)

The number in the "average size" text is taken from cache.size. The number in the last line, memory spike / calculatedSize, keeps increasing; it went from ~3 to what it is now. So, in the end, I am unable to verify whether the memory leak I am facing is caused by lru-cache or not.

Note: I know that average size keeps increasing. It is sort of expected behavior for my app. So, it does not verify anything.

touhidurrr commented 7 months ago

Note: I am currently switching to node-cache to see if this fixes the memory leak issue.

touhidurrr commented 7 months ago

Update 2: facing the same problems with node-cache. Probably not an issue related to lru-cache, then. This might be a memory leak in Bun instead.

isaacs commented 7 months ago

I can reproduce the map overflow error with the script provided, thank you. I'll take a look at it shortly, looks like the key isn't getting deleted properly when it should be.

isaacs commented 7 months ago

Until that is fixed, since I can repro on node, I don't think this is a bun issue. There might also be a bun issue, of course, but lru-cache is not off the hook.

isaacs commented 7 months ago

The weird thing here is that it's hitting the max Map size range error at 9_999_999 items, very consistently.

Bisecting into this, it seems like the tipping point is 2**23 + 2.

But a JavaScript Map object should be able to store up to 2**24 items, or 16_777_216. So I'm not sure why it's failing at this value.

> let i = 0, m = new Map(); try { for (; true;i++) m.set(String(Math.random()),String(Math.random())) } catch { console.log(m.size, i) }
16777216 16777216

I tried to see if maybe the frequency of collisions is making the Map take up more space and freak V8 out sooner than it should, but couldn't reproduce that, either:

> let m = new Map, i = 0; try { for (; true; i++) m.set(Math.random().toString(36).substring(2, 8), i) } catch (e) { console.log(e.message, i, m.size) }
Map maximum size exceeded 16842420 16777216

From what I can tell, it is correctly deleting things from the keyMap as well as from the key and value lists. The keyMap is not expanding when there are cache hits, it's just overwriting, and the cache.size and keyMap.size are always in sync.

What I can't figure out is, why isn't this map going up to 2**24 entries, and instead crashing at around half that value? I can't get it to do that in any other context, except the Map that's within this LRUCache object.

Very curious.

isaacs commented 7 months ago

What's also even weirder is that it always crashes at 1 less than whatever the max is, as long as it's at least as large as 2**23 + 2. So, when the max is 10_000_000, it'll crash at 9_999_999. If the max is 9_000_000, then it'll crash at 8_999_999. I have no idea why that would make any difference at all, unless it somehow has to do with deleting from a very large Map object?

isaacs commented 7 months ago

Confirmed, this is a v8 bug. https://issues.chromium.org/issues/335269877

In the meantime... idk. Can you refactor to have a cache with a max lower than 2**23 + 2? I'm not sure how to even go about working around this, tbh. Would have to think about it. Maybe there's some data structure we could use to keep the keyMap below that limit, but I'm not sure how to do that without impacting performance.
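Until the V8 fix lands, one caller-side workaround along those lines (my sketch, not an lru-cache feature) is to clamp the requested max below the observed tipping point:

```javascript
// Hypothetical guard, assuming the tipping point observed above
// (2**23 + 2): clamp the cache's max just below it. V8_SAFE_MAX and
// safeMax are names I made up, not part of the lru-cache API.
const V8_SAFE_MAX = 2 ** 23 + 1;

function safeMax(requestedMax) {
  return Math.min(requestedMax, V8_SAFE_MAX);
}

// usage, assuming lru-cache is installed:
// const cache = new LRUCache({ max: safeMax(10_000_000) });
```

This trades away capacity above ~8.4M entries, but keeps the internal Map out of the buggy range.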

I can't find anything in this library that's incorrect, per se. I mean, deleting items from a Map object should prevent it from raising the RangeError: Map maximum size exceeded, because it keeps the size below the Map maximum size.

If you're observing a memory leak, then that might be worth investigating. It could be that V8 is leaking memory in some way, leading to this error being raised, but I don't see any evidence of that. I suspect that you're seeing the issue in other caching libraries because it's a pretty common approach to use a Map somewhere in the implementation of a cache (or even, just using a Map as the cache), so if you're caching 10M things, there's a good chance you're going to run into this no matter what cache implementation you use.
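To illustrate why this bites other libraries too, here is a minimal sketch of that common Map-backed LRU pattern (an illustration of the general approach, not lru-cache's actual implementation): every live entry sits in a single Map, so any sufficiently large max runs into the same V8 ceiling regardless of library.

```javascript
// Minimal Map-backed LRU sketch. Relies on Map's insertion-order
// iteration: the first key is always the least recently used.
class TinyLRU {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // re-insert to mark as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // evict least recently used: first key in insertion order
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

With max set to 10M, this sketch's single Map hits the same ceiling as any other Map-based cache.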

At this point, it seems like the best course of action is just to not allow the cache to be bigger than 2**23+2, but that seems pretty restrictive.

touhidurrr commented 7 months ago

Until that is fixed, since I can repro on node, I don't think this is a bun issue. There might also be a bun issue, of course, but lru-cache is not off the hook.

Confirmed, this is a v8 bug. https://issues.chromium.org/issues/335269877

But Bun runs on JavaScriptCore. Why am I facing memory leaks, I wonder. Bun has a lot of open memory leak issues; it's got to be related to those.

It would be interesting to try to reproduce this bug on Bun and Firefox. Then we can be sure it's not lru-cache.

isaacs commented 7 months ago

@touhidurrr Indeed. I can't reproduce the spurious Map RangeError on Safari, but I haven't tried on Bun.

If you're seeing a memory leak, or the same RangeError crash behavior as Node, it could be that Bun is using a different jscore version than Safari or Safari Tech Preview on my machine, or it could just be something unrelated.

touhidurrr commented 7 months ago

@isaacs, trying to repro this issue in Bun. In Node it crashes as described here. However, in Bun, it gets stuck for eternity after the first iteration. Did your script finish in Safari? I can't repro that because I don't have a Mac machine.

isaacs commented 7 months ago

@touhidurrr This script zooms in on the specific Map bug that's causing the test script to crash on node or Chrome:

const m = new Map()
const MAX = 16_700_000  // just below the 2**24 Map entry limit

// Fill the map close to capacity.
for (let i = 0; i < MAX; i++) {
  m.set(i, i)
}

// Repeatedly delete the oldest key (first in insertion order), then
// insert one. The size never grows, so this should never hit the limit.
for (let i = 0; i < MAX; i++) {
  for (const k of m.keys()) {
    m.delete(k)
    break
  }
  try {
    m.set(i, i)
  } catch (e) {
    console.log(e.message, m.size, i, m.size + i, 2**24)
    break
  }
}

When I run the repro script from this issue in bun, yes, I see the same thing: it runs one iteration just fine, then slows to a crawl about halfway through iteration 2. This seems like an unrelated performance bug with bun (or, likely, jscore).

isaacs commented 7 months ago

Oh, man, yeah, bun running this thing is using all the CPU and all the memory.

What's also interesting is that iteration 1 on bun is pretty consistently about 25% - 50% faster than node, but then it just borks.

touhidurrr commented 7 months ago

What's also interesting is that iteration 1 on bun is pretty consistently about 25% - 50% faster than node, but then it just borks.

On Windows it is consistently a little slower than Node. Like 30 to 50%, actually.

@touhidurrr This script zooms in on the specific Map bug that's causing the test script to crash on node or Chrome:

Anyways, can this be reproduced in bun also?

touhidurrr commented 7 months ago

Anyways, can this be reproduced in bun also?

Ok it seems to work. nvm.

touhidurrr commented 7 months ago

So, with this (https://github.com/isaacs/node-lru-cache/issues/331#issuecomment-2060334123), we can be sure that the memory leaks I am facing are a Bun-specific issue after all, regardless of whatever else the output might be in Node.

Thanks for your time!

isaacs commented 7 months ago

One possible way to work around this safely would be to add sharding to the key->index map: require an options.shardMethod function if you want to use more than 2**23 entries, which takes a key and returns a number >= 0 and < options.shards, determining which keyMap a key's index is found in.

Kind of annoying, though, and it adds a few extra method calls in an extremely hot path, which I don't love. Someone could end up stuffing all the keys into the same map anyway, so now they have to be careful to balance it, too.

But something in that direction, I think.
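A rough sketch of that direction (my guess at the shape, not lru-cache API; ShardedKeyMap and the constructor arguments are hypothetical names):

```javascript
// Hypothetical sketch: spread the key -> index mapping across several
// Maps, each kept below the 2**23 + 2 tipping point. shardMethod is the
// user-supplied router described above: key -> integer in [0, shards).
class ShardedKeyMap {
  constructor(shards, shardMethod) {
    this.shardMethod = shardMethod;
    this.maps = Array.from({ length: shards }, () => new Map());
  }
  mapFor(key) {
    return this.maps[this.shardMethod(key)];
  }
  get(key) { return this.mapFor(key).get(key); }
  set(key, index) { this.mapFor(key).set(key, index); return this; }
  delete(key) { return this.mapFor(key).delete(key); }
  get size() { return this.maps.reduce((n, m) => n + m.size, 0); }
}
```

The balancing concern above is visible here: a shardMethod that routes every key to the same shard recreates the original single-Map limit.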

touhidurrr commented 7 months ago

One possible way to work around this safely

Maybe you should wait until there are updates in the Bun issue? That might give further insight into whether this issue is related to lru-cache at all. However, it is problematic that this issue appears on the latest version of Node 20 LTS, and even if V8 patches this in their latest builds, I am not sure the fix would be reflected in Node 20 or prior LTS versions. So, this might need a fix anyway.

isaacs commented 7 months ago

In windows it is consistently a little slower than node. Like 30 to 50% actually.

Not surprising. Bun has (as of writing this, in spring of 2024) built-in Windows support now, but given how much work was poured into making it go fast on POSIX systems, and how different Windows' performance characteristics are, that's just Node having a head start of well over a decade. I'm guessing Bun will eventually catch up, though, no doubt.

So, with this (https://github.com/isaacs/node-lru-cache/issues/331#issuecomment-2060334123), we can be sure that the memory leaks I am facing is a bun specific issue at last.

This bug is a crazy good find, actually. Exposing completely different issues in V8 and jscore (or maybe just bun somehow), with one reproduction case. Not bad!

touhidurrr commented 6 months ago

Just thought I would share this here (I am restarting my app with cron every two hours for now because of the memory leak issues): (screenshot of app metrics)

isaacs commented 6 months ago

Closing this issue, because there's no evidence that lru-cache has a bug, and the spurious throw is a bug in V8 and isn't really worth working around.