sindresorhus / p-memoize

Memoize promise-returning & async functions
MIT License
396 stars 30 forks source link

Fix the memory leak in `promiseCache` #47

Closed mdawar closed 2 years ago

mdawar commented 2 years ago

This is a fix for the issue discussed in #31, currently the promiseCache keeps getting larger, the keys are never removed unless pMemoizeClear is explicitly used.

In this fix, we delete the key from promiseCache when the promise is settled (Resolved or rejected).

Also I would like your opinion about promiseCacheStore if it's still needed after this change? I guess we can safely remove it.

Also if this gets merged we can solve concurrent invocations #43 by first checking the promiseCache and returning the promise before checking cache, for example:

if (promiseCache.has(key)) {
  return promiseCache.get(key)!;
}

if (cache.has(key)) {
  return cache.get(key) as AsyncReturnType<FunctionToMemoize>;
}
sindresorhus commented 2 years ago

// @Richienb

mdawar commented 2 years ago

@Richienb I thought that I haven't broken a anything especially that the tests have passed.

Anyway, I don't really understand what do you mean by Values stored in cache are expected to be serialised while values resolved from promises stored in promiseCache are not.

Isn't the promiseCache a temporary cache to prevent multiple invocations while waiting for the promise to resolve? After this change, multiple invocations of the memoized function return the same promise (of the original function) which is the expected behavior, and when the promise resolves then it returns the cached result found in cache, so in this case why return the original promise from promiseCache? The memoized function is also async so it resolves to the cached value.

And about the promise rejections, I'll add some tests for the promise rejections as they're not covered.

But first if you add this test and try it with both cachePromiseRejection: false and cachePromiseRejection: true, both tests succeed, but with cachePromiseRejection: true it must fail the second call of memoized must also reject with an error right? Isn't this the expected behavior when using cachePromiseRejection: true?

test('do not cache promise rejections', async t => {
    let index = 0;
    let throwNext = false;
    const errorMessage = 'Test error';

    const fixture = async () => {
        throwNext = !throwNext;

        if (throwNext) {
            throw new Error(errorMessage);
        }

        return index++;
    };

    const memoized = pMemoize(fixture, {
        cachePromiseRejection: false,
    });

    const error = await t.throwsAsync(memoized);
    t.is(error.message, errorMessage);

    t.is(await memoized(), 0);
});
Richienb commented 2 years ago

Anyway, I don't really understand what do you mean by Values stored in cache are expected to be serialised while values resolved from promises stored in promiseCache are not.

The original vision was for the cache to be a database which by nature, can only store serialised values. https://github.com/sindresorhus/p-memoize/issues/3

The problem with caching promise rejections is that this only happens locally which means other functions accessing the database might not experience the error or might experience the error again. This makes this option less useful and should be up to the user to handle themselves.

mdawar commented 2 years ago

OK but it seems that after #32 cachePromiseRejection: true does not work as expected, as you can see from the above test it's useless now.

I have tested it with the code previous to #32:

if (promiseCache.has(key)) {
  return promiseCache.get(key)!;
}

if (await cache.has(key)) {
  return (await cache.get(key))!;
}

And it worked as intended, so it seems it's already broken.

The problem with caching promise rejections is that this only happens locally which means other functions accessing the database might not experience the error or might experience the error again. This makes this option less useful and should be up to the user to handle themselves.

OK but isn't this the expected behavior if the user wants to cache rejected promises? The memoized function will keep throwing the error, just like caching the resolved value. I mean the whole purpose of memoizing is caching the value and never running the original function unless the cache expires, and since a user configures cachePromiseRejection: true then it's expected to keep getting the same error until expiration, which is something up to the user to handle like you said.

So what do you suggest?

Richienb commented 2 years ago

The problem is that after the rewrite, the naming of cachePromiseRejection is no longer accurate. It should be something like locallyCachePromiseRejection. I'm not sure of any case where this behaviour is desired. Do you think it should still be included in the library or has it become an edge case?

And yes, https://github.com/sindresorhus/p-memoize/pull/32 looks to be a mistake on my part.

mdawar commented 2 years ago

Actually we should not merge this before fixing this promise rejection issue.

We can keep cachePromiseRejection as it is right now, but in this case we should cache the error in the cache and before returning a cached value we have to check if it's an error and throw it, this should provide the same experience as if the original function has rejected if the user wants to cache rejections.

I can start working on all of these things and you can review it when finished, what do you say?

Richienb commented 2 years ago

but in this case we should cache the error in the cache

When we assume that databases are used, it means the library would need to somehow serialize errors. This loses information and unlike return values, there isn't a semantic way to throw errors that can be serialized (apart from throw {message: 'Catch me if you can!'}). Also, we'd need to come up with our own data format for storing these errors with the values so it seems to end up not being worth it. The user could do this themselves with p-reflect and have much more control.

mdawar commented 2 years ago

but in this case we should cache the error in the cache

When we assume that databases are used, it means the library would need to somehow serialize errors. This loses information and unlike return values, there isn't a semantic way to throw errors that can be serialized (apart from throw {message: 'Catch me if you can!'}). Also, we'd need to come up with our own data format for storing these errors with the values so it seems to end up not being worth it. The user could do this themselves with p-reflect and have much more control.

OK now I see where you need the serialization, you mean when using a custom cache option that's a database right? I didn't know about this option, I thought we're only caching in memory.

But shouldn't the memoization library only handle memoization only? I mean the user should handle serialization if they're using a database.

I guess you can keep cachePromiseRejection but it will only support caching in memory, for example the default new Map(). If you want to cache errors locally (in the database cache example) then what about expiration? Consider a cache that expires, how this is going to be synced to this local cache, and this adds a lot more complexity.

So what do you suggest?

Richienb commented 2 years ago

I mean the user should handle serialization if they're using a database.

Exactly. Error handling should be left to userland.

mdawar commented 2 years ago

I mean the user should handle serialization if they're using a database.

Exactly. Error handling should be left to userland.

So should we keep cachePromiseRejection without worrying about the serialization part?

This way this options is left there for people that want to memoize the function's result (success or failure), for example if I'm using QuickLRU cache and even if I cache the promise rejection (for example from a failed API call) it will be retried later on expiration, and I have to accept that because I chose to cache rejections.

Richienb commented 2 years ago

If both caches are local then that could work. Though, we'd need to rename the option.

mdawar commented 2 years ago

If both caches are local then that could work. Though, we'd need to rename the option.

Currently both are local (cache and promiseCache), the library can't know that the user is using a remote cache which might be a database.

So cache should store the end result and promiseCache should only be used to return the same promise and prevent concurrent invocations (before the promise is settled).

So do you want me to continue working on this? Or should we close this pull request.

mmkal commented 2 years ago

I don’t know all that much about the implementation of this library, but I agree that persistence seems like a separate concern from memoization. What if this library made the simplifying assumption that all memoization was in memory, and emitted an event when a value was cached, so a user can listen for that event and write to a db if they want? It could also emit error events if people want to handle that in custom ways.

Richienb commented 2 years ago

and emitted an event when a value was cached, so a user can listen for that event and write to a db if they want

That is, by definition, the set method of cache.

It could also emit error events if people want to handle that in custom ways.

These errors events are basically a try-catch block around in a helper function.

Your ideas seem to already be implemented.

mmkal commented 2 years ago

It's true the user could emit an event in the set method, my point was more that by explicitly supporting that pattern you'd no longer need to make the cache database-friendly, which seems like the root of this problem. It'd only be responsible for... memoizing.