tkem / cachetools

Extensible memoizing collections and decorators
MIT License
2.36k stars 163 forks source link

Using cachedmethod with key=partial(hashkey, 'pep') is wrong #326

Open NoamNol opened 3 weeks ago

NoamNol commented 3 weeks ago

In the docs there is this example with key=partial(hashkey, 'pep'):

class CachedReferences:

    def __init__(self, cachesize):
        self.cache = LRUCache(maxsize=cachesize)

    @cachedmethod(lambda self: self.cache, key=partial(hashkey, 'pep'))
    def get_pep(self, num):
        """Retrieve text of a Python Enhancement Proposal"""
        url = 'http://www.python.org/dev/peps/pep-%04d/' % num
        with urllib.request.urlopen(url) as s:
            return s.read()

    @cachedmethod(lambda self: self.cache, key=partial(hashkey, 'rfc'))
    def get_rfc(self, num):
        """Retrieve text of an IETF Request for Comments"""
        url = 'https://tools.ietf.org/rfc/rfc%d.txt' % num
        with urllib.request.urlopen(url) as s:
            return s.read()

I think it's wrong to use partial(hashkey, 'pep') with cachedmethod, because the self argument is passed to the key function.
Usually self is ignored by the default methodkey function, but with partial(hashkey, 'pep') it's not ignored.

Also we can't replace it with key=partial(methodkey, 'pep') because 'pep' will be the ignored argument instead of self.

tkem commented 3 weeks ago

Usually self is ignored by the default methodkey function, but with partial(hashkey, 'pep') it's not ignored.

You're right, with hashkey the self argument is not ignored, leading to an unnecessary key item:

>>> print(list(docs.cache.keys()))
[('pep', <__main__.CachedReferences object at 0x7f5f2bbcead0>, 1), ('rfc', <__main__.CachedReferences object at 0x7f5f2bbcead0>, 1)]

However, AFAICS the behavior is as intended and not generally wrong, just a little less efficient than it could be.

NoamNol commented 2 weeks ago

@tkem thank you for the reply.

The main problem is the inconsistency between this and the first example in the docs where self is ignored, leading to unexpected results when my class has implemented a strict __hash__ method that returns a different hash value for every minor change in the class fields.

When using @cachedmethod(lambda self: self.cache, key=partial(hashkey, 'pep')) I didn't expect the cache key to be related to the class’s strict hash value, I only wanted the cache key to be calculated based on the parameters of the method and the name of the method.

So when the inconsistency is the first problem, the second problem is the lack of an example of how to use cachedmethod on multiple methods in the class ignoring self

tkem commented 2 weeks ago

@NoamNol Thanks for your response! Agreed, I just wanted to make sure I was not overlooking anything - if you agree that the example's behavior is basically correct, i.e. sharing a cache between methods leads to the expected results, I'd like to change this issue from "bug" to "enhancement" and think about how to make the example more consistent by using cachedmethod for the next regular release. Suggestions are welcome, of course ;-)

tkem commented 2 weeks ago

One simple solution would be to use a keyword argument to differentiate, i.e.

    @cachedmethod(lambda self: self.cache, key=partial(methodkey, type='pep'))
    def get_pep(self, num):
       ...
    @cachedmethod(lambda self: self.cache, key=partial(methodkey, type='rfc'))
    def get_rfc(self, num):
       ...       

This would lead to the following keys, where the _HashedTuple is used to separate keyword args in the resulting key tuple, which is an implementation detail we probably needn't discuss:

[(1, <class 'cachetools.keys._HashedTuple'>, 'type', 'pep'), (1, <class 'cachetools.keys._HashedTuple'>, 'type', 'rfc')]