timkpaine / temporal-cache

Time-based cache invalidation
Apache License 2.0
7 stars 4 forks source link

persistent_lru_cache generates a new key for the same arguments between runtimes when there's more than 1 argument #28

Open TimChild opened 2 years ago

TimChild commented 2 years ago

Describe the bug The cache persistence only seems to work correctly when there is a single argument in the function call. The cached result is not retrieved for multiple arguments or any keyword arguments in subsequent runtimes (although it does work in the same runtime).

To Reproduce Make a scatch.py:

    from temporalcache.persistent_lru_cache import persistent_lru_cache

    @persistent_lru_cache('test.cache')
    def foo(*args, **kwargs):
        print(f'running {args, kwargs}')
        return f'{args, kwargs}

Run scratch.py twice.

Expected behavior The first time scratch.py is run the expected behavior is seen:

running (('1',), {})
(('1',), {})
(('1',), {})
running (('1', '2'), {})
(('1', '2'), {})
(('1', '2'), {})
running ((), {'kw': 'k'})
((), {'kw': 'k'})
((), {'kw': 'k'})

The second time, I would expect to see:

(('1',), {})
(('1',), {})
(('1', '2'), {})
(('1', '2'), {})
((), {'kw': 'k'})
((), {'kw': 'k'})

Instead, the second (and future runs) result in:

(('1',), {})
(('1',), {})
running (('1', '2'), {})
(('1', '2'), {})
(('1', '2'), {})
running ((), {'kw': 'k'})
((), {'kw': 'k'})
((), {'kw': 'k'})

Desktop (please complete the following information):

Additional context I believe the issue is that for multiple arguments or keyword arguments (where a kwd_mark object is added as an additional argument) the key is generated from _HashedSeq(key) instead of just key[0]. The hash of _HashedSeq uses hash(tup), but the python built-in hash is only deterministic in the same runtime, not between runtimes.

I think the fix probably requires the use of something like hashlib.sha256(), but then that only takes bytes, and I'm not sure what the implications of that may be. Maybe some combination with json.dumps()?

TimChild commented 2 years ago

Using this hash function instead seems to fix the issue for me, but I have no idea what other problems I'll run into with this yet...

import hashlib

def deterministic_hash(*args) -> int:
    string = ''.join([str(arg) for arg in args])
    return int(hashlib.sha1(string.encode('ascii')).hexdigest(), 16)
timkpaine commented 2 years ago

https://github.com/abarnert/persistent-lru-cache/issues/3

TimChild commented 2 years ago

Ah, thank you! Sorry, I didn't realize I was opening a duplicate issue. I'm a bit new to opening issues etc.

I actually ended up here from using your pyEX package 👍