joblib / loky

Robust and reusable Executor for joblib
http://loky.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

How to reuse a cache #269

Open basnijholt opened 4 years ago

basnijholt commented 4 years ago

When using memoization (not with functools.lru_cache, because of https://github.com/joblib/loky/issues/268), I am unable to get loky to reuse the cache.

I guess this is because ex.submit(f, ...) re-pickles f each time. Is it possible to tell loky not to do that?

In the example below, I show that a concurrent.futures.ProcessPoolExecutor reuses the cache, while loky does not.

from concurrent.futures import ProcessPoolExecutor
import time
import loky

def memoize(f):
    # naive memoization: the cache is a dict held in a closure
    memo = {}

    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]

    return helper

@memoize
def g(x):
    # 5 seconds on a cache miss, (nearly) instant on a hit
    time.sleep(5)

def f(x):
    g(1)
    return x

with loky.reusable_executor.get_reusable_executor(max_workers=1) as ex:
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)

# prints
# 5.490137338638306
# 5.018247604370117 <---- cache isn't reused

with ProcessPoolExecutor(max_workers=1) as ex:
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)

# prints
# 5.012995958328247
# 0.002056598663330078 <---- used the cache (because it forked the process and doesn't need to repickle)

ogrisel commented 3 years ago

Instead of using a local dict to store the cache entries, you should use a module attribute. Module attributes (apart from those defined in the __main__ module) are pickled by reference instead of by value, so that should work. Each worker process would have its own cache.
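
A minimal sketch of that suggestion, assuming the functions live in an importable module (the module name cached_funcs below is hypothetical):

# cached_funcs.py -- must be importable by the workers, i.e. not defined in __main__
import time

_memo = {}  # module-level cache: each worker process gets its own copy

def memoize(f):
    def helper(x):
        key = (f.__name__, x)
        if key not in _memo:
            _memo[key] = f(x)
        return _memo[key]
    return helper

@memoize
def g(x):
    time.sleep(5)

def f(x):
    g(1)
    return x

# main script
import time
import loky
from cached_funcs import f

ex = loky.get_reusable_executor(max_workers=1)
t = time.time()
ex.submit(f, 10).result()
print(time.time() - t)  # ~5 s: the worker's cache starts empty
t = time.time()
ex.submit(f, 10).result()
print(time.time() - t)  # fast: the worker kept its module-level _memo

Because f now resolves to cached_funcs.f, re-pickling it on every submit only sends a reference, and the module (with its _memo) stays loaded in the worker for the lifetime of the executor.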

ogrisel commented 3 years ago

This issue made me think about improving the cloudpickle pull request https://github.com/cloudpipe/cloudpickle/pull/309#issuecomment-698562884. It might be possible to implement a reusable lru_cache for interactively defined functions, but this is not trivial work.

basnijholt commented 3 years ago

It would be great to make lru_cache work.

For now, I have fixed it by making a cache that is shared in memory: docs, source.
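
For anyone looking for a generic pattern (this is only a sketch, not the implementation behind the links above): a multiprocessing.Manager dict gives a single cache shared by all workers, assuming the proxy object can be pickled to the loky workers and that they can connect back to the manager process.

# sketch: one cache shared across worker processes via a Manager dict
import time
from multiprocessing import Manager

import loky

def g(cache, x):
    # the managed dict lives in the manager process, so all workers see it
    if x not in cache:
        time.sleep(5)
        cache[x] = None
    return cache[x]

def f(cache, x):
    g(cache, 1)
    return x

if __name__ == "__main__":
    with Manager() as manager:
        cache = manager.dict()
        ex = loky.get_reusable_executor(max_workers=2)
        t = time.time()
        ex.submit(f, cache, 10).result()
        print(time.time() - t)  # ~5 s: first call fills the shared cache
        t = time.time()
        ex.submit(f, cache, 10).result()
        print(time.time() - t)  # fast, even if a different worker runs the task

Every cache lookup goes through the manager process, so this trades per-call IPC overhead for a cache that is unaffected by function re-pickling.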