seznam / flexp

BSD 3-Clause "New" or "Revised" License

Caching is too sensitive. #5

Closed alaneckhardt closed 6 years ago

alaneckhardt commented 6 years ago

PickleCache takes into account all variables of the chain members. It would be good to be able to specify explicitly which members it should ignore, for example:

class MyWorker:
    PickleCacheIgnore = ['workers']

    def __init__(self, num_examples=100, workers=10):
        self.num_examples = num_examples
        self.workers = workers
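
To make the request concrete, here is a minimal sketch of how a cache key could honor such an ignore list. This is not flexp's actual implementation: `cache_key` is a hypothetical helper, and hashing the `repr` of the remaining attributes is an assumption that only holds for simple values such as numbers and strings.

```python
import hashlib


def cache_key(obj):
    """Hypothetical helper: hash an object's attributes, skipping ignored ones."""
    # Names listed in the proposed PickleCacheIgnore attribute are left out.
    ignored = set(getattr(obj, "PickleCacheIgnore", ()))
    # Sort by attribute name so the key does not depend on assignment order.
    state = sorted((k, v) for k, v in vars(obj).items() if k not in ignored)
    payload = repr((type(obj).__qualname__, state)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class MyWorker:
    PickleCacheIgnore = ['workers']

    def __init__(self, num_examples=100, workers=10):
        self.num_examples = num_examples
        self.workers = workers


# Changing only `workers` keeps the key; changing `num_examples` does not.
same = cache_key(MyWorker(workers=10)) == cache_key(MyWorker(workers=4))
different = cache_key(MyWorker(num_examples=100)) != cache_key(MyWorker(num_examples=200))
```

With this, both `same` and `different` are true, so the number of workers would no longer invalidate the cache.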

The second thing is to be able to handle cases like this:

my_chain = Chain([
    PickleCache("cached_pkl", "id", my_chain_1),
    PickleCache("cached_pkl", "id", my_chain_2),
    PickleCache("cached_pkl", "id", my_chain_3),
])

If data is cached for my_chain_3, we should be able to skip loading the pickles for my_chain_1 and my_chain_2, but only when the cached my_chain_3 result was produced with the same settings of my_chain_1 and my_chain_2.

One approach would be to update data['id'] after each PickleCache, or even after each module, so that data['id'] carries the full provenance of the transformations the data went through.
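
A rough sketch of that idea, independent of flexp's real classes: `extend_id`, `expected_ids`, and `run_chain` are hypothetical names, each step is assumed to be a `(settings_key, function)` pair, and the cache is a plain dict standing in for pickles on disk. Because every id is derived from the previous id plus the step's settings, the id of the last step can be computed up front and looked up before anything earlier is loaded.

```python
import hashlib


def extend_id(data_id, step_key):
    """Fold one step's settings key into the running provenance id."""
    return hashlib.sha256(f"{data_id}|{step_key}".encode("utf-8")).hexdigest()


def expected_ids(initial_id, step_keys):
    """Ids the data would carry after each step, computed without running any step."""
    ids, current = [], initial_id
    for key in step_keys:
        current = extend_id(current, key)
        ids.append(current)
    return ids


def run_chain(data, steps, cache):
    """steps: list of (settings_key, function); cache: dict keyed by provenance id."""
    ids = expected_ids(data["id"], [key for key, _ in steps])
    # Look for the latest step whose output is cached for this exact provenance.
    start = 0
    for i in range(len(steps) - 1, -1, -1):
        if ids[i] in cache:
            data = dict(cache[ids[i]])
            start = i + 1
            break
    # Run only the steps after the cache hit, recording each result.
    for i in range(start, len(steps)):
        _, func = steps[i]
        data = func(dict(data))
        data["id"] = ids[i]
        cache[ids[i]] = dict(data)
    return data
```

In this sketch a hit on the third step skips the first two entirely, while changing the settings key of the first step changes every later id, so a stale result for the third step is never reused.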

What do you think?