scipp / sciline

Build scientific pipelines for your data
https://scipp.github.io/sciline/
BSD 3-Clause "New" or "Revised" License

Caching (docs / examples / tests) #30

Open SimonHeybrock opened 1 year ago

SimonHeybrock commented 1 year ago

Caching of intermediate results is probably out of scope for Sciline. However, it could be useful to provide helpers (such as decorators) that a user can apply to cache objects that may get reused across multiple compute() calls, for example the result of downloading or loading a big file.

By making this an explicit wrapper instead of trying to implement a complex and hard-to-control internal mechanism, we:

An alternative would be to recommend computing the intermediate result directly, and providing this as an instance-provider to Pipeline. One important requirement (for either solution) would be that it can be turned on or off with ease.
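As a rough illustration of the "explicit wrapper that can be turned on or off" idea, here is a minimal sketch using only the standard library. The name `maybe_cached`, its `enabled` flag, and the `load_big_file` provider are hypothetical and not part of Sciline; the sketch only assumes that a provider is an ordinary function with hashable arguments.

```python
import functools
from typing import Callable


def maybe_cached(enabled: bool = True) -> Callable:
    """Hypothetical helper: wrap a function in an unbounded LRU cache if enabled.

    With enabled=False the function is returned unchanged, so caching can be
    switched off in one place without touching the function body.
    """

    def decorator(func: Callable) -> Callable:
        if not enabled:
            return func
        # lru_cache copies __name__, __doc__ and __annotations__ to the wrapper.
        return functools.lru_cache(maxsize=None)(func)

    return decorator


load_calls = {"cached": 0, "uncached": 0}


@maybe_cached(enabled=True)
def load_big_file(filename: str) -> tuple:
    load_calls["cached"] += 1  # counts how often the body actually runs
    return (filename, "contents")


@maybe_cached(enabled=False)
def load_big_file_uncached(filename: str) -> tuple:
    load_calls["uncached"] += 1
    return (filename, "contents")


for _ in range(3):
    load_big_file("data.h5")
    load_big_file_uncached("data.h5")
```

After the loop, the cached variant has run its body once and the uncached variant three times. Arguments must be hashable for `functools.lru_cache` to work.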

SimonHeybrock commented 1 year ago

Note that for 0-ary functions a user can simply use, e.g., functools.lru_cache.

For unary (or higher-arity) functions, lru_cache will prevent the repeated call to the function itself, but not to its dependencies. That is, it may still be useful for, e.g., a unary function that takes a filename as input, but it does not avoid recomputing an entire expensive branch of the task tree.
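The limitation can be seen in a small standard-library sketch. Here `compute` is only a stand-in for one evaluation of a task graph, not Sciline's actual `compute()`, and `find_filename` and `load` are hypothetical functions:

```python
import functools

calls = {"find_filename": 0, "load": 0}


def find_filename() -> str:
    """Hypothetical upstream dependency; not cached."""
    calls["find_filename"] += 1
    return "run_1234.h5"


@functools.lru_cache(maxsize=None)
def load(filename: str) -> tuple:
    """Hypothetical expensive unary function; cached per filename."""
    calls["load"] += 1
    return (filename, "data")


def compute() -> tuple:
    """Stand-in for one evaluation: the dependency is evaluated first."""
    return load(find_filename())


compute()
compute()
```

After two evaluations, `load` has run once but `find_filename` has run twice: the cache is keyed on the argument value, so the argument must still be produced on every evaluation.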

For now, I would suggest to: