nteract / scrapbook

A library for recording and reading data in notebooks.
https://nteract-scrapbook.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Capture lineage / sourcing of data so that repeated calculations can be avoided #21

Open MSeal opened 5 years ago

MSeal commented 5 years ago

Capturing the source sha, or whatever else is required to recompute or read scraps, when calculating data would be helpful.

betatim commented 5 years ago

Could you explain a bit what use case you have in mind? Something like telling the user the scrap they just retrieved from a notebook needs recomputing?

For caching of results during computations we should check out https://joblib.readthedocs.io/en/latest/memory.html, which is well used and maintained by someone else (yay!).

MSeal commented 5 years ago

So the core intention here would be to let the glue action against a particular ref skip pushing any data if the contents are identical. I don't think it's necessary at first, but having a path for success when a user wants to avoid expensive computation or pushes might be helpful. Another pattern may be to provide an additional wrapper, something like compute_and_glue, that glues a reference without recomputing when the source data is considered equivalent by some registered function.