iodide-project / iodide

Literate scientific computing and communication for the web
Mozilla Public License 2.0

proposal: `iodide.file.cache`? (name tbd) #1838

Open bcolloran opened 5 years ago

bcolloran commented 5 years ago

(context: discussion of "query" language plugin)

for workflows in which you need to run expensive computations or queries that you don't want to re-run when sharing a report, we need mechanisms for (a) not evaluating some code chunks, and (b) saving and loading the results of these expensive computations. currently, we can explicitly mark a chunk with skipRunAll to accomplish (a), and use the iodide.file.save and iodide.file.load APIs to accomplish (b). In practice, these are used as follows:

%% {js, py, query, whatever} skipRunAll
// expensive computation, data munging, query that you
// don't want to re-run every time you load a report...
%% js skipRunAll
// runs when evaluated manually (during initial exploration)
// but does not run during run-all (e.g. when loading report view)
iodide.file.save("result.json", "json", result)
%% js
// does run when loading report view
iodide.file.load("result.json", "json", "result")

We could remove some of this verbosity (at the cost of being a little bit less explicit) by introducing a new function iodide.file.cache (or "cached" or "loadOrSave" or ????) with the same signature as iodide.file.load:

iodide.file.cache(fileIdentifierString, fileTypeStringFlag, varNameString) such that:

then the above code example would shorten to:

%% {js, py, query, whatever} skipRunAll
// expensive computation, data munging, query that you
// don't want to re-run every time you load a report...
%% js 
iodide.file.cache("result.json", "json", "result")
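
One possible reading of the proposed behavior (suggested by the "loadOrSave" name) is: load the file into the named variable if the file exists, otherwise save the variable's current value to the file. The following is only a minimal, synchronous sketch of that reading, not the iodide implementation. `fileStore`, `scope`, and `cacheSketch` are hypothetical names: the `Map` stands in for saved notebook files and the plain object stands in for `window`.

```javascript
// Hypothetical stand-ins (assumptions, not iodide APIs):
// `fileStore` mimics files saved with the notebook,
// `scope` mimics the global scope (window) shared by chunks.
const fileStore = new Map();
const scope = {};

// Sketch of a load-or-save helper with the (file, type, varName) signature.
function cacheSketch(fileName, fileType, varName) {
  if (fileStore.has(fileName)) {
    // The file was saved earlier: load it into the named variable.
    const raw = fileStore.get(fileName);
    scope[varName] = fileType === "json" ? JSON.parse(raw) : raw;
    return "loaded";
  }
  if (varName in scope) {
    // No file yet, but the variable exists: save its current value.
    const value = scope[varName];
    const raw = fileType === "json" ? JSON.stringify(value) : String(value);
    fileStore.set(fileName, raw);
    return "saved";
  }
  throw new Error(`no file "${fileName}" and no variable "${varName}"`);
}

// Exploration session: the expensive chunk ran, so `result` exists.
scope.result = { mean: 4.2 };
const first = cacheSketch("result.json", "json", "result");

// Report view: the expensive chunk was skipped, so `result` is absent.
delete scope.result;
const second = cacheSketch("result.json", "json", "result");

console.log(first, second, scope.result);
```

One corner case this sketch decides arbitrarily: when both the file and the variable exist (for example after re-running the expensive chunk), it prefers the file, so a stale cache would overwrite the fresh value. A real design would need to pick a rule for that case.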

Any holes in this idea? @hamilton obviously you're closest to the implementation we have so far-- would this be feasible? any corner cases missing?

@rafrombrc @wlach @mdboom @teonbrooks -- general UX and naming ideas welcome.

mdboom commented 5 years ago

The result argument probably needs to be the variable name as a string, since you can't replace the variable's content in place otherwise. That's not bad but a little confusing, potentially. An alternative would be:

result = iodide.file.cache("result.json", "json", result)

which is also confusing.
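
The point about passing the name as a string can be illustrated in plain JavaScript: a function receives the value of its argument and cannot rebind the caller's variable, but given the name it can assign onto a shared scope object. This is a sketch only. `scope` stands in for `window`, and `loadByValue` and `loadByName` are hypothetical helpers.

```javascript
// `scope` is a hypothetical stand-in for window.
const scope = {};

// Reassigning a parameter only changes the local binding.
function loadByValue(target) {
  target = { rows: 3 };
}

// With the name as a string, the function can assign onto the shared scope.
function loadByName(varName) {
  scope[varName] = { rows: 3 };
}

let result; // undefined, as in a report view where the chunk was skipped
loadByValue(result);
const afterByValue = result; // still undefined: the caller's binding is untouched

loadByName("result");
const afterByName = scope.result; // now holds the loaded object

console.log(afterByValue, afterByName);
```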

bcolloran commented 5 years ago

ah, yes of course you're right. indeed, this is already how the file load API works. i'll update the description above.

(le sigh, no metaprogramming in js :-/ )

bcolloran commented 5 years ago

side note: it is also possible to accomplish the current workflow using a fetch chunk:

%% {js, py, query, whatever} skipRunAll
// expensive computation, data munging, query 
%% js skipRunAll
iodide.file.save("result.json", "json", result)
%% fetch
json: result = files/result.json

bcolloran commented 5 years ago

oh @mdboom i just remembered (and @hamilton can correct me if i recall it wrongly) -- a cache function probably could not have syntax like result = iodide.file.cache("result.json", "json", result), for the same reason the file.load function needs to take the variable name argument as a string: cache would be an async function, so it would return a promise rather than the final desired value. And of course we can't do e.g.

result = await iodide.file.load("result.json", "json")

because we're not evaluating in an async context. But as you point out, within the file.load function we can store the loaded data to window without the user needing to worry about handling the promise.
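
The constraint described here can be sketched as follows: an async loader returns a promise, so a plain assignment captures the promise rather than the data, whereas assigning by name inside the function keeps the promise out of the user's code. This is an illustration under assumptions, not iodide's code. `fakeFetchJson` and `loadSketch` are hypothetical, and `scope` stands in for `window`.

```javascript
// `scope` is a hypothetical stand-in for window.
const scope = {};

// Hypothetical async source: resolves with parsed data on a later tick.
function fakeFetchJson() {
  return new Promise((resolve) => setTimeout(() => resolve({ rows: 3 }), 0));
}

// Sketch of a loader that stores its result under the given variable name.
async function loadSketch(varName) {
  scope[varName] = await fakeFetchJson();
}

// Without `await`, assignment yields the promise, not the loaded data.
const direct = fakeFetchJson();
const isPromise = direct instanceof Promise;

// The caller only supplies the name; the value appears on `scope`
// once the returned promise settles.
const done = loadSketch("result");
done.then(() => console.log(isPromise, scope.result));
```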

hamilton commented 5 years ago

Yep, that's correct.

teonbrooks commented 5 years ago

I agree, having a cache function would be useful from the user's perspective and would reduce the two-step save/load to a single, smarter operation. this is something I'm often dealing with (I just ran a bootstrap and would like to show what was done, but wouldn't need to re-run it every time).