Closed lorenzwalthert closed 3 years ago
With the current changes made in renv master via rstudio/renv#554, we can now create a lockfile like this:
crated <- carrier::crate(function(x) styler::style_text('1'))
fun <- eval(parse(text = as.character(attr(crated, 'srcref'))))
deps <- renv::dependencies(fun)
renv::snapshot(packages = deps$Package)
Now the question is whether there should be a convenience function in carrier to encapsulate the last 3 lines, as proposed in the issue description:
burry <- function(crated) {
  fun <- eval(parse(text = as.character(attr(crated, 'srcref'))))
  deps <- renv::dependencies(fun)
  renv::snapshot(packages = deps$Package)
}
What is the reason for parsing and evaluating srcrefs? As a general principle no computation should be done on the srcrefs because they are optional and for debugging only. Also because of meta-programming they might not reflect the underlying function, e.g. with expr(function() { !!foo }).
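To illustrate the point about srcrefs being optional, here is a minimal base-R sketch. It is not taken from carrier or renv, and the function text is made up for illustration. When code is parsed with keep.source = FALSE, the resulting function has no srcref attribute at all, so anything computed from it has nothing to work with, whereas body() is available either way.

```r
# The same code parsed with and without source references.
code <- "function(x) x + 1"

with_src    <- eval(parse(text = code, keep.source = TRUE))
without_src <- eval(parse(text = code, keep.source = FALSE))

# A srcref is only attached when the parser was asked to keep the source.
has_srcref <- c(
  with_src    = !is.null(attr(with_src, "srcref")),
  without_src = !is.null(attr(without_src, "srcref"))
)
print(has_srcref)

# The function body does not depend on srcrefs, so it can be compared directly.
same_body <- identical(body(with_src), body(without_src))
print(same_body)
```

Note that Rscript and R CMD BATCH run with keep.source = FALSE by default, so code that relies on srcrefs can work interactively and still fail in a deployment script.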
True @lionel-. Hehe I could not do it better than this, I know it's very clumsy. Alternatively just this?
fun <- unclass(crated)
deps <- renv::dependencies(fun)
renv::snapshot(packages = deps$Package)
Also I am very inexperienced with srcrefs and did not understand why you used source refs instead of a plain character vector to store the code. Your points about computation make sense.
How would you solve the problem at hand (I hope the goal is clear)? The thing is that we must pass a bare function to renv::dependencies() for this to work with the implementation Kevin suggested in rstudio/renv#554, because if the first argument to dependencies() is a string, the function will assume it is a file/directory path. This implementation in renv also only works for functions, not for arbitrary code snippets (not an issue for us, but maybe other people also want to create a renv lockfile from arbitrary code). So maybe it would be better to have something like renv::dependencies(code = x), with x being a character vector of code or a srcref or something else.
An alternative is to write the crated code to a tempfile, then use dependencies('path/to/file/') and remove the file, which is not very elegant.
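For reference, the tempfile route could look roughly like the sketch below. deps_via_tempfile() is a hypothetical helper name, not something carrier or renv provides, and the sketch assumes renv is installed. It deparses the function object rather than touching srcrefs, and it only records the function's code, not the data a crate carries in its environment.

```r
# Hypothetical helper (not part of carrier or renv): discover a function's
# package dependencies by writing its deparsed code to a temporary .R file.
deps_via_tempfile <- function(fun) {
  stopifnot(is.function(fun))

  path <- tempfile(fileext = ".R")
  on.exit(unlink(path), add = TRUE)  # remove the file even if an error occurs

  # deparse() rebuilds the code from the function object itself;
  # control = NULL keeps attributes such as the class out of the output.
  writeLines(deparse(fun, control = NULL), path)

  # A string argument is treated as a file/directory path by renv.
  renv::dependencies(path)
}

deps <- deps_via_tempfile(function(x) styler::style_text("1"))
deps$Package
```

The scan is static, so the packages referenced in the function (here styler) do not need to be installed for the discovery step itself.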
> Also I am very inexperienced with srcrefs and did not understand why you did not use a plain character vector to store the code but source refs instead.
I'm not sure I understand. Carrier doesn't do anything special with srcrefs, they are created by the parser.
Regarding dependencies(), would it help if crate functions explicitly inherited from "function"? Explicit inheritance from the base type is something I did not do consistently in the past.
No longer on my computer, but when I tried dependencies(crate(foo)) it seemed to work.
You are right, I have not even tried dependencies(crate(foo)). I think it works for both ways of creating crated functions. Do you think it still makes sense for carrier to provide a wrapper to save a lockfile or not? Because with the changes in renv, it's really as simple as
fun <- carrier::crate(~styler::style_text(.x))
deps <- renv::dependencies(fun)
renv::snapshot(packages = deps$Package)
The only benefit of a wrapper (e.g. burry()) for the last two lines above would be that people don't have to figure out how to use renv in conjunction with carrier. It's just two lines, but it might take users some time to figure out. An alternative is to mention how to do it in the docs of crate. Or leave it as is. Up to you.
I'm not sure crates are the right level to manage dependencies and versions because then you might have incompatible crates.
But also I've never used renv or deployed R or any of these things so I might be wrong!
I think you're right @lionel-. Specifically for the case of deploying models, we are finding that a reliable strategy is to use a tool like bundle to capture a model together with its references and then a tool like renv to capture dependencies and versions. What vetiver does is provide a higher level interface that will take care of it all.
What we say in the bundle docs is:
The bundle package wraps native serialization methods from model-supplying packages. Between versions, those model-supplying packages may change their native serialization methods, possibly introducing problems with re-loading objects serialized with previous package versions. The bundle package does not provide checks for these sorts of changes, and ought to be used in conjunction with tooling for managing and monitoring model environments like vetiver or renv.
Sounds right to me, thanks for the explanations.
While carrier captures some dependencies, it is limited in that it does not capture package and R versions. This can be crucial for reproducibility, both when using crate functions in a deployment context and when sending these functions to a remote worker. A potential application is the deployment of crate functions in mlflow, as partly discussed in these issues. At the moment, R deployment with AWS or Azure ML is not supported; the Python deployment methods rely on conda for dependency management. I think it would be great if there was a way to create a renv environment from a crated function, bridging the gap between renv and carrier. E.g. like this

Where burry() would save a renv.lock (or set up all required files/directories to later use renv::restore()) derived from fn and the current library (potentially already managed by renv). I am not sure whether some internals from renv would have to be exported from the package to leverage renv to discover the dependencies in the crated functions and write the lockfile, or if this issue should rather go to rstudio/renv.

cc: @mdneuzerling