Closed kyleam closed 2 years ago
disclosure - i'm not following this PR too too closely - but one thing that popped into my head around the issue/consideration - was be careful around the renv bootstrapping/presence when thinking through the fallback steps.
In particular - pkgr on a fresh project where renv has not yet been activated/run.
off the top of my head - one heuristic I could imagine would be -->
check if activate.R is present in renv, scan for the renv pinned version, parse and check if version is >= 0.15. boom logic. If this fails, then fallback to shell out.
The other thing to consider is just check carefully which pattern you're using to shell out - there are a couple different invocations of R (side note - I can see you've already done some cleaning - my hope had been to replace them with the rcmd eli worked on that actually has proper contexts/such, so check that out even just for code reference) - you potentially want to act as a user instead of the default isolated mechanism pkgr uses - which would suppress the .Rprofile sourcing. The positive of this, is if you acted as a user, renv should bootstrap itself if it can, and even if its slow would return back a value. The con is whatever side effects - it could accidentally source then from a global renv version rather than the desired version per activate.R, it could be funky around the fact that renv will restart the R session so you might need to handle that restart.
Basically - there are a bunch of states that could wreck havoc if you're going to ultimately ask renv for its version info :-/ Good luck chasing those down. Feel free to me if there are specific states/ideas I can provide input on
@dpastoor Thanks for the feedback.
was be careful around the renv bootstrapping/presence when thinking through the fallback steps.
Yeah, certainly something to worry about. When I discussed renv's bootstrapping in 1708dd62, I focused on the time penalty, but there should probably have been explicit discussion of different scenarios.
As far as I can tell, there are three groups for non-activated cases:
renv is wired up but not activated (e.g., a fresh clone of a repo that tracks .Rprofile
and renv/activate
)
This group includes combinations between the system renv states {pre 0.15, post 0.15, not installed}
and project renv states {pre 0.15, post 0.15}
.
In these six scenarios, an un-activated renv triggers a bootstrap, so pkgr pays that one-time cost, what the user would have paid the first time they loaded R. That buys us the correct answer for the library path, regardless of the various ways that a user can point either pre 0.15 and post 0.15 to a different library path. (Or at least I haven't found the scenario where it doesn't).
renv is not wired up (e.g., a fresh clone of a repo that does not track .Rprofile
and renv/activate
, instead instructing users to call renv::init(bare = TRUE)
)
In this case, if pkgr is invoked before the user calls renv::init
, no bootstrap is triggered. Instead the library path corresponds to the default of the system renv version. I'd say that gives the best guess at the library location because that's most likely the renv version that would be used if the user were to call renv::init
at that moment. It's of course possible for the user to initialize with a different version of renv
, but that's not information we can know. If a different renv is used for initialization and it gives a different answer for the library path, the next pkgr action will work with the correct path.
the system doesn't have renv, and renv is not wired up in the project
In this case, pkgr warns about the failing renv call and falls back to the current behavior of installing to $project/renv/library/...
. As with the previous case, once renv is wired up, pkgr will start using whatever location it points to.
My tentative takeaway from above is that, assuming we're okay paying the occasional bootstrap cost (which perhaps is the case, given an renv user is going to have to pay this cost at some point), the answer from renv is either correct or as close as we can hope to get given the information at hand.
In particular - pkgr on a fresh project where renv has not yet been activated/run.
off the top of my head - one heuristic I could imagine would be -->
check if activate.R is present in renv, scan for the renv pinned version, parse and check if version is >= 0.15. boom logic. If this fails, then fallback to shell out.
If I'm reading that right, the suggestion is that we use pkgr's current logic rather than invoking renv if we detect that renv < 0.15 will be active. However, even for renv versions before 0.15, invoking renv to get the library path fixes some non-default cases (e.g., if profiles are used). Given the new behavior is here, my feeling is that we might as bite the bullet across the board and get the best answer we can.
you potentially want to act as a user instead of the default isolated mechanism pkgr uses - which would suppress the .Rprofile sourcing.
Right, the call in this PR drops --vanilla
when it invokes renv
The positive of this, is if you acted as a user, renv should bootstrap itself if it can, and even if its slow would return back a value.
Yes, in my manual testing it does.
The con is whatever side effects - it could accidentally source then from a global renv version rather than the desired version per activate.R,
Hmm, I'm not following this one. My thinking was that Rscript -e 'renv::...
would be the best match for the renv that the user is interacting with (and expects pkgr to be installing packages within).
it could be funky around the fact that renv will restart the R session so you might need to handle that restart.
I think you mean the restart may need to be handled so that the path reported by renv is correct. With various tweaking of renv config knobs, I haven't observed any cases where the returned value looked different from what I expected, but I may be missing something.
I think you mean the restart may need to be handled so that the path reported by renv is correct. With various tweaking of renv config knobs, I haven't observed any cases where the returned value looked different from what I expected, but I may be missing something.
I mean from the subprocess perspective - renv should restart your R session after it bootstraps itself so the new session is fully controlled by renv - you'll see in rstudio like "Restarting R...." - i'm not sure if rstudio has to do any wizardry to like catch that and handle any process/PID changes, or if that is gracefully managed. The concern would be --> renv goes to terminate the process and start a new one, and the calling process then exits and you have this zombie process/closed process something + may not get back the appropriate message. This is purely naivety on my part just something that came to mind that felt like a nonstandard process interaction.
Hmm, I'm not following this one. My thinking was that Rscript -e 'renv::... would be the best match for the renv that the user is interacting with (and expects pkgr to be installing packages within).
So the scenario I'm considering, is as you laid out in #2, but looking at it from the perspective of "is that what a user would likely expect to do/be doing, or could this unintentionally do something different they didn't expect"
Person by purpose or accident does not have a .Rprofile
(this gets caught with QC review prior to using templates....a non-trivial number of times, it also happens if someone blindly copies/zips a dir that doesn't include over hidden files so .Rprofile falls off accidentally)
Currently, pkgr will happily proceed to install to the renv lib location and leave it up to the user to actually activate renv if they want to use it explicitly by just manually setting the .libPaths() to the library dir before calling init. The new behavior now blends that a little more. So back to the scenario, by dropping --vanilla
if renv is present on the system, its still going to return a valid renv value, but just the globally available version on the system, is that going to be the version the user would expect or not. I... am not certain, though I'd lean towards both scenarios coming up.
This of course brings up another option.... just have pkgr error if renv is not yet present and active, by default, and let the user be explicit if they want to proceed anyway
something like
pkgr install
--> errors if Type: renv but renv not detected as installed and active or pkgr first forces renv to be bootstrap before proceeding with install
pkgr install --no-activate-renv
--> let whatever system state just proceed "blindly"
I mean from the subprocess perspective - [...]
Thanks for clarifying. I don't think a restart is being triggered for the following reasons:
Looking at the renv source, the restart is the front-end specific (renv_restart_request_default
doesn't actually trigger a restart, while renv_restart_request_rstudio
does). The rstudio variant shouldn't be triggered by the Rscript invocation added in this PR.
I don't see the restart message if I invoke the Rscript command from this PR directly (in a regular terminal, in an RStudio terminal, or from a system()
call from an RStudio console.)
After having tested a good number of cases, I don't spot any zombies.
but just the globally available version on the system, is that going to be the version the user would expect or not. I... am not certain, though I'd lean towards both scenarios coming up.
Yeah, I'm of course not certain either, even if (as I said above) I think the system renv behavior is the best guess based on the information we have. While it doesn't seem that a wrong guess is disastrous here (in the sense that the after the user initializes renv, the next pkgr invocation will get the right path), perhaps a guess isn't good enough...
This of course brings up another option.... just have pkgr error if renv is not yet present and active, by default, and let the user be explicit if they want to proceed anyway
... and an error is the way to go.
something like
pkgr install
--> errors if Type: renv but renv not detected as installed and active or pkgr first forces renv to be bootstrap before proceeding with install
To guard against scenario 2 and 3 above (i.e. cases where the system renv would be used), it may be sufficient to check that .Rprofile
and renv/activate.R
are present. It's of course possible for those to be present but edited so that renv isn't actually wired up, but I think in practice it'd be good enough.
Thanks again for the input.
Good discussion here. My take current take(s) on this, after having caught up and looked through the code:
renv
for the library path), because as you as say, it will likely be a drop in the bucket of total time and it's a one time cost.Lockfile: renv
in their pkgr.yml
renv
(either in the project or system wide)Lockfile: renv
in your config but no renv... either install renv or specify a library path for pkgr to install to...">= 0.15
) renv, or proceed without it (in which case we just installed to somewhere that's not on their .libPaths()
, right?)< 0.15
) and then pkgr installs a newer version of renv in the project, and then the users restarts R at some point and renv asks to update and they do, and then it can't find the library that pkgr has previously installed (to the old path). I don't know how likely this is, but I also don't see any great way around it, other than erroring in scenario 2 too, which seems aggressive. I think we leave this as is, hope for the best, and monitor if users start to stumble onto this and ask about it.I've reworked the series to error in scenario 3 and wired up some tests.
At least on my end, the drone indicator isn't showing up in this pr, so here's the link: https://github-drone.metrumrg.com/metrumresearchgroup/pkgr/223
@Dreznel Thanks for the review.
I left a few remarks for how I think the tests could be improved.
Good points. Will update.
This series fixes gh-392 by asking renv what the library path is. The main change is the last commit, and that has more discussion about possible approaches and downsides. The second commit fixes a Windows bug that's in a code path that this change uses.
I'm marking this a draft because
at this point I'm not sure that this is a direction we want to go. I'd like to hear what others think.
I haven't looked into adding automated tests, though those would of course be good to have if feasible.