Invoke renv to discover library path

kyleam commented 2 years ago

This series fixes gh-392 by asking renv what the library path is. The main change is the last commit, and that has more discussion about possible approaches and downsides. The second commit fixes a Windows bug that's in a code path that this change uses.

I'm marking this a draft because

at this point I'm not sure that this is a direction we want to go. I'd like to hear what others think.
I haven't looked into adding automated tests, though those would of course be good to have if feasible.

dpastoor commented 2 years ago

disclosure - i'm not following this PR too too closely - but one thing that popped into my head around the issue/consideration - was be careful around the renv bootstrapping/presence when thinking through the fallback steps.

In particular - pkgr on a fresh project where renv has not yet been activated/run.

off the top of my head - one heuristic I could imagine would be -->

check if activate.R is present in renv, scan for the renv pinned version, parse and check if version is >= 0.15. boom logic. If this fails, then fallback to shell out.

The other thing to consider is just check carefully which pattern you're using to shell out - there are a couple different invocations of R (side note - I can see you've already done some cleaning - my hope had been to replace them with the rcmd eli worked on that actually has proper contexts/such, so check that out even just for code reference) - you potentially want to act as a user instead of the default isolated mechanism pkgr uses - which would suppress the .Rprofile sourcing. The positive of this, is if you acted as a user, renv should bootstrap itself if it can, and even if its slow would return back a value. The con is whatever side effects - it could accidentally source then from a global renv version rather than the desired version per activate.R, it could be funky around the fact that renv will restart the R session so you might need to handle that restart.

Basically - there are a bunch of states that could wreck havoc if you're going to ultimately ask renv for its version info :-/ Good luck chasing those down. Feel free to me if there are specific states/ideas I can provide input on

kyleam commented 2 years ago

@dpastoor Thanks for the feedback.

was be careful around the renv bootstrapping/presence when thinking through the fallback steps.

Yeah, certainly something to worry about. When I discussed renv's bootstrapping in 1708dd62, I focused on the time penalty, but there should probably have been explicit discussion of different scenarios.

As far as I can tell, there are three groups for non-activated cases:

renv is wired up but not activated (e.g., a fresh clone of a repo that tracks .Rprofile and renv/activate)

This group includes combinations between the system renv states {pre 0.15, post 0.15, not installed} and project renv states {pre 0.15, post 0.15}.

In these six scenarios, an un-activated renv triggers a bootstrap, so pkgr pays that one-time cost, what the user would have paid the first time they loaded R. That buys us the correct answer for the library path, regardless of the various ways that a user can point either pre 0.15 and post 0.15 to a different library path. (Or at least I haven't found the scenario where it doesn't).
renv is not wired up (e.g., a fresh clone of a repo that does not track .Rprofile and renv/activate, instead instructing users to call renv::init(bare = TRUE))

In this case, if pkgr is invoked before the user calls renv::init, no bootstrap is triggered. Instead the library path corresponds to the default of the system renv version. I'd say that gives the best guess at the library location because that's most likely the renv version that would be used if the user were to call renv::init at that moment. It's of course possible for the user to initialize with a different version of renv, but that's not information we can know. If a different renv is used for initialization and it gives a different answer for the library path, the next pkgr action will work with the correct path.
the system doesn't have renv, and renv is not wired up in the project

In this case, pkgr warns about the failing renv call and falls back to the current behavior of installing to $project/renv/library/.... As with the previous case, once renv is wired up, pkgr will start using whatever location it points to.

table of combinations

Here are the different combinations mapped to the above scenarios: | project renv | system renv | scenario | |------------- |------------- |-------- | | pre 0.15 | pre 0.15 | 1 | | pre 0.15 | post 0.15 | 1 | | post 0.15 | pre 0.15 | 1 | | post 0.15 | post 0.15 | 1 | | pre 0.15 | not installed | 1 | | post 0.15 | not installed | 1 | | | | | | not installed | pre 0.15 | 2 | | not installed | post 0.15 | 2 | | | | | | not installed | not installed | 3 |

My tentative takeaway from above is that, assuming we're okay paying the occasional bootstrap cost (which perhaps is the case, given an renv user is going to have to pay this cost at some point), the answer from renv is either correct or as close as we can hope to get given the information at hand.

In particular - pkgr on a fresh project where renv has not yet been activated/run.

off the top of my head - one heuristic I could imagine would be -->

check if activate.R is present in renv, scan for the renv pinned version, parse and check if version is >= 0.15. boom logic. If this fails, then fallback to shell out.

If I'm reading that right, the suggestion is that we use pkgr's current logic rather than invoking renv if we detect that renv < 0.15 will be active. However, even for renv versions before 0.15, invoking renv to get the library path fixes some non-default cases (e.g., if profiles are used). Given the new behavior is here, my feeling is that we might as bite the bullet across the board and get the best answer we can.

you potentially want to act as a user instead of the default isolated mechanism pkgr uses - which would suppress the .Rprofile sourcing.

Right, the call in this PR drops --vanilla when it invokes renv

The positive of this, is if you acted as a user, renv should bootstrap itself if it can, and even if its slow would return back a value.

Yes, in my manual testing it does.

The con is whatever side effects - it could accidentally source then from a global renv version rather than the desired version per activate.R,

Hmm, I'm not following this one. My thinking was that Rscript -e 'renv::... would be the best match for the renv that the user is interacting with (and expects pkgr to be installing packages within).

it could be funky around the fact that renv will restart the R session so you might need to handle that restart.

I think you mean the restart may need to be handled so that the path reported by renv is correct. With various tweaking of renv config knobs, I haven't observed any cases where the returned value looked different from what I expected, but I may be missing something.

dpastoor commented 2 years ago

I think you mean the restart may need to be handled so that the path reported by renv is correct. With various tweaking of renv config knobs, I haven't observed any cases where the returned value looked different from what I expected, but I may be missing something.

I mean from the subprocess perspective - renv should restart your R session after it bootstraps itself so the new session is fully controlled by renv - you'll see in rstudio like "Restarting R...." - i'm not sure if rstudio has to do any wizardry to like catch that and handle any process/PID changes, or if that is gracefully managed. The concern would be --> renv goes to terminate the process and start a new one, and the calling process then exits and you have this zombie process/closed process something + may not get back the appropriate message. This is purely naivety on my part just something that came to mind that felt like a nonstandard process interaction.

Hmm, I'm not following this one. My thinking was that Rscript -e 'renv::... would be the best match for the renv that the user is interacting with (and expects pkgr to be installing packages within).

So the scenario I'm considering, is as you laid out in #2, but looking at it from the perspective of "is that what a user would likely expect to do/be doing, or could this unintentionally do something different they didn't expect"

Person by purpose or accident does not have a .Rprofile (this gets caught with QC review prior to using templates....a non-trivial number of times, it also happens if someone blindly copies/zips a dir that doesn't include over hidden files so .Rprofile falls off accidentally)

Currently, pkgr will happily proceed to install to the renv lib location and leave it up to the user to actually activate renv if they want to use it explicitly by just manually setting the .libPaths() to the library dir before calling init. The new behavior now blends that a little more. So back to the scenario, by dropping --vanilla if renv is present on the system, its still going to return a valid renv value, but just the globally available version on the system, is that going to be the version the user would expect or not. I... am not certain, though I'd lean towards both scenarios coming up.

This of course brings up another option.... just have pkgr error if renv is not yet present and active, by default, and let the user be explicit if they want to proceed anyway

something like

pkgr install --> errors if Type: renv but renv not detected as installed and active or pkgr first forces renv to be bootstrap before proceeding with install pkgr install --no-activate-renv --> let whatever system state just proceed "blindly"

kyleam commented 2 years ago

I mean from the subprocess perspective - [...]

Thanks for clarifying. I don't think a restart is being triggered for the following reasons:

Looking at the renv source, the restart is the front-end specific (renv_restart_request_default doesn't actually trigger a restart, while renv_restart_request_rstudio does). The rstudio variant shouldn't be triggered by the Rscript invocation added in this PR.
I don't see the restart message if I invoke the Rscript command from this PR directly (in a regular terminal, in an RStudio terminal, or from a system() call from an RStudio console.)
After having tested a good number of cases, I don't spot any zombies.

but just the globally available version on the system, is that going to be the version the user would expect or not. I... am not certain, though I'd lean towards both scenarios coming up.

Yeah, I'm of course not certain either, even if (as I said above) I think the system renv behavior is the best guess based on the information we have. While it doesn't seem that a wrong guess is disastrous here (in the sense that the after the user initializes renv, the next pkgr invocation will get the right path), perhaps a guess isn't good enough...

This of course brings up another option.... just have pkgr error if renv is not yet present and active, by default, and let the user be explicit if they want to proceed anyway

... and an error is the way to go.

something like

pkgr install --> errors if Type: renv but renv not detected as installed and active or pkgr first forces renv to be bootstrap before proceeding with install

To guard against scenario 2 and 3 above (i.e. cases where the system renv would be used), it may be sufficient to check that .Rprofile and renv/activate.R are present. It's of course possible for those to be present but edited so that renv isn't actually wired up, but I think in practice it'd be good enough.

Thanks again for the input.

seth127 commented 2 years ago

Good discussion here. My take current take(s) on this, after having caught up and looked through the code:

I agree with this approach, for the reasons given in the 1708dd629b39eb5817148e63a2ac2719d5e71716 commit message, and have no interest in trying to reimplement the renv logic
I don't have a problem with making users potentially incur the couple seconds of bootstrap time (or the time to check with renv for the library path), because as you as say, it will likely be a drop in the bucket of total time and it's a one time cost.
I like the implemented approaches for scenarios 1 and 2 above, but I lean towards erroring on scenario 3, as opposed to falling back to previous behavior (I think that's this code path). Trying to think through this scenario from a user perspective,
- they likely either:
- Didn't mean to have Lockfile: renv in their pkgr.yml
- Meant to have renv (either in the project or system wide)
- I think either way we error with something semi-helpful that says "you have Lockfile: renv in your config but no renv... either install renv or specify a library path for pkgr to install to..."
- Installing to the old default doesn't seem to me like it's likely to set them up for success, especially because (in my mind at least) the most likely next step they will take is to install a fresh (probably >= 0.15) renv, or proceed without it (in which case we just installed to somewhere that's not on their .libPaths(), right?)
As I think about this more, I see some potentially weird issues with scenario 2, where a user installs packages using the system renv path (potentially < 0.15) and then pkgr installs a newer version of renv in the project, and then the users restarts R at some point and renv asks to update and they do, and then it can't find the library that pkgr has previously installed (to the old path). I don't know how likely this is, but I also don't see any great way around it, other than erroring in scenario 2 too, which seems aggressive. I think we leave this as is, hope for the best, and monitor if users start to stumble onto this and ask about it.
I would really like to have some tests for this. I know the entire pkgr test suite has been a pain in various ways but--given we're not under any real time pressure here, and that this is dealing with an external quasi-dependency that may continue to change--I think it's worth the extra time to put some tests in. Ideally some tests that will warn us if renv changes this again.

kyleam commented 2 years ago

I've reworked the series to error in scenario 3 and wired up some tests.

At least on my end, the drone indicator isn't showing up in this pr, so here's the link: https://github-drone.metrumrg.com/metrumresearchgroup/pkgr/223

kyleam commented 2 years ago

@Dreznel Thanks for the review.

I left a few remarks for how I think the tests could be improved.

Good points. Will update.

kyleam commented 2 years ago

Drone build for the latest push is here.

metrumresearchgroup / pkgr

Invoke renv to discover library path #396