metrumresearchgroup / pkgr

R package installation and management - reimagined.
https://metrumresearchgroup.github.io/pkgr/docs
39 stars 4 forks source link

Invoke renv to discover library path #396

Closed kyleam closed 2 years ago

kyleam commented 2 years ago

This series fixes gh-392 by asking renv what the library path is. The main change is the last commit, and that has more discussion about possible approaches and downsides. The second commit fixes a Windows bug that's in a code path that this change uses.

I'm marking this a draft because

dpastoor commented 2 years ago

disclosure - i'm not following this PR too too closely - but one thing that popped into my head around the issue/consideration - was be careful around the renv bootstrapping/presence when thinking through the fallback steps.

In particular - pkgr on a fresh project where renv has not yet been activated/run.

off the top of my head - one heuristic I could imagine would be -->

check if activate.R is present in renv, scan for the renv pinned version, parse and check if version is >= 0.15. boom logic. If this fails, then fallback to shell out.

The other thing to consider is just check carefully which pattern you're using to shell out - there are a couple different invocations of R (side note - I can see you've already done some cleaning - my hope had been to replace them with the rcmd eli worked on that actually has proper contexts/such, so check that out even just for code reference) - you potentially want to act as a user instead of the default isolated mechanism pkgr uses - which would suppress the .Rprofile sourcing. The positive of this, is if you acted as a user, renv should bootstrap itself if it can, and even if its slow would return back a value. The con is whatever side effects - it could accidentally source then from a global renv version rather than the desired version per activate.R, it could be funky around the fact that renv will restart the R session so you might need to handle that restart.

Basically - there are a bunch of states that could wreck havoc if you're going to ultimately ask renv for its version info :-/ Good luck chasing those down. Feel free to me if there are specific states/ideas I can provide input on

kyleam commented 2 years ago

@dpastoor Thanks for the feedback.

was be careful around the renv bootstrapping/presence when thinking through the fallback steps.

Yeah, certainly something to worry about. When I discussed renv's bootstrapping in 1708dd62, I focused on the time penalty, but there should probably have been explicit discussion of different scenarios.

As far as I can tell, there are three groups for non-activated cases:

  1. renv is wired up but not activated (e.g., a fresh clone of a repo that tracks .Rprofile and renv/activate)

    This group includes combinations between the system renv states {pre 0.15, post 0.15, not installed} and project renv states {pre 0.15, post 0.15}.

    In these six scenarios, an un-activated renv triggers a bootstrap, so pkgr pays that one-time cost, what the user would have paid the first time they loaded R. That buys us the correct answer for the library path, regardless of the various ways that a user can point either pre 0.15 and post 0.15 to a different library path. (Or at least I haven't found the scenario where it doesn't).

  2. renv is not wired up (e.g., a fresh clone of a repo that does not track .Rprofile and renv/activate, instead instructing users to call renv::init(bare = TRUE))

    In this case, if pkgr is invoked before the user calls renv::init, no bootstrap is triggered. Instead the library path corresponds to the default of the system renv version. I'd say that gives the best guess at the library location because that's most likely the renv version that would be used if the user were to call renv::init at that moment. It's of course possible for the user to initialize with a different version of renv, but that's not information we can know. If a different renv is used for initialization and it gives a different answer for the library path, the next pkgr action will work with the correct path.

  3. the system doesn't have renv, and renv is not wired up in the project

    In this case, pkgr warns about the failing renv call and falls back to the current behavior of installing to $project/renv/library/.... As with the previous case, once renv is wired up, pkgr will start using whatever location it points to.

table of combinations Here are the different combinations mapped to the above scenarios: | project renv | system renv | scenario | |------------- |------------- |-------- | | pre 0.15 | pre 0.15 | 1 | | pre 0.15 | post 0.15 | 1 | | post 0.15 | pre 0.15 | 1 | | post 0.15 | post 0.15 | 1 | | pre 0.15 | not installed | 1 | | post 0.15 | not installed | 1 | | | | | | not installed | pre 0.15 | 2 | | not installed | post 0.15 | 2 | | | | | | not installed | not installed | 3 |

My tentative takeaway from above is that, assuming we're okay paying the occasional bootstrap cost (which perhaps is the case, given an renv user is going to have to pay this cost at some point), the answer from renv is either correct or as close as we can hope to get given the information at hand.

In particular - pkgr on a fresh project where renv has not yet been activated/run.

off the top of my head - one heuristic I could imagine would be -->

check if activate.R is present in renv, scan for the renv pinned version, parse and check if version is >= 0.15. boom logic. If this fails, then fallback to shell out.

If I'm reading that right, the suggestion is that we use pkgr's current logic rather than invoking renv if we detect that renv < 0.15 will be active. However, even for renv versions before 0.15, invoking renv to get the library path fixes some non-default cases (e.g., if profiles are used). Given the new behavior is here, my feeling is that we might as bite the bullet across the board and get the best answer we can.

you potentially want to act as a user instead of the default isolated mechanism pkgr uses - which would suppress the .Rprofile sourcing.

Right, the call in this PR drops --vanilla when it invokes renv

The positive of this, is if you acted as a user, renv should bootstrap itself if it can, and even if its slow would return back a value.

Yes, in my manual testing it does.

The con is whatever side effects - it could accidentally source then from a global renv version rather than the desired version per activate.R,

Hmm, I'm not following this one. My thinking was that Rscript -e 'renv::... would be the best match for the renv that the user is interacting with (and expects pkgr to be installing packages within).

it could be funky around the fact that renv will restart the R session so you might need to handle that restart.

I think you mean the restart may need to be handled so that the path reported by renv is correct. With various tweaking of renv config knobs, I haven't observed any cases where the returned value looked different from what I expected, but I may be missing something.

dpastoor commented 2 years ago

I think you mean the restart may need to be handled so that the path reported by renv is correct. With various tweaking of renv config knobs, I haven't observed any cases where the returned value looked different from what I expected, but I may be missing something.

I mean from the subprocess perspective - renv should restart your R session after it bootstraps itself so the new session is fully controlled by renv - you'll see in rstudio like "Restarting R...." - i'm not sure if rstudio has to do any wizardry to like catch that and handle any process/PID changes, or if that is gracefully managed. The concern would be --> renv goes to terminate the process and start a new one, and the calling process then exits and you have this zombie process/closed process something + may not get back the appropriate message. This is purely naivety on my part just something that came to mind that felt like a nonstandard process interaction.

Hmm, I'm not following this one. My thinking was that Rscript -e 'renv::... would be the best match for the renv that the user is interacting with (and expects pkgr to be installing packages within).

So the scenario I'm considering, is as you laid out in #2, but looking at it from the perspective of "is that what a user would likely expect to do/be doing, or could this unintentionally do something different they didn't expect"

Person by purpose or accident does not have a .Rprofile (this gets caught with QC review prior to using templates....a non-trivial number of times, it also happens if someone blindly copies/zips a dir that doesn't include over hidden files so .Rprofile falls off accidentally)

Currently, pkgr will happily proceed to install to the renv lib location and leave it up to the user to actually activate renv if they want to use it explicitly by just manually setting the .libPaths() to the library dir before calling init. The new behavior now blends that a little more. So back to the scenario, by dropping --vanilla if renv is present on the system, its still going to return a valid renv value, but just the globally available version on the system, is that going to be the version the user would expect or not. I... am not certain, though I'd lean towards both scenarios coming up.

This of course brings up another option.... just have pkgr error if renv is not yet present and active, by default, and let the user be explicit if they want to proceed anyway

something like

pkgr install --> errors if Type: renv but renv not detected as installed and active or pkgr first forces renv to be bootstrap before proceeding with install pkgr install --no-activate-renv --> let whatever system state just proceed "blindly"

kyleam commented 2 years ago

I mean from the subprocess perspective - [...]

Thanks for clarifying. I don't think a restart is being triggered for the following reasons:

but just the globally available version on the system, is that going to be the version the user would expect or not. I... am not certain, though I'd lean towards both scenarios coming up.

Yeah, I'm of course not certain either, even if (as I said above) I think the system renv behavior is the best guess based on the information we have. While it doesn't seem that a wrong guess is disastrous here (in the sense that the after the user initializes renv, the next pkgr invocation will get the right path), perhaps a guess isn't good enough...

This of course brings up another option.... just have pkgr error if renv is not yet present and active, by default, and let the user be explicit if they want to proceed anyway

... and an error is the way to go.

something like

pkgr install --> errors if Type: renv but renv not detected as installed and active or pkgr first forces renv to be bootstrap before proceeding with install

To guard against scenario 2 and 3 above (i.e. cases where the system renv would be used), it may be sufficient to check that .Rprofile and renv/activate.R are present. It's of course possible for those to be present but edited so that renv isn't actually wired up, but I think in practice it'd be good enough.

Thanks again for the input.

seth127 commented 2 years ago

Good discussion here. My take current take(s) on this, after having caught up and looked through the code:

kyleam commented 2 years ago

I've reworked the series to error in scenario 3 and wired up some tests.

At least on my end, the drone indicator isn't showing up in this pr, so here's the link: https://github-drone.metrumrg.com/metrumresearchgroup/pkgr/223

kyleam commented 2 years ago

@Dreznel Thanks for the review.

I left a few remarks for how I think the tests could be improved.

Good points. Will update.

kyleam commented 2 years ago

Drone build for the latest push is here.