rstudio / renv

renv: Project environments for R.
https://rstudio.github.io/renv/
MIT License
1.02k stars 155 forks source link

`renv::install()` with explicit dependencies should prioritize remote ref #736

Closed gadenbuie closed 1 year ago

gadenbuie commented 3 years ago

This is a complicated issue to describe, so I've created a companion repository.

In essence, we are using explicit snapshot types with a DESCRIPTION file. Some of the packages listed in the DESCRIPTION file require package versions hosted on GitHub. Because references such as @main or @master may move, we specifically reference remote package versions by commit SHA, such as:

Imports:
    htmltools,
    learnr,
    parsons,
    sortable
Remotes: 
    rstudio/htmltools@11cfbf3b331387922c0144203cf6e947c8bdaee9,
    rstudio/learnr@428d9af1506447f68890724bfc619ed1f2e7cfc0,
    rstudio/parsons@42d0bfb8f23a3c8048cc4c1fbca95a0ef1e0bca3,
    rstudio/sortable@cdde09d915ee0bc73353b079c7afbed8cb70ab4a

However, we're finding that renv::install() doesn't always prioritize the version listed in Remotes above all other versions. In the case of this reprex, sortable is installed at a different commit.

# GitHub =============================
- htmltools     [* -> rstudio/htmltools@11cfbf3b331387922c0144203cf6e947c8bdaee9]
- learnr        [* -> rstudio/learnr@428d9af1506447f68890724bfc619ed1f2e7cfc0]
- parsons       [* -> rstudio/parsons@42d0bfb8f23a3c8048cc4c1fbca95a0ef1e0bca3]
- sortable      [* -> rstudio/sortable]

The companion includes a number of things that I've tried to force installation of the package at the specific reference, but my attempts invariably arrive at a place where a different version of one of the remote packages is installed.

kevinushey commented 3 years ago

I wonder if this is because parsons also declares a Remotes: field here:

https://github.com/rstudio/parsons/blob/master/DESCRIPTION#L29-L31

and that one is taking precedence over the one in the project's DESCRIPTION file.

kevinushey commented 3 years ago

The renv code that I believe is responsible for handling these remotes lives here:

https://github.com/rstudio/renv/blob/c145e9f357fc401b5302cba503739edc89cec6a9/R/retrieve.R#L760-L804

Perhaps we should only update the remote if it's not already set.

gadenbuie commented 3 years ago

Thanks @kevinushey this is working nicely except for one small problem. Since upgrading to this version of renv, we are now seeing extra Remotes fields being added to the renv.lock file, as in the following example entry for gradethis:

    "gradethis": {
      "Package": "gradethis",
      "Version": "0.2.3.9001",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "rstudio",
      "RemoteRepo": "gradethis",
      "RemoteRef": "ae39493cbe2a9154e699024c379e0134d72a961f",
      "RemoteSha": "ae39493cbe2a9154e699024c379e0134d72a961f",
      "Remotes": "rstudio/learnr",
      "Remotes.1": "rstudio/learnr",
      "Hash": "3ffe9681a450c8062d102fb866f28021"
    },

The additional fields show up as Remotes.1 or Remotes.2. I haven't been able to isolate the behavior into a minimal reprex, but we cycle through renv::restore(), renv::install(), renv::snapshot() in a CI process and are noticing that new Remotes.N fields are being added even if the renv.lock hasn't changed (but maybe the package cache has?).

kevinushey commented 3 years ago

Okay, let me know when you have a reproducible example.

dvg-p4 commented 1 year ago

I'm also seeing this strange "Remotes.1" behavior--it's unfortunately with a private repo, so I'll have to see if I can get a repro on some public ones.

kevinushey commented 1 year ago

Thanks @dvg-p4 -- please let me know if you can share a reproducible example.

dvg-p4 commented 1 year ago

Reproduction of overwriting installs in R4 or R3

Setup

I created some dummy github package repos: innerPkg, which exports a single function, and outerPkg, which imports and uses innerPkg. Specifically, the DESCRIPTION file of outerPkg has:

Imports: 
    innerPkg (>= 0.1.0)
Remotes: 
    dvg-p4/innerPkg

outerPkg has a single release, 0.1.0, tagged. innerPkg has an 0.1.1 release tagged, but also has a further development commit on master, with DESCRIPTION updated to 0.1.1.9000 -- AFAICT this is reasonably standard practice for R package development.

Reproduction

Create a new R project using renv. Mine has the standard R studio boilerplate, plus an "analysis script" file: image

my_script.R:

library(outerPkg)
outer_function()

Now, as expected, renv::status() yields:

The following package(s) are used in the project, but are not installed:
    outerPkg

So we go ahead and install it:

> renv::install("dvg-p4/outerPkg")
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
    OK [downloaded 2 Kb in 0.4 secs]
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/9397494e8c0e54c694585c9cfc1e6fceb945e07f' ...
    OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1.9000] ...
    OK [built from source]
Moving innerPkg [0.1.1.9000] into the cache ...
    OK [moved to cache in 0.64 milliseconds]
Installing outerPkg [0.1.0] ...
    OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
    OK [moved to cache in 0.6 milliseconds]

However, we notice this seems to have installed a weird dev version of innerPkg, so we decide to be more explicit about release tags:

> renv::install("dvg-p4/innerPkg@0.1.1")
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/04ac8b33e4bfbb59a793ed54bb031a8d7c63805c' ...
    OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1] ...
    OK [built from source]
Moving innerPkg [0.1.1] into the cache ...
    OK [moved to cache in 0.66 milliseconds]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
  outerPkg   [dvg-p4/outerPkg]
  innerPkg   [dvg-p4/innerPkg@0.1.1]
Use `renv::snapshot()` to add these packages to your lockfile.

Before we do so, let's explicitly tag the version of outerPkg as well:

> renv::install("dvg-p4/outerPkg@0.1.0")
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
    OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1.9000] ...
    OK [linked cache]
Installing outerPkg [0.1.0] ...
    OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
    OK [moved to cache in 0.62 milliseconds]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg]

This seems to be the core of the issue. Expected behavior: renv::install() does not overwrite current compatible installations of packages Observed behavior: renv::install() sometimes reinstalls the most recent commit on master of innerPkg when installing outerPkg Workaround: install the specific tag of innerPkg after installing outerPkg:

> renv::install("dvg-p4/innerPkg@0.1.1")
Installing innerPkg [0.1.1] ...
    OK [linked cache]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg@0.1.1]
Use `renv::snapshot()` to add these packages to your lockfile.
> renv::snapshot()
The following package(s) will be updated in the lockfile:
# GitHub =============================
- innerPkg   [* -> dvg-p4/innerPkg@0.1.1]
- outerPkg   [* -> dvg-p4/outerPkg@0.1.0]
Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_dev_reprex/renv.lock'.

This creates a valid-looking lockfile:

...
"Packages": {
    "innerPkg": {
      "Package": "innerPkg",
      "Version": "0.1.1",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "innerPkg",
      "RemoteRef": "0.1.1",
      "RemoteSha": "04ac8b33e4bfbb59a793ed54bb031a8d7c63805c",
      "Hash": "6540c97eb55d6fc575127d2c51b547f4",
      "Requirements": []
    },
    "outerPkg": {
      "Package": "outerPkg",
      "Version": "0.1.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "outerPkg",
      "RemoteRef": "0.1.0",
      "RemoteSha": "4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9",
      "Remotes": "dvg-p4/innerPkg",
      "Hash": "9715a0717d6d3f577fc576749cb66144",
      "Requirements": [
        "innerPkg"
      ]
    },
...

And our status and actually-installed-packages look good as well.

> renv::status()
* The project is already synchronized with the lockfile.
> source("my_script.R")
[1] "This is outer_function()"
[1] "This is inner_function()"
[1] "You gave me"
[1] "hello from outer_function!"
[1] "This is the beta version 0.1.x of innerPkg!"
[1] "Outer function out."

If you're using R4, this workaround is sufficient--any further restore()s should work as expected. In R3, however, there will be difficulty when you try to restore from a clean install, see below.

dvg-p4 commented 1 year ago

Reproduction of "Remotes.1" behavior (R3)

Create a minimal project with renv and a call to outerPkg::outer_function() as above. Then:

> renv::install("dvg-p4/outerPkg@*release")
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
    OK [downloaded 2 Kb in 0.3 secs]
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/9397494e8c0e54c694585c9cfc1e6fceb945e07f' ...
    OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1.9000] ...
    OK [built from source]
Moving innerPkg [0.1.1.9000] into the cache ...
    OK [moved to cache in 0.99 milliseconds]
Installing outerPkg [0.1.0] ...
    OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
    OK [moved to cache in 0.78 milliseconds]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
           _
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg]

Use `renv::snapshot()` to add these packages to your lockfile.

> renv::install("dvg-p4/innerPkg@*release")
Installing innerPkg [0.1.1] ...
    OK [linked cache]
> renv::status()
The following package(s) are installed but not recorded in the lockfile:
           _
  outerPkg   [dvg-p4/outerPkg@0.1.0]
  innerPkg   [dvg-p4/innerPkg@0.1.1]

Use `renv::snapshot()` to add these packages to your lockfile.

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# GitHub =============================
- innerPkg   [* -> dvg-p4/innerPkg@0.1.1]
- outerPkg   [* -> dvg-p4/outerPkg@0.1.0]

Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_r3/renv.lock'.
> renv::status()
* The project is already synchronized with the lockfile.

As expected so far. Next, to simulate a fresh installation, delete the renv directory, and purge the cache:

> renv::purge("outerPkg")
The following packages will be purged from the cache:

    outerPkg 0.1.0 [Hash: 9715a0717d6d3f577fc576749cb66144]

Do you want to proceed? [y/N]: y
* Removed 1 package from the cache.
> renv::purge("innerPkg")
The following packages will be purged from the cache:

    innerPkg 0.1.1.9000 [Hash: 35c7f820c451d3895e8773441752fe61]
    innerPkg 0.1.1      [Hash: 6540c97eb55d6fc575127d2c51b547f4]

Do you want to proceed? [y/N]: y
* Removed 2 packages from the cache.

Restart R, then attempt to restore the environment:

> renv::activate()
* Project '~/scratchpad/renv_r3' loaded. [renv 0.16.0]
* The project library is out of sync with the lockfile.
* Use `renv::restore()` to install packages recorded in the lockfile.
> renv::restore()
The following package(s) will be updated:

# GitHub =============================
- innerPkg   [* -> dvg-p4/innerPkg@0.1.1]
- outerPkg   [* -> dvg-p4/outerPkg@0.1.0]

Do you want to proceed? [y/N]: y
Retrieving 'https://api.github.com/repos/dvg-p4/innerPkg/tarball/04ac8b33e4bfbb59a793ed54bb031a8d7c63805c' ...
    OK [downloaded 2 Kb in 0.4 secs]
Retrieving 'https://api.github.com/repos/dvg-p4/outerPkg/tarball/4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9' ...
    OK [downloaded 2 Kb in 0.3 secs]
Installing innerPkg [0.1.1] ...
    OK [built from source]
Moving innerPkg [0.1.1] into the cache ...
    OK [moved to cache in 0.89 milliseconds]
Installing outerPkg [0.1.0] ...
    OK [built from source]
Moving outerPkg [0.1.0] into the cache ...
    OK [moved to cache in 0.72 milliseconds]

Looks good so far but:

> renv::status()
The following package(s) are out of sync:

   Package   Lockfile Version   Library Version
  outerPkg              0.1.0             0.1.0

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

Expected behavior: A package that is the correct version from the same git reference will not display as out of sync with itself Observed behavior: That message, and (when I attempt to resolve with another restore():

> renv::restore()
The following package(s) will be updated:

# GitHub =============================
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]

Do you want to proceed? [y/N]: y
Installing innerPkg [0.1.1.9000] ...
    OK [linked cache]
Installing outerPkg [0.1.0] ...
    OK [linked cache]
The dependency tree was repaired during package installation:
           _
  innerPkg   [dvg-p4/innerPkg]

Call `renv::snapshot()` to capture these dependencies in the lockfile.

> renv::status()
The following package(s) are out of sync:

   Package   Lockfile Version   Library Version
  innerPkg              0.1.1        0.1.1.9000
  outerPkg              0.1.0             0.1.0

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

The development version of innerPkg has been installed again! That's not what we want, what happens with another restore()?

> renv::restore()
The following package(s) will be updated:
# GitHub =============================
- innerPkg   [ver: 0.1.1.9000 -> 0.1.1; ref: master -> 0.1.1; sha: 9397494e -> 04ac8b33]
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]
Do you want to proceed? [y/N]: y
Installing innerPkg [0.1.1] ...
    OK [linked cache]
Installing outerPkg [0.1.0] ...
    OK [linked cache]
> renv::status()
The following package(s) are out of sync:
   Package   Lockfile Version   Library Version
  outerPkg              0.1.0             0.1.0
Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

We're back to the state we were in before.

What if we try to snapshot?

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# GitHub =============================
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]

Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_r3/renv.lock'.
> renv::status()
* The project is already synchronized with the lockfile.

Looks good, but if we inspect the lockfile:

"innerPkg": {
      "Package": "innerPkg",
      "Version": "0.1.1",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "innerPkg",
      "RemoteRef": "0.1.1",
      "RemoteSha": "04ac8b33e4bfbb59a793ed54bb031a8d7c63805c",
      "Hash": "6540c97eb55d6fc575127d2c51b547f4",
      "Requirements": []
    },
    "outerPkg": {
      "Package": "outerPkg",
      "Version": "0.1.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "outerPkg",
      "RemoteRef": "0.1.0",
      "RemoteSha": "4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9",
      "Remotes": "dvg-p4/innerPkg",
      "Remotes.1": "dvg-p4/innerPkg",
      "Hash": "961db7b03ea39735e90cbe3d0d337ede",
      "Requirements": [
        "innerPkg"
      ]
    },

we see a nonsensical "Remotes.1"!

dvg-p4 commented 1 year ago

Further, if we do another fresh installation from the lockfile with "Remotes.1", we get into the same state as before, with outerPkg supposedly not matching its identical version:

> renv::status()
The following package(s) are out of sync:

   Package   Lockfile Version   Library Version
  outerPkg              0.1.0             0.1.0

Use `renv::snapshot()` to save the state of your library to the lockfile.
Use `renv::restore()` to restore your library from the lockfile.

> renv::snapshot()
The following package(s) will be updated in the lockfile:

# GitHub =============================
- outerPkg   [dvg-p4/outerPkg@0.1.0: unchanged]

Do you want to proceed? [y/N]: y
* Lockfile written to '~/scratchpad/renv_r3/renv.lock'.
> renv::status()
* The project is already synchronized with the lockfile.

The snapshot from this state leads to the addition of a "Remotes.2" line in renv.lock:

...
"outerPkg": {
      "Package": "outerPkg",
      "Version": "0.1.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "dvg-p4",
      "RemoteRepo": "outerPkg",
      "RemoteRef": "0.1.0",
      "RemoteSha": "4897d882ca3e3ecb5f908de9e5ffc9db6d6070b9",
      "Remotes": "dvg-p4/innerPkg",
      "Remotes.1": "dvg-p4/innerPkg",
      "Remotes.2": "dvg-p4/innerPkg",
      "Hash": "59d61acd29d222d3fff16a28ffa80fb6",
      "Requirements": [
        "innerPkg"
      ]
    },
...
kevinushey commented 1 year ago

This is immensely helpful; thank you!

I think the core of the first issue you describe is that renv doesn't understand how to "mix" these two constraints:

Imports: 
    innerPkg (>= 0.1.0)
Remotes: 
    dvg-p4/innerPkg

In particular, when it sees the Remotes field in that form, it assumes that this means we should install innerPkg using whatever the "default" branch is (main or master). I believe you could tag the remote with the version directly as well, e.g.

Remotes: 
    dvg-p4/innerPkg@0.1.0

The documentation at https://cran.r-project.org/web/packages/remotes/vignettes/dependencies.html also suggests this is a supported mechanism for depending on a specific tag / version of a package on GitHub.

I'll see if I can reproduce / further understand the issue you describe with duplicated Remotes fields next.

kevinushey commented 1 year ago

I believe I've fixed the issue with duplicated Remotes fields here: https://github.com/rstudio/renv/commit/75fd390a49eef4cc0cb5221c56114e1e8ae7f384

dvg-p4 commented 11 months ago

A note for future passers-by: I still occasionally see duplicated Remotes fields, usually in situations involving a project that was created with an earlier version of renv and upgraded to one after this fix. Unfortunately I've been unable to get a good reprex for this; fortunately I've found the following steps to work around the problem: