r-lib / pak

A fresh approach to package installation
https://pak.r-lib.org
639 stars 56 forks source link

Installation on custom envs fails due to PATH on .onLoad #619

Closed latot closed 2 months ago

latot commented 2 months ago

Hi!, this really took long time to find, I was building/installing some libraries while some dependencies was enable from conda/reticulate... some times it worked, others not, really hard to get what was happening!

So here the issue, when we load/call pak it will save the actual PATH, which causes if we add a new binary path for a particular installation it will fails because will use the PATH from load.

There will be two sections, there is some minor differences, the windows and linux one, the linux one is a lot minimal than the windows one, just because was developed before requested by @gaborcsardi

Linux

Preparation

First we need to prepare a file, this will be called on file installation to test PATH.

mkdir /tmp/bin
echo 'echo "Hello World"' > /tmp/bin/example
chmod +x /tmp/bin/example

Repository to test

I have prepared this one: https://github.com/latot/bugs/tree/pak_path2

The tricky part is in the configuration file: https://github.com/latot/bugs/blob/f5e0ca722c78d5872033f5bb860bdf9480272cac/configure#L1

Which will call the binary example on install, if PATH is fine, then it will works, if not we will get the error.

This will fails!

# Install something, anything
pak::pkg_install("https://github.com/r-lib/pak")

# Change PATH
Sys.setenv(PATH = paste("/tmp/bin", Sys.getenv("PATH"), sep = ":"))

# This one will depends on example file
pak::pkg_install("https://github.com/latot/bugs/tree/pak_path2")

The main difference with windows, would be instead of install something, it will also works with library(pak).

This will works!

# Change PATH
Sys.setenv(PATH = paste("/tmp/bin", Sys.getenv("PATH"), sep = ":"))

# This one will depends on example file
pak::pkg_install("https://github.com/latot/bugs/tree/pak_path2")

Windows

Here some steps as example this library to be built needs the Rust compiler, we will enable it from conda:

install.packages("pak")
# With this we can see the error message
pak::pkg_install("https://github.com/r-lib/pak", upgrade = TRUE)

Some helper functions to handle PATH

conda_windows_path <- function(name) {
  envs <- dplyr::filter(reticulate::conda_list(), .data$name == .env$name)
  if (nrow(envs) == 0) {
    stop("Env name of conda not found")
  }
  if (nrow(envs) > 1) {
    print(paste0("Actual env name: ", name))
    print(envs)
    stop("More than one env with that name, should never happens!")
  }
  fs::path(dirname(envs$python[[1]]), "Library", "bin")
}
update_path <- function(name) {
  if (TRUE %in% (Sys.info() == "Windows")){
    conda_bin_path <- conda_windows_path(name)
  } else {
    stop("Right now only support Windows")
  }
  old_path <- Sys.getenv("PATH")
  Sys.setenv(PATH = paste(
    conda_bin_path,
    old_path,
    sep = ";"
  ))
}

packages <- "https://github.com/latot/bugs/tree/pak_path"

env_conda <- "test"

# Install miniconda
reticulate::install_miniconda()

# Install rust compiler (cargo)
reticulate::conda_install(env_conda, "rust-gnu")

old_path <- Sys.getenv("PATH")

Here the tricky part, in this moment if we follow correctly each instruction, we have not executed nor loaded nothing from pak.

The next statements will works:

# Update PATH with new binaries
update_path(env_conda)
# This will works fine
pak::pkg_install(packages, upgrade = TRUE)

While this ones will fail:

# Seems pak here will save the actual PATH, and is the one that will be used for installations
library(pak)
# Update PATH with new binaries, but will not be considered
update_path(env_conda)
# This will fails with "cargo not found"
pak::pkg_install(packages, upgrade = TRUE)

So would be good for at least installations check the new PATH if has changed, it can affect some installations.

A workaround, before install something run:

unloadNamespace("pak")

Thx!

gaborcsardi commented 2 months ago

pak starts a worker process which inherits the environment from its parent process. If you change the environment in the parent process, the worker process will not know about that. This will be fixed when pak will stop doing installations in a worker process.

latot commented 2 months ago

@gaborcsardi sorry didn't understand it very well, thinking in the examples, we are not performing any installation when we change PATH, the only difference is when we load the library, only load, if is loaded before the PATH change, it will fails.

gaborcsardi commented 2 months ago

That would be pretty weird, pak does nothing with the PATH, except that it adds an extra directory to it during installation. I created a package that records the path during installation, and this is what I see:

library(pak)
foo <- tempfile("foobar")
foo
#> [1] "C:\\Users\\csard\\AppData\\Local\\Temp\\Rtmp0EsR4h\\foobar56084b83304e"

dir.create(foo)
Sys.setenv(PATH = paste0(foo, ";", Sys.getenv("PATH")))
pak::pkg_install("gaborcsardi/PATH?reinstall&nocache")
#> → Will install 1 package.
#> → Will download 1 package with unknown size.
#> + PATH   0.0.0.9000 [bld][cmp][dl] (GitHub: b59966c)
#> ℹ Getting 1 pkg with unknown size
#> ✔ Got PATH 0.0.0.9000 (source) (1.21 kB)
#> ✔ Downloaded 1 package (1.21 kB) in 742ms
#> ℹ Packaging PATH 0.0.0.9000
#> ✔ Packaged PATH 0.0.0.9000 (1.4s)
#> ℹ Building PATH 0.0.0.9000
#> ✔ Built PATH 0.0.0.9000 (1.2s)
#> ✔ Installed PATH 0.0.0.9000 (github::gaborcsardi/PATH@b59966c) (105ms)
#> ✔ 1 pkg: added 1, dld 1 (NA B) [4.6s]
> PATH:::path
[1] "C:\\rtools43/x86_64-w64-mingw32.static.posix/bin;C:\\rtools43/usr/bin;C:\\rtools43/x86_64-w64-mingw32.static.posix/bin;C:\\rtools43/usr/bin;C:\\PROGRA~1\\R\\R-43~1.3\\bin/x64;C:\\rtools43/x86_64-w64-mingw32.static.posix/bin;C:\\rtools43/usr/bin;C:\\Users\\csard\\AppData\\Local\\R\\win-library\\4.3\\pak\\library\\zip\\bin\\x64;C:\\rtools43\\usr\\bin;C:\\rtools43\\x86_64-w64-mingw32.static.posix\\bin;C:\\rtools43\\usr\\bin;C:\\Users\\csard\\AppData\\Local\\Temp\\Rtmp0EsR4h\\foobar56084b83304e;C:\\rtools43\\x86_64-w64-mingw32.static.posix\\bin;C:\\rtools43\\usr\\bin;C:\\Program Files (x86)\\C

pak adds the ...\zip\bin part, and R (or possibly r-lib/rig) adds rtools a couple of times, but the additional foobar... directory is still there during installation.

So I don't think there is much I can do here.

latot commented 2 months ago

well... is pretty weird this issue...., you can check the example I post above, is step be step to get this problem...

I thought is on .onLoad, because the only way to show this bug, is loading or executing something from pak, how it does not found the binaries the only idea I have is PATH.

There is something to remember, this is like internally, if you check PATH, before and after the installation command, will be fine, this only happens inside the installation command... which is weird. My assumption is that .onLoad pak saved the env, and uses it when install, ignoring the changes of env after .onLoad.....

gaborcsardi commented 2 months ago

you can check the example I post above,

I am not going to do that, because I don't have a windows container at hand and I won't install conda. I am afraid that you'd need to create a more minimal example, if you are convinced that pak drops the PATH somehow.

AFAICT pak does nothing with the PATH, except that the subprocess will inherit it, and the subprocess starts when pak needs it. That way you'd indeed lose local PATH updates in the main process.

latot commented 2 months ago

@gaborcsardi Hi, I have updated the description of the issue, I have added a linux version, this time is a lot minimal! as you requested. I was pretty sad yesterday, took me a lot of time find why this happened... two weeks looking on this... and trying to write a good minimal example.

From what you says, if pak has a subprocess that starts for example, with a installation, and that process keeps running, is used for further installations and keeps the same env vars, it will miss any change after the first installation, libraries, binaries... which is far from ideal.

R also does not allow us some type of custom installations, like after/post scripts to handle things, so some installations would need to prepare some envs to work, we also need to isolate that envs instead of accumulate one over other, to do some installation tests is also useful change env vars.

Maybe... a way to update the env vars of the subprocess with the main process could do the trick.

gaborcsardi commented 2 months ago

From what you says, if pak has a subprocess that starts for example, with a installation, and that process keeps running, is used for further installations and keeps the same env vars, it will miss any change after the first installation, libraries, binaries... which is far from ideal.

Yes, that does happen, that was my very first comment.

FWIW if you want to set the PATH for R, the best way to do that is in the system or the user environment file. See ?Startup. In a nutshell, put

PATH=...

in ~/.Renviron or R_HOME/etc/x64/Renviron.site (on Windows).

Maybe... a way to update the env vars of the subprocess with the main process could do the trick.

We could do that I suppose.

latot commented 2 months ago

Thx for the fix! @gaborcsardi just to know, is possible to get the full env to the child? with some custom env installations PATH is not the only var that we can touch, and some apps can depends on extra variables.

Is too hard do this? or there is more tech issues that complicate that?

gaborcsardi commented 2 months ago

It is not hard to do that, that it does not work. Some of those env vars are set in the subprocess for a purpose, by use, by R or by the OS. We cannot just overwrite all of them with the ones in the parent process.

Again, your best bet is to set up the environment before starting R. It is not just pak that does calculations in subprocesses.

latot commented 2 months ago

Sadly that can not be done easily, to test packages we can change the envs per package and similar, so there is no "static" env to be added, because some of them are generated in the code....