r-lib / pak

A fresh approach to package installation
https://pak.r-lib.org

GitHub Actions CI Issue: dependent package "is not a valid binary, it does not contain 'Hmisc/Meta/package.rds'" #467

Open billdenney opened 1 year ago

billdenney commented 1 year ago

When using GitHub Actions, I get the following error that appears to originate from pak. It may be related to R version 4.3 since it consistently happens for the Ubuntu-devel version.

  Error: 
  ! error in pak subprocess
  Caused by error in `verify_extracted_package(filename, pkg_cache)`:
  ! '/tmp/RtmprnnlE9/file13a638b1eebc/src/contrib/x86_64-pc-linux-gnu-ubuntu-22.04/4.3/Hmisc_5.0-0.tar.gz' is not a valid binary, it does not contain 'Hmisc/Meta/package.rds'.
  ---
  Backtrace:
  1. pak::lockfile_install(".github/pkg.lock")
  2. pak:::remote(function(...) { …
  3. err$throw(res$error)

The error is occurring here:

https://github.com/nlmixr2/rxode2/actions/runs/4355986650/jobs/7613357209#step:6:6313

gaborcsardi commented 1 year ago

It is a bug in pak. A workaround is to use the stable pak version, if you can do that. Will be fixed soon.

gaborcsardi commented 1 year ago

Probably the same in another repo: https://github.com/tidymodels/recipes/actions/runs/4348497699/jobs/7614837809

This can happen if pak expects a binary package, but RSPM sends a source package. So this is definitely a bug.

OTOH, for the packages that fail RSPM sends binaries, so something else must be going on as well. pkgdepends/pkgcache will have to set the User-Agent header appropriately when downloading packages from RSPM, but we do set it on GHA, so IDK why it is happening there, and I can't reproduce locally.
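
For anyone who wants to check what a repository actually sent, here is a minimal sketch of the test implied by the error message: a binary package tarball should contain `<pkg>/Meta/package.rds`. The helper name `looks_like_binary()` is made up for illustration; it is not pak's internal `verify_extracted_package()`.

```r
# Hypothetical helper for illustration; this is not pak's internal
# verify_extracted_package().
looks_like_binary <- function(tarball, pkg) {
  # List the archive members without extracting them.
  files <- utils::untar(tarball, list = TRUE)
  # Some tar implementations prefix members with "./"; normalise that.
  files <- sub("^\\./", "", files)
  # Binary (installed-form) packages ship a Meta/package.rds file.
  paste0(pkg, "/Meta/package.rds") %in% files
}
```

Assuming the tarball from the error message were saved locally, the call would be `looks_like_binary("Hmisc_5.0-0.tar.gz", "Hmisc")`.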

Adafede commented 1 year ago

In case it is of any help, this is also happening here:

https://github.com/taxonomicallyinformedannotation/tima-r/actions/runs/4355136215/jobs/7611345869#step:5:5218

gaborcsardi commented 1 year ago

This should be fixed now, I reverted a commit in pkgcache: https://github.com/r-lib/pkgcache/commit/c9ddff1e8fef1131836da9f4fb39ecdc64fc6bb2

I still don't know what the bug is, but at least this Dockerfile, which uses the previous devel version of pak, reproduces it:

# -*- mode: Dockerfile -*-

FROM ubuntu:22.04

RUN apt-get update && \
    apt-get install -y curl && \
    curl -LO https://cdn.posit.co/r/ubuntu-2204/pkgs/r-devel_1_amd64.deb && \
    apt install -y ./r-devel_1_amd64.deb && \
    rm r-devel*.deb

RUN ln -s /opt/R/devel/bin/R /usr/local/bin/R && \
    ln -s /opt/R/devel/bin/Rscript /usr/local/bin/Rscript

RUN echo 'options( \
    repos = c(RSPM = "https://packagemanager.posit.co/cran/__linux__/jammy/latest", CRAN = "https://cloud.r-project.org"),\
    HTTPUserAgent="R/4.2.2 R (4.2.2 x86_64-pc-linux-gnu x86_64 linux-gnu) on GitHub Actions"\
    )' \
    >> $HOME/.Rprofile

RUN apt-get update && \
    apt-get install -y git && \
    git clone --depth 1 https://github.com/r-lib/pkgcache

RUN curl -L -H 'Authorization: Bearer QQ==' -o pak.tar.gz \
    https://ghcr.io/v2/r-lib/pak/blobs/sha256:ecafed8beab831d350856a06a0e68b4101014ea96783f8f5d1eb9a90183bb31a && \
    R CMD INSTALL pak.tar.gz

RUN cd pkgcache && \
    Rscript -e 'pak::lockfile_create(c("deps::.", "any::rcmdcheck", "any::sessioninfo"), dependencies = "all")'

RUN cd pkgcache && \
    cat pkg.lock

RUN cd pkgcache && \
    Rscript -e 'pak::lockfile_install()'

Adafede commented 1 year ago

@gaborcsardi can confirm it fixed the issue for me! 😊

gaborcsardi commented 1 year ago

It turns out that pak does not actually use binaries from RSPM on R-devel, because the pak subprocess does not read .Rprofile, so the HTTPUserAgent option is never set in the subprocess.

On the one hand, it is weird that we never actually noticed this. OTOH it would explain why we never saw errors when trying to use the release binaries for devel builds.

So this now causes an error: the install plan is to download binaries, but we then get source packages, and that case is not handled.
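
As a rough illustration of the option involved, the sketch below builds a User-Agent string in the same shape as the one in the Dockerfile above and sets it for the current session. The function name `r_user_agent()` is an assumption of this example. Setting the option in the main session does not by itself guarantee that the pak subprocess sees it, which is exactly the problem described here.

```r
# Build a User-Agent string shaped like
# "R/4.2.2 R (4.2.2 x86_64-pc-linux-gnu x86_64 linux-gnu)".
# r_user_agent() is a hypothetical name used only in this sketch.
r_user_agent <- function() {
  ver <- as.character(getRversion())
  sprintf(
    "R/%s R (%s)", ver,
    paste(ver, R.version[["platform"]], R.version[["arch"]], R.version[["os"]])
  )
}

# Applies to the current R session only.
options(HTTPUserAgent = r_user_agent())
```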

nbenn commented 1 year ago

I'm running into this issue too. Right now PPM for jammy/latest seems to be serving MASS as a source package, which causes my CI jobs to fail. The suggestion above to use the stable version does not help, as I am already doing so (i.e. installing pak from https://r-lib.github.io/p/pak/stable/). Should pak be able to handle this situation gracefully, or is this considered the fault of PPM for not serving binaries? My workaround for now is to use a PPM snapshot that does not show the problem.

gaborcsardi commented 1 year ago

@nbenn can you link to the workflow and the failing build?

nbenn commented 1 year ago

Unfortunately it's a private repo and it's not a GitHub Actions workflow (I'm using Drone). I might be able to put together the necessary pieces to reproduce it, though, if you want to have a look. I'll go ahead and describe the setup:

It's a rocker/verse:4.2 image and I'm using a repos setting like

c(CRAN = "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest")

I install pak as

install.packages("pak", repos = "https://r-lib.github.io/p/pak/stable/")

And in my lockfile I then get (among other entries)

{
  "ref": "MASS",
  "package": "MASS",
  "version": "7.3-59",
  "type": "standard",
  "direct": false,
  "binary": true,
  "dependencies": [],
  "vignettes": false,
  "needscompilation": false,
  "metadata": {
    "RemoteType": "standard",
    "RemotePkgRef": "MASS",
    "RemoteRef": "MASS",
    "RemoteRepos": "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest",
    "RemotePkgPlatform": "x86_64-pc-linux-gnu-ubuntu-22.04",
    "RemoteSha": "7.3-59"
      },
  "sources": ["https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz", "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"],
  "target": "src/contrib/x86_64-pc-linux-gnu-ubuntu-22.04/4.2/MASS_7.3-59.tar.gz",
  "platform": "x86_64-pc-linux-gnu-ubuntu-22.04",
  "rversion": "4.2",
  "directpkg": false,
  "license": "GPL-2 | GPL-3",
  "dep_types": ["Depends", "Imports", "LinkingTo"],
  "params": [],
  "install_args": "",
  "repotype": "cran"
}

The PPM web UI currently says

[Screenshot 2023-04-25 at 17 10 29]

Happy to share more if you're interested.

gaborcsardi commented 1 year ago

Seems like RSPM is sending a binary package to me:

> pak::pkg_install("MASS?reinstall")
> Will install 1 package.
> Will download 1 package with unknown size.
+ MASS   7.3-59 [dl]
i Getting 1 pkg with unknown size
v Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.12 MB)
v Downloaded 1 package (1.12 MB)in 4.9s
v Installed MASS 7.3-59  (270ms)
v 1 pkg: added 1, dld 1 (1.12 MB) [8.2s]
> .Last.value[c("sources", "platform")]
                                                                                     sources
1 https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz
                          platform
1 x86_64-pc-linux-gnu-ubuntu-22.04
> getRversion()
[1] '4.2.0'

How are your repos set up? What is the output of pak::repo_get()? E.g.

> pak::repo_get()
           name                                                         url
1          RSPM https://packagemanager.posit.co/cran/__linux__/jammy/latest
2          CRAN                                 https://cloud.r-project.org
3      BioCsoft                 https://bioconductor.org/packages/3.16/bioc
4       BioCann      https://bioconductor.org/packages/3.16/data/annotation
5       BioCexp      https://bioconductor.org/packages/3.16/data/experiment
6 BioCworkflows            https://bioconductor.org/packages/3.16/workflows
      type r_version bioc_version
1 cranlike         *         <NA>
2     cran         *         <NA>
3     bioc     4.2.0         3.16
4     bioc     4.2.0         3.16
5     bioc     4.2.0         3.16
6     bioc     4.2.0         3.16

nbenn commented 1 year ago

Are you sure, you're being served a binary release? If I download the file at https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz, this looks very much like what I get from CRAN as source release, no? Also, according to the PPM web UI, no binary is currently available for jammy/4.2 (see screenshot above). Or am I misunderstanding something here?

As requested:

> pak::repo_get()
           name
1          CRAN
2           efv
3      BioCsoft
4       BioCann
5       BioCexp
6 BioCworkflows
                                                                 url     type
1     https://packagemanager.rstudio.com/cran/__linux__/jammy/latest     cran
2                                     http://***internal_repo_url*** cranlike
3                        https://bioconductor.org/packages/3.16/bioc     bioc
4             https://bioconductor.org/packages/3.16/data/annotation     bioc
5             https://bioconductor.org/packages/3.16/data/experiment     bioc
6                   https://bioconductor.org/packages/3.16/workflows     bioc
  r_version bioc_version
1         *         <NA>
2         *         <NA>
3     4.2.3         3.16
4     4.2.3         3.16
5     4.2.3         3.16
6     4.2.3         3.16

gaborcsardi commented 1 year ago

Yeah, it is a binary package:

> dl <- pak::pkg_download("MASS")
i Getting 2 pkgs (1.03 MB) and 2 pkgs with unknown sizes
v Got MASS 7.3-59 (source) (515.84 kB)
v Got MASS 7.3-59 (source) (515.84 kB)
v Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.12 MB)
v Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.12 MB)
v Downloaded 4 packages (3.27 MB)in 7.1s
> dl$sources
[[1]]
[1] "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz"

[[2]]
[1] "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"

[[3]]
[1] "https://cloud.r-project.org/src/contrib/MASS_7.3-59.tar.gz"
[2] "https://cloud.r-project.org/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"

[[4]]
[1] "https://cloud.r-project.org/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz"
[2] "https://cloud.r-project.org/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
> dl$fulltarget[2]
[1] "./src/contrib/x86_64-pc-linux-gnu-ubuntu-22.04/4.2/MASS_7.3-59.tar.gz"
> untar(dl$fulltarget[2], list = TRUE)[1:10]
 [1] "MASS/CITATION"          "MASS/DESCRIPTION"       "MASS/INDEX"
 [4] "MASS/Meta/"             "MASS/Meta/Rd.rds"       "MASS/Meta/data.rds"
 [7] "MASS/Meta/features.rds" "MASS/Meta/hsearch.rds"  "MASS/Meta/links.rds"
[10] "MASS/Meta/nsInfo.rds"

nbenn commented 1 year ago

I'm sorry if all of this comes down to me somehow misunderstanding things, but for the URL you have in dl$sources[[2]], which I assume corresponds to what you end up with in dl$fulltarget[2], I get

url <- "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"

tmp <- tempfile()
download.file(url, tmp)
untar(tmp, list = TRUE)[1:10]
#>  [1] "MASS/"                "MASS/NAMESPACE"       "MASS/LICENCE.note"
#>  [4] "MASS/ChangeLog"       "MASS/data/"           "MASS/data/Boston.rda"
#>  [7] "MASS/data/immer.rda"  "MASS/data/geyser.rda" "MASS/data/npr1.rda"
#> [10] "MASS/data/phones.rda"
unlink(tmp)

Which I find unsurprising given the src/contrib path.

What am I missing here? Are the two of us getting different files?

gaborcsardi commented 1 year ago

Yes, RSPM sends different files depending on your User Agent header: https://packagemanager.posit.co/__docs__/admin/serving-binaries/#binary-user-agents

> getOption("HTTPUserAgent")
[1] "R/4.2.0 R (4.2.0 x86_64-pc-linux-gnu x86_64 linux-gnu)"
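
To see this for yourself, a sketch along these lines could download the same URL under two different User-Agent strings and compare the archive contents. `fetch_with_agent()` is a hypothetical helper. It assumes that `download.file()` reads the `HTTPUserAgent` option at call time, which is my understanding for the libcurl method but is worth verifying on your setup.

```r
# Hypothetical helper: download `url` while temporarily using `agent`
# as the HTTPUserAgent option, then restore the previous value.
fetch_with_agent <- function(url, agent, dest = tempfile(fileext = ".tar.gz")) {
  old <- options(HTTPUserAgent = agent)
  on.exit(options(old), add = TRUE)
  utils::download.file(url, dest, mode = "wb", quiet = TRUE)
  dest
}

# Usage (needs network access, so it is left commented out):
# url <- "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"
# a <- fetch_with_agent(url, "R/4.2.0 R (4.2.0 x86_64-pc-linux-gnu x86_64 linux-gnu)")
# b <- fetch_with_agent(url, "curl")
# head(untar(a, list = TRUE)); head(untar(b, list = TRUE))
```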

nbenn commented 1 year ago

Many thanks for clarifying this bit & sorry for the detour with my past ~3 msgs. I was running some of that under macOS.

I can still reproduce the problem though:

  1. Start up image rocker/verse:4.2

  2. Just to make sure

    > source("https://packagemanager.posit.co/__docs__/admin/check-user-agent.R")
    #> R installation path: /usr/local/lib/R
    #> R version: R version 4.2.3 (2023-03-15)
    #> OS version: Ubuntu 22.04.2 LTS
    #> HTTPUserAgent: R/4.2.3 R (4.2.3 x86_64-pc-linux-gnu x86_64 linux-gnu)
    #> Download method: libcurl
    #> Download extra args:
    #> 
    #> ----------------------------
    #> 
    #> Success! Your user agent is correctly configured.
  3. Set repos

    > options(
    +   repos = c(
    +     CRAN = "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest"
    +   )
    + )
  4. Install pak

    > install.packages("pak", repos = "https://r-lib.github.io/p/pak/stable/")
    #> Installing package into ‘/usr/local/lib/R/site-library’
    #> (as ‘lib’ is unspecified)
    #> trying URL 'https://r-lib.github.io/p/pak/stable/src/contrib/../../linux/x86_64/pak_0.5.0_R-4-2_x86_64-linux.tar.gz'
    #> Content type 'application/gzip' length 8116284 bytes (7.7 MB)
    #> ==================================================
    #> downloaded 7.7 MB
    #> 
    #> * installing *binary* package ‘pak’ ...
    #> * DONE (pak)
    #> 
    #> The downloaded source packages are in
    #>         ‘/tmp/RtmpFGKAD3/downloaded_packages’
  5. Download MASS

    > dl <- pak::pkg_download("MASS")
    #> ✔ Updated metadata database: 2.86 MB in 7 files.
    #> ✔ Updating metadata database ... done
    #> ℹ Getting 2 pkgs with unknown sizes
    #> ✔ Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (516.09 kB)
    #> ✔ Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (516.09 kB)
    #> ✔ Downloaded 2 packages (1.03 MB)in 1.7s
  6. Check downloads

    > dl$sources
    #> [[1]]
    #> [1] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz"
    #> [2] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
    #> 
    #> [[2]]
    #> [1] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"
    #> [2] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
    > untar(dl$fulltarget[1], list = TRUE)[1:10]
    #>  [1] "MASS/"                "MASS/NAMESPACE"       "MASS/LICENCE.note"
    #>  [4] "MASS/ChangeLog"       "MASS/data/"           "MASS/data/Boston.rda"
    #>  [7] "MASS/data/immer.rda"  "MASS/data/geyser.rda" "MASS/data/npr1.rda"
    #> [10] "MASS/data/phones.rda"
    > untar(dl$fulltarget[2], list = TRUE)[1:10]
    #>  [1] "MASS/"                "MASS/NAMESPACE"       "MASS/LICENCE.note"
    #>  [4] "MASS/ChangeLog"       "MASS/data/"           "MASS/data/Boston.rda"
    #>  [7] "MASS/data/immer.rda"  "MASS/data/geyser.rda" "MASS/data/npr1.rda"
    #> [10] "MASS/data/phones.rda"

gaborcsardi commented 1 year ago

This is probably an edge case in pak. I don't see it if I set up my repos like this:

> getOption("repos")
                                                         RSPM
"https://packagemanager.posit.co/cran/__linux__/jammy/latest"
                                                         CRAN
                                "https://cloud.r-project.org"

Hopefully that works as a workaround for you.

As for the bug itself, I think it happens because RSPM has two versions of MASS, and only has a binary for one version. pak sees that RSPM has binaries for MASS, and assumes that both builds are binary.

nbenn commented 1 year ago

Yes, I can confirm that the problem goes away when setting repos as

options(
  repos = c(
    RSPM = "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest",
    CRAN = "https://cloud.r-project.org"
  )
)

I then get the same results as you do.

In terms of workarounds, I'm currently using a PPM snapshot from before the last update to MASS (sometime last week). Which works fine for me for now.

In terms of general advice, if you consider this an edge case, would you suggest something like "always set up a proper CRAN repo alongside PPM for pak to function properly"?

At any rate, thanks for taking the time to discuss (and for filling me in on the importance of user agent headers for interacting with PPM).

gaborcsardi commented 1 year ago

In terms of general advice, if you consider this an edge case, would you suggest something like "always set up a proper CRAN repo alongside PPM for pak to function properly"?

Oh, an edge case is still a bug, and I'll fix it asap.

nbenn commented 1 year ago

Ah, sorry, misunderstood you there. Thanks!

pat-s commented 1 year ago

In terms of general advice, if you consider this an edge case, would you suggest something like "always set up a proper CRAN repo alongside PPM for pak to function properly"?

AFAIU the issue is not due to the lack of a "proper" CRAN repo - PPM also provides the same sources as the CRAN repo besides the binaries - but that {pak} errors after having received the binary, i.e. in its post-processing step. The only scenario and reason in which you would not see this issue with https://cloud.r-project.org/ as a single repo would be that {pak} would just directly get the source instead of the binary (because CRAN does not provide Linux binaries) and hence can't fail at the "is not a valid binary" step.

gaborcsardi commented 1 year ago

There are about four interacting issues here.

  1. RSPM replies to both the proper and the Archive URLs.

  2. pak tries both URLs in parallel, because for a "normal" CRAN repo, only one of them will work. Whichever file arrives first will be used as the result. If you are lucky the binary arrives first, if you are unlucky the source package. RSPM usually sends the same file for both, so apart from the potential traffic waste, there are no further issues:

    
    > download.file("https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/cli/cli_3.6.1.tar.gz", "cli.tar.gz")
    trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/cli/cli_3.6.1.tar.gz'
    Content type 'binary/octet-stream' length 1260295 bytes (1.2 MB)
    ==================================================
    downloaded 1.2 MB

    > untar("cli.tar.gz", list = TRUE)[1:10]
     [1] "cli/DESCRIPTION"       "cli/INDEX"             "cli/LICENSE"
     [4] "cli/Meta/"             "cli/Meta/Rd.rds"       "cli/Meta/features.rds"
     [7] "cli/Meta/hsearch.rds"  "cli/Meta/links.rds"    "cli/Meta/nsInfo.rds"
    [10] "cli/Meta/package.rds"

  3. However, MASS is different. While RSPM sends a binary package for the "proper" MASS URLs, it sends a source package for the MASS Archive URL:
> download.file("https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz", "MASS4.tar.gz")
trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz'
Content type 'binary/octet-stream' length 1116983 bytes (1.1 MB)
==================================================
downloaded 1.1 MB

> untar("MASS4.tar.gz", list = TRUE)[1:10]
 [1] "MASS/CITATION"          "MASS/DESCRIPTION"       "MASS/INDEX"
 [4] "MASS/Meta/"             "MASS/Meta/Rd.rds"       "MASS/Meta/data.rds"
 [7] "MASS/Meta/features.rds" "MASS/Meta/hsearch.rds"  "MASS/Meta/links.rds"
[10] "MASS/Meta/nsInfo.rds"
> download.file("https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz", "MASS.tar.gz")
trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz'
Content type 'binary/octet-stream' length 516089 bytes (503 KB)
==================================================
downloaded 503 KB

> untar("MASS.tar.gz", list = TRUE)[1:10]
 [1] "MASS/"                "MASS/NAMESPACE"       "MASS/LICENCE.note"
 [4] "MASS/ChangeLog"       "MASS/data/"           "MASS/data/Boston.rda"
 [7] "MASS/data/immer.rda"  "MASS/data/geyser.rda" "MASS/data/npr1.rda"
[10] "MASS/data/phones.rda"
  4. pak cannot handle the case when it receives a source package instead of the expected binary package.
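
The interaction in the list above can be sketched as follows: given several downloaded candidates for the same package, prefer one that is actually a binary, and otherwise fall back to treating the file as a source package instead of failing. This is a hypothetical illustration only. `pick_download()` is an invented name and this is not the workaround implemented in pkgcache.

```r
# Hypothetical sketch, not pkgcache's actual logic.
# `tarballs` is a character vector of local .tar.gz paths for package `pkg`.
pick_download <- function(tarballs, pkg) {
  is_binary <- vapply(tarballs, function(f) {
    members <- sub("^\\./", "", utils::untar(f, list = TRUE))
    # A binary package contains <pkg>/Meta/package.rds.
    paste0(pkg, "/Meta/package.rds") %in% members
  }, logical(1))

  if (any(is_binary)) {
    # Use the first candidate that really is a binary.
    list(path = tarballs[is_binary][[1]], type = "binary")
  } else {
    # No binary arrived: install from source rather than erroring.
    list(path = tarballs[[1]], type = "source")
  }
}
```

With the two MASS files downloaded above, `pick_download(c("MASS.tar.gz", "MASS4.tar.gz"), "MASS")` would be expected to select `MASS4.tar.gz`.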

gaborcsardi commented 1 year ago

To fix this, first I need to fix 4. and handle the case when RSPM replies with an unexpected source package.

Then we can think about avoiding the duplicate downloads.

I also reported the issue of sending a source package instead of a binary to the RSPM team.

gaborcsardi commented 1 year ago

This will eventually be fixed in PPM: https://github.com/rstudio/package-manager/issues/10471 (private repo, sorry).

I'll also add a workaround in a minute.

gaborcsardi commented 1 year ago

OK, there is a workaround now, in the pkgcache package that pak uses internally. It will be in tomorrow's nightly pak devel build: https://pak.r-lib.org/dev/reference/install.html#nightly-builds

pat-s commented 1 year ago

@gaborcsardi Any ETA when this will make it into "stable"? I have a bunch of WFs running on "stable" and would like to avoid having to point all of them to "devel" and then back again.

Here's one example for which {KernSmooth} is the issue instead of {MASS}: https://github.com/mlr-org/mlr3spatiotempcv/actions/runs/5046852603/jobs/9052924843?pr=223#step:11:602

mthomas-ketchbrook commented 5 months ago

I'm running into the same issue with this GHA, specifically related to the {withr} package: https://github.com/ketchbrookanalytics/fcall/actions/runs/7560235802/job/20585804754

Is this perhaps due to the fact that {withr} v3.0.0 was released just a few hours ago? Apologies if this is an issue with my understanding of when/how often binaries are made available.

gaborcsardi commented 5 months ago

@mthomas-ketchbrook I think that's a different issue that manifests in a similar way, and it happens because of a PPM (formerly known as RSPM) bug. A workaround is to turn off PPM until it is fixed.
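
In a plain R session, one way to read "turn off PPM" is to point `repos` at a regular CRAN mirror only, so that no PPM URL is consulted. This is a sketch under that assumption; CI setups may configure repositories elsewhere.

```r
# Use only a regular CRAN mirror; no PPM/RSPM entry is configured.
options(repos = c(CRAN = "https://cloud.r-project.org"))
```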

florisvdh commented 5 months ago

I think that's a different issue that manifests in a similar way.

I also ran into this PPM-{withr} problem; 3 hours ago PPM still provided {withr} 2.5.2, and there was no problem. I can confirm that neither the options(repos = c(RSPM = ..., CRAN = ...)) workaround (florisvdh/n2khabmon-fork@b9b0b72) nor the pak-version: devel workaround (florisvdh/n2khabmon-fork@2f28f33) mentioned earlier in this thread solves this one.

florisvdh commented 5 months ago

The PPM-{withr} problem reported above in https://github.com/r-lib/pak/issues/467#issuecomment-1896442984 is gone now (example log).

See https://fosstodon.org/@jvroberts/111773434263658017