Open billdenney opened 1 year ago
It is a bug in pak. A workaround is to use the stable pak version, if you can do that. Will be fixed soon.
Probably the same in another repo: https://github.com/tidymodels/recipes/actions/runs/4348497699/jobs/7614837809
This can happen if pak expects a binary package, but RSPM sends a source package. So this is definitely a bug.
OTOH, for the packages that fail RSPM sends binaries, so something else must be going on as well. pkgdepends/pkgcache will have to set the User-Agent header appropriately when downloading packages from RSPM, but we do set it on GHA, so IDK why it is happening there, and I can't reproduce locally.
In case it is of any help, this is also happening here:
This should be fixed now, I reverted a commit in pkgcache: https://github.com/r-lib/pkgcache/commit/c9ddff1e8fef1131836da9f4fb39ecdc64fc6bb2
I still don't know what the bug is, but at least this Dockerfile, using the previous devel version of pak, reproduces it:
# -*- mode: Dockerfile -*-
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y curl && \
    curl -LO https://cdn.posit.co/r/ubuntu-2204/pkgs/r-devel_1_amd64.deb && \
    apt install -y ./r-devel_1_amd64.deb && \
    rm r-devel*.deb
RUN ln -s /opt/R/devel/bin/R /usr/local/bin/R && \
    ln -s /opt/R/devel/bin/Rscript /usr/local/bin/Rscript
RUN echo 'options( \
    repos = c(RSPM = "https://packagemanager.posit.co/cran/__linux__/jammy/latest", CRAN = "https://cloud.r-project.org"),\
    HTTPUserAgent="R/4.2.2 R (4.2.2 x86_64-pc-linux-gnu x86_64 linux-gnu) on GitHub Actions"\
    )' \
    >> $HOME/.Rprofile
RUN apt-get update && \
    apt-get install -y git && \
    git clone --depth 1 https://github.com/r-lib/pkgcache
RUN curl -L -H 'Authorization: Bearer QQ==' -o pak.tar.gz \
    https://ghcr.io/v2/r-lib/pak/blobs/sha256:ecafed8beab831d350856a06a0e68b4101014ea96783f8f5d1eb9a90183bb31a && \
    R CMD INSTALL pak.tar.gz
RUN cd pkgcache && \
    Rscript -e 'pak::lockfile_create(c("deps::.", "any::rcmdcheck", "any::sessioninfo"), dependencies = "all")'
RUN cd pkgcache && \
    cat pkg.lock
RUN cd pkgcache && \
    Rscript -e 'pak::lockfile_install()'
@gaborcsardi can confirm it fixed the issue for me! 😊
It turns out that pak does not actually use binaries from RSPM on R-devel, because the pak subprocess does not read .Rprofile, so the HTTPUserAgent option is never set in the subprocess.
On the one hand, it is weird that we never actually noticed this. OTOH it would explain why we never saw errors when trying to use the release binaries for devel builds.
So this now causes an error: the install plan is to download binaries, but then we get source packages, and that case is not handled.
I'm running into this issue too. Right now PPM for Jammy/latest seems to be serving MASS as a source package, which causes my CI jobs to fail. The suggestion above, to use stable, does not help, as I am already doing so (i.e. installing pak from https://r-lib.github.io/p/pak/stable/). Should pak be able to handle this situation gracefully, or is this considered the fault of PPM for not serving binaries? My workaround for now is to use a PPM snapshot that does not have the problem.
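For anyone who wants to try the snapshot workaround, here is a minimal sketch in R. It assumes PPM's date-based snapshot URLs; the helper name `ppm_snapshot_repo()` and the date are placeholders made up for illustration, so pick a date from before the problematic package update.

```r
# Hypothetical helper (not part of pak or PPM): build a repo URL that points
# at a dated PPM snapshot instead of "latest".
ppm_snapshot_repo <- function(date, distro = "jammy") {
  # as.Date() fails early on malformed input
  date <- format(as.Date(date), "%Y-%m-%d")
  sprintf("https://packagemanager.posit.co/cran/__linux__/%s/%s", distro, date)
}

# Placeholder date, an assumption for illustration only.
snapshot <- ppm_snapshot_repo("2023-04-01")
print(snapshot)

# To use it in a session or a CI job:
# options(repos = c(CRAN = snapshot))
```

Pinning to a snapshot trades freshness for reproducibility, so remember to move back to latest once the underlying problem is fixed.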
@nbenn can you link to the workflow and the failing build?
Unfortunately it's a private repo and it's not a GitHub Actions workflow (I'm using Drone). I might be able to put together the necessary pieces to reproduce it, though, if you want to have a look. I'll go ahead and describe the setup:
It's a rocker/verse:4.2 image and I'm using a repos setting like
c(CRAN = "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest")
I install pak as
install.packages("pak", repos = "https://r-lib.github.io/p/pak/stable/")
And in my lockfile I then get (among other entries)
{
"ref": "MASS",
"package": "MASS",
"version": "7.3-59",
"type": "standard",
"direct": false,
"binary": true,
"dependencies": [],
"vignettes": false,
"needscompilation": false,
"metadata": {
"RemoteType": "standard",
"RemotePkgRef": "MASS",
"RemoteRef": "MASS",
"RemoteRepos": "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest",
"RemotePkgPlatform": "x86_64-pc-linux-gnu-ubuntu-22.04",
"RemoteSha": "7.3-59"
},
"sources": ["https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz", "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"],
"target": "src/contrib/x86_64-pc-linux-gnu-ubuntu-22.04/4.2/MASS_7.3-59.tar.gz",
"platform": "x86_64-pc-linux-gnu-ubuntu-22.04",
"rversion": "4.2",
"directpkg": false,
"license": "GPL-2 | GPL-3",
"dep_types": ["Depends", "Imports", "LinkingTo"],
"params": [],
"install_args": "",
"repotype": "cran"
}
The PPM web UI currently says
Happy to share more if you're interested.
Seems like RSPM is sending a binary package to me:
> pak::pkg_install("MASS?reinstall")
> Will install 1 package.
> Will download 1 package with unknown size.
+ MASS 7.3-59 [dl]
i Getting 1 pkg with unknown size
v Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.12 MB)
v Downloaded 1 package (1.12 MB)in 4.9s
v Installed MASS 7.3-59 (270ms)
v 1 pkg: added 1, dld 1 (1.12 MB) [8.2s]
> .Last.value[c("sources", "platform")]
sources
1 https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz
platform
1 x86_64-pc-linux-gnu-ubuntu-22.04
> getRversion()
[1] '4.2.0'
How are your repos set up? What is the output of pak::repo_get()? E.g.
> pak::repo_get()
name url
1 RSPM https://packagemanager.posit.co/cran/__linux__/jammy/latest
2 CRAN https://cloud.r-project.org
3 BioCsoft https://bioconductor.org/packages/3.16/bioc
4 BioCann https://bioconductor.org/packages/3.16/data/annotation
5 BioCexp https://bioconductor.org/packages/3.16/data/experiment
6 BioCworkflows https://bioconductor.org/packages/3.16/workflows
type r_version bioc_version
1 cranlike * <NA>
2 cran * <NA>
3 bioc 4.2.0 3.16
4 bioc 4.2.0 3.16
5 bioc 4.2.0 3.16
6 bioc 4.2.0 3.16
Are you sure you're being served a binary release? If I download the file at https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz, this looks very much like what I get from CRAN as a source release, no? Also, according to the PPM web UI, no binary is currently available for jammy/4.2 (see screenshot above). Or am I misunderstanding something here?
As requested:
> pak::repo_get()
name
1 CRAN
2 efv
3 BioCsoft
4 BioCann
5 BioCexp
6 BioCworkflows
url type
1 https://packagemanager.rstudio.com/cran/__linux__/jammy/latest cran
2 http://***internal_repo_url*** cranlike
3 https://bioconductor.org/packages/3.16/bioc bioc
4 https://bioconductor.org/packages/3.16/data/annotation bioc
5 https://bioconductor.org/packages/3.16/data/experiment bioc
6 https://bioconductor.org/packages/3.16/workflows bioc
r_version bioc_version
1 * <NA>
2 * <NA>
3 4.2.3 3.16
4 4.2.3 3.16
5 4.2.3 3.16
6 4.2.3 3.16
Yeah, it is a binary package:
> dl <- pak::pkg_download("MASS")
i Getting 2 pkgs (1.03 MB) and 2 pkgs with unknown sizes
v Got MASS 7.3-59 (source) (515.84 kB)
v Got MASS 7.3-59 (source) (515.84 kB)
v Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.12 MB)
v Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.12 MB)
v Downloaded 4 packages (3.27 MB)in 7.1s
> dl$sources
[[1]]
[1] "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz"
[[2]]
[1] "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"
[[3]]
[1] "https://cloud.r-project.org/src/contrib/MASS_7.3-59.tar.gz"
[2] "https://cloud.r-project.org/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
[[4]]
[1] "https://cloud.r-project.org/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz"
[2] "https://cloud.r-project.org/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
> dl$fulltarget[2]
[1] "./src/contrib/x86_64-pc-linux-gnu-ubuntu-22.04/4.2/MASS_7.3-59.tar.gz"
> untar(dl$fulltarget[2], list = TRUE)[1:10]
[1] "MASS/CITATION" "MASS/DESCRIPTION" "MASS/INDEX"
[4] "MASS/Meta/" "MASS/Meta/Rd.rds" "MASS/Meta/data.rds"
[7] "MASS/Meta/features.rds" "MASS/Meta/hsearch.rds" "MASS/Meta/links.rds"
[10] "MASS/Meta/nsInfo.rds"
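The listing is what gives it away: an installed (binary) build has a Meta/ directory, while a source tarball does not. A rough sketch of that check on a file listing follows. It is a heuristic only, `looks_like_binary()` is a made-up name, and this is not necessarily how pak itself decides.

```r
# Hypothetical helper: guess from a tarball listing (as returned by
# untar(..., list = TRUE)) whether it is an installed/binary package.
# Assumption: binary builds contain "<pkg>/Meta/", source tarballs do not.
looks_like_binary <- function(files, pkg) {
  any(startsWith(files, paste0(pkg, "/Meta/")))
}

# Listings abbreviated from the outputs in this thread.
binary_listing <- c("MASS/CITATION", "MASS/DESCRIPTION", "MASS/INDEX",
                    "MASS/Meta/", "MASS/Meta/Rd.rds")
source_listing <- c("MASS/", "MASS/NAMESPACE", "MASS/LICENCE.note",
                    "MASS/ChangeLog", "MASS/data/")

looks_like_binary(binary_listing, "MASS")  # TRUE
looks_like_binary(source_listing, "MASS")  # FALSE
```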
I'm sorry if all of this comes down to me somehow misunderstanding things, but for the URL you have in dl$sources[[2]], which I assume corresponds to what you end up with in dl$fulltarget[2], I get
url <- "https://packagemanager.posit.co/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"
tmp <- tempfile()
download.file(url, tmp)
untar(tmp, list = TRUE)[1:10]
#> [1] "MASS/" "MASS/NAMESPACE" "MASS/LICENCE.note"
#> [4] "MASS/ChangeLog" "MASS/data/" "MASS/data/Boston.rda"
#> [7] "MASS/data/immer.rda" "MASS/data/geyser.rda" "MASS/data/npr1.rda"
#> [10] "MASS/data/phones.rda"
unlink(tmp)
Which I find unsurprising given the src/contrib path.
What am I missing here? Are the two of us getting different files?
Yes, RSPM sends different files depending on your User Agent header: https://packagemanager.posit.co/__docs__/admin/serving-binaries/#binary-user-agents
> getOption("HTTPUserAgent")
[1] "R/4.2.0 R (4.2.0 x86_64-pc-linux-gnu x86_64 linux-gnu)"
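If the option is missing or was overridden, a string of the same shape can be rebuilt from the running R session. This is a small sketch that mirrors the format shown above; `ppm_user_agent()` is just an illustrative name, and the docs linked above remain the reference for what PPM actually accepts.

```r
# Hypothetical helper: build a user agent string of the form
# "R/<version> R (<version> <platform> <arch> <os>)".
ppm_user_agent <- function() {
  v <- as.character(getRversion())
  sprintf("R/%s R (%s)", v,
          paste(v, R.version$platform, R.version$arch, R.version$os))
}

ua <- ppm_user_agent()
print(ua)

# To apply it to the current session:
# options(HTTPUserAgent = ua)
```

Note that this only affects the session in which the option is set, so a subprocess that does not read the same profile will not see it.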
Many thanks for clarifying this bit & sorry for the detour with my past ~3 msgs. I was running some of that under macOS.
I can still reproduce the problem though:
Start up image rocker/verse:4.2
Just to make sure
> source("https://packagemanager.posit.co/__docs__/admin/check-user-agent.R")
#> R installation path: /usr/local/lib/R
#> R version: R version 4.2.3 (2023-03-15)
#> OS version: Ubuntu 22.04.2 LTS
#> HTTPUserAgent: R/4.2.3 R (4.2.3 x86_64-pc-linux-gnu x86_64 linux-gnu)
#> Download method: libcurl
#> Download extra args:
#>
#> ----------------------------
#>
#> Success! Your user agent is correctly configured.
Set repos
> options(
+ repos = c(
+ CRAN = "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest"
+ )
+ )
Install pak
> install.packages("pak", repos = "https://r-lib.github.io/p/pak/stable/")
#> Installing package into ‘/usr/local/lib/R/site-library’
#> (as ‘lib’ is unspecified)
#> trying URL 'https://r-lib.github.io/p/pak/stable/src/contrib/../../linux/x86_64/pak_0.5.0_R-4-2_x86_64-linux.tar.gz'
#> Content type 'application/gzip' length 8116284 bytes (7.7 MB)
#> ==================================================
#> downloaded 7.7 MB
#>
#> * installing *binary* package ‘pak’ ...
#> * DONE (pak)
#>
#> The downloaded source packages are in
#> ‘/tmp/RtmpFGKAD3/downloaded_packages’
Download MASS
> dl <- pak::pkg_download("MASS")
#> ✔ Updated metadata database: 2.86 MB in 7 files.
#> ✔ Updating metadata database ... done
#> ℹ Getting 2 pkgs with unknown sizes
#> ✔ Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (516.09 kB)
#> ✔ Got MASS 7.3-59 (x86_64-pc-linux-gnu-ubuntu-22.04) (516.09 kB)
#> ✔ Downloaded 2 packages (1.03 MB)in 1.7s
Check downloads
> dl$sources
#> [[1]]
#> [1] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz"
#> [2] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
#>
#> [[2]]
#> [1] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/MASS_7.3-59.tar.gz"
#> [2] "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz"
> untar(dl$fulltarget[1], list = TRUE)[1:10]
#> [1] "MASS/" "MASS/NAMESPACE" "MASS/LICENCE.note"
#> [4] "MASS/ChangeLog" "MASS/data/" "MASS/data/Boston.rda"
#> [7] "MASS/data/immer.rda" "MASS/data/geyser.rda" "MASS/data/npr1.rda"
#> [10] "MASS/data/phones.rda"
> untar(dl$fulltarget[2], list = TRUE)[1:10]
#> [1] "MASS/" "MASS/NAMESPACE" "MASS/LICENCE.note"
#> [4] "MASS/ChangeLog" "MASS/data/" "MASS/data/Boston.rda"
#> [7] "MASS/data/immer.rda" "MASS/data/geyser.rda" "MASS/data/npr1.rda"
#> [10] "MASS/data/phones.rda"
This is probably an edge case in pak. I don't see it if I set up my repos like this:
> getOption("repos")
RSPM
"https://packagemanager.posit.co/cran/__linux__/jammy/latest"
CRAN
"https://cloud.r-project.org"
Hopefully that works as a workaround for you.
As for the bug itself, I think it happens because RSPM has two versions of MASS, and only has a binary for one version. pak sees that RSPM has binaries for MASS, and assumes that both builds are binary.
Yes, I can confirm that the problem goes away when setting repos as
options(
repos = c(
RSPM = "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest",
CRAN = "https://cloud.r-project.org"
)
)
I then get the same results as you do.
In terms of workarounds, I'm currently using a PPM snapshot from before the last update to MASS (sometime last week), which works fine for me for now.
In terms of general advice, if you consider this an edge case, would you suggest something like "always set up a proper CRAN repo alongside PPM for pak to function properly"?
At any rate, thanks for taking the time to discuss (and for filling me in on the importance of user agent headers for interacting with PPM).
In terms of general advice, if you consider this an edge case, would you suggest something like "always set up a proper CRAN repo alongside PPM for pak to function properly"?
Oh, an edge case is still a bug, and I'll fix it asap.
Ah, sorry, misunderstood you there. Thanks!
In terms of general advice, if you consider this an edge case, would you suggest something like "always set up a proper CRAN repo alongside PPM for pak to function properly"?
AFAIU the issue is not due to the lack of a "proper" CRAN repo - PPM also provides the same sources as the CRAN repo besides the binaries - but that {pak} errors after having received the binary, i.e. in its post-processing step. The only scenario and reason in which you would not see this issue with https://cloud.r-project.org/ as a single repo would be that {pak} would just directly get the source instead of the binary (because CRAN does not provide Linux binaries) and hence can't fail at the "is not a valid binary" step.
There are about four interacting issues here.
RSPM replies to both the proper and the Archive URLs.
pak tries both URLs in parallel, because for a "normal" CRAN repo, only one of them will work. Whichever file arrives first will be used as the result. If you are lucky the binary arrives first, if you are unlucky the source package. RSPM usually sends the same file for both, so apart from the potential traffic waste, there are no further issues:
> download.file("https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/cli/cli_3.6.1.tar.gz", "cli.tar.gz")
trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/cli/cli_3.6.1.tar.gz'
Content type 'binary/octet-stream' length 1260295 bytes (1.2 MB)
==================================================
downloaded 1.2 MB
> untar("cli.tar.gz", list = TRUE)[1:10]
[1] "cli/DESCRIPTION" "cli/INDEX" "cli/LICENSE"
[4] "cli/Meta/" "cli/Meta/Rd.rds" "cli/Meta/features.rds"
[7] "cli/Meta/hsearch.rds" "cli/Meta/links.rds" "cli/Meta/nsInfo.rds"
[10] "cli/Meta/package.rds"
> download.file("https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz", "MASS4.tar.gz")
trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/4.4.0/Recommended/MASS_7.3-59.tar.gz'
Content type 'binary/octet-stream' length 1116983 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
> untar("MASS4.tar.gz", list = TRUE)[1:10]
[1] "MASS/CITATION" "MASS/DESCRIPTION" "MASS/INDEX"
[4] "MASS/Meta/" "MASS/Meta/Rd.rds" "MASS/Meta/data.rds"
[7] "MASS/Meta/features.rds" "MASS/Meta/hsearch.rds" "MASS/Meta/links.rds"
[10] "MASS/Meta/nsInfo.rds"
> download.file("https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz", "MASS.tar.gz")
trying URL 'https://packagemanager.rstudio.com/cran/__linux__/jammy/latest/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz'
Content type 'binary/octet-stream' length 516089 bytes (503 KB)
==================================================
downloaded 503 KB
> untar("MASS.tar.gz", list = TRUE)[1:10]
[1] "MASS/" "MASS/NAMESPACE" "MASS/LICENCE.note"
[4] "MASS/ChangeLog" "MASS/data/" "MASS/data/Boston.rda"
[7] "MASS/data/immer.rda" "MASS/data/geyser.rda" "MASS/data/npr1.rda"
[10] "MASS/data/phones.rda"
To fix this, first I need to fix 4. and handle the case when RSPM replies with an unexpected source package.
Then we can think about avoiding the duplicate downloads.
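To make the "unexpected source package" case concrete, here is a rough sketch of the kind of check involved: inspect the downloaded file and fall back to treating it as a source package instead of failing. This is a hypothetical helper, not the actual pkgcache code, and the Meta/ test is only a heuristic.

```r
# Hypothetical helper, NOT pkgcache's real implementation.
# Returns the platform to record for a downloaded file: the expected binary
# platform if the tarball looks like an installed package, "source" otherwise.
resolve_platform <- function(tarball, pkg, expected_platform) {
  files <- untar(tarball, list = TRUE)
  # Assumption: binary builds contain "<pkg>/Meta/", source tarballs do not.
  is_binary <- any(startsWith(files, paste0(pkg, "/Meta/")))
  if (is_binary) {
    expected_platform
  } else {
    message(pkg, ": expected a binary, got a source package; treating it as source")
    "source"
  }
}

# Example call (the file name is whatever was downloaded):
# resolve_platform("MASS.tar.gz", "MASS", "x86_64-pc-linux-gnu-ubuntu-22.04")
```

With a check like this the install plan can be corrected after the download, so it no longer matters which of the two URLs answers first.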
I also reported the issue of sending a source package instead of a binary to the RSPM team.
This will be eventually fixed in PPM: https://github.com/rstudio/package-manager/issues/10471 (private repo, sorry).
I'll also add a workaround in a minute.
OK, there is a workaround now, in the pkgcache package that pak uses internally. It will be in tomorrow's nightly pak devel build: https://pak.r-lib.org/dev/reference/install.html#nightly-builds
@gaborcsardi Any ETA when this will make it into "stable"? I have a bunch of WFs running on "stable" and would like to avoid having to point all of them to "devel" and then back again.
Here's one example for which {KernSmooth} is the issue instead of {MASS}: https://github.com/mlr-org/mlr3spatiotempcv/actions/runs/5046852603/jobs/9052924843?pr=223#step:11:602
I'm running into the same issue with this GHA, specifically related to the {withr} package: https://github.com/ketchbrookanalytics/fcall/actions/runs/7560235802/job/20585804754
Is this perhaps due to the fact that {withr} v3.0.0 was released just a few hours ago? Apologies if this is an issue with my understanding of when/how often binaries are made available.
@mthomas-ketchbrook I think that's a different issue that manifests in a similar way, and it happens because of a PPM (formerly known as RSPM) bug. A workaround is to turn off PPM until it is fixed.
I think that's a different issue that manifests in a similar way.
I also ran into this PPM-{withr} problem; 3 hours ago PPM still provided {withr} 2.5.2, and there was no problem. I can confirm that neither the options(repos = c(RSPM = ..., CRAN = ...)) workaround (florisvdh/n2khabmon-fork@b9b0b72) nor the pak-version: devel workaround (florisvdh/n2khabmon-fork@2f28f33) mentioned earlier in this thread solves this one.
The PPM-{withr} problem reported above in https://github.com/r-lib/pak/issues/467#issuecomment-1896442984 is gone now (example log).
When using GitHub Actions, I get the following error that appears to originate from pak. It may be related to R version 4.3 since it consistently happens for the Ubuntu-devel version. The error is occurring here:
https://github.com/nlmixr2/rxode2/actions/runs/4355986650/jobs/7613357209#step:6:6313