r-lib / pak

A fresh approach to package installation
https://pak.r-lib.org

Unnecessary git authentication for public repositories #504

Closed vincent-hanlon closed 1 year ago

vincent-hanlon commented 1 year ago

It looks as though users whose git credentials are out of date can't use pak to install from public github repositories. As far as I can tell this is unnecessary, and it would make more sense if valid git credentials were only required for private repositories. By contrast, devtools can install from public repositories even when git credentials are invalid. In recent testing of R installations for a workflow I've been helping develop, 2 out of 5 testers ran into this problem and struggled with it for a while, so it is obviously a barrier to using pak for GitHub packages.

In this example, I updated my PAT on GitHub and then tried to install two R packages on the server I use. Both installations failed with the errors below, yet installing with devtools still works. After I updated my git credentials on the server and restarted R, the pak installations worked fine.

> pak::pkg_install("vincent-hanlon/InvertypeR")
Error:
! error in pak subprocess
Caused by error:
! Could not solve package dependencies:
* vincent-hanlon/InvertypeR: ! pkgdepends resolution error for vincent-hanlon/InvertypeR.
Caused by error:
! Bad GitHub credentials, make sure that your GitHub token is valid.
Caused by error in `stop(http_error(resp))`:
! Unauthorized (HTTP 401).
Type .Last.error to see the more details.
> pak::pkg_install("Rdatatable/data.table")
Error:
! error in pak subprocess
Caused by error:
! Could not solve package dependencies:
* Rdatatable/data.table: ! pkgdepends resolution error for Rdatatable/data.table.
Caused by error:
! Bad GitHub credentials, make sure that your GitHub token is valid.
Caused by error in `stop(http_error(resp))`:
! Unauthorized (HTTP 401).
Type .Last.error to see the more details.

But installations work with devtools, even without git credentials!

> devtools::install_github("vincent-hanlon/InvertypeR")
Downloading GitHub repo vincent-hanlon/InvertypeR@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All
2: CRAN packages only
3: None
4: matrixStats  (0.63.0 -> 1.0.0 ) [CRAN]
5: MatrixGen... (1.12.0 -> 1.12.2) [CRAN]
6: Summarize... (1.30.1 -> 1.30.2) [CRAN]

Enter one or more numbers, or an empty line to skip updates: 3
-- R CMD build ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
v  checking for file '/tmp/Rtmp3q8ft9/remotes968691a17f3/vincent-hanlon-InvertypeR-b6cc04a/DESCRIPTION' (407ms)
-  preparing 'invertyper':
v  checking DESCRIPTION meta-information ...
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
   Omitted 'LazyData' from DESCRIPTION
-  building 'invertyper_1.0.0.2.tar.gz'

* installing *source* package 'invertyper' ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (invertyper)

Now with pak, after providing my git credentials:

> pak::pkg_install("Rdatatable/data.table")
  Will install 1 package.
  The package (0 B) is cached.
+ data.table   1.14.9 [bld][cmp] (GitHub: 8803918)
i No downloads are needed, 1 pkg is cached
v Got data.table 1.14.9 (source) (5.52 MB)
i Packaging data.table 1.14.9
v Packaged data.table 1.14.9 (4.7s)
i Building data.table 1.14.9
v Built data.table 1.14.9 (34.9s)
v Installed data.table 1.14.9 (github::Rdatatable/data.table@8803918) (665ms)
v 1 pkg: added 1, dld 1 (NA B) [46.5s]

gaborcsardi commented 1 year ago

Without a PAT you can only make 60 GitHub API queries per hour (per IP), so that will stop working pretty quickly, unfortunately.

The error message clearly tells you what is wrong, and that is the right fix, instead of trying to install packages without a PAT.

vincent-hanlon commented 1 year ago

Thanks Gabor for your quick reply. Your point about 60 API queries per hour makes a lot of sense as a limit.

I happily admit that I am naive about how most of this works, but I do wonder whether it would be possible to make the most of the 60 'free' API queries, and then when they run out provide the same helpful error message you already give. Maybe most R beginners, for example, would only want to install a couple packages from GitHub at a time anyway. And of course they are most likely to spend an hour of confusion stuck on the whole git credentials business. I just tested my no-credentials setup and it was totally fine to install ~9 GitHub R packages (with devtools) in the last half hour or so, and that's more than I've ever needed to do in the past.

Either way, thanks for making pak. I've found it super useful ever since I learned about it.

gaborcsardi commented 1 year ago

pak does work without credentials. But if you supply your own credentials, it will use the supplied ones.

vincent-hanlon commented 1 year ago

My mistake: I should have said "invalid credentials", not "no credentials".

gaborcsardi commented 1 year ago

I think pak's job is to use the credentials you supply. Plus, having invalid credentials is something that the user should probably know about, so an early error seems useful.

We could potentially try to fall back to queries without a credential, but this does not seem like a very common case to me, so I think our developer time is better spent on other issues. So I am going to close this, sorry.

aretaon commented 1 year ago

I would like to second @vincent-hanlon: I use pak within a Gitea Actions workflow, which involves checking out my original code from Gitea with git (i.e. I have local git credentials, although they are not valid for GitHub). When I run pak to fetch code from a public GitHub repo, it informs me that these credentials are invalid (which is correct) but provides me with no way to supply my correct GitHub credentials. I know this is a very special situation, but for now I have to resort to devtools for this.

gaborcsardi commented 1 year ago

> but provide me with no way to supply my correct GitHub credentials.

pak uses your git credentials for https://github.com for packages from GitHub. You can either use git or gitcreds::gitcreds_set("https://github.com") from R to set these credentials. For pak (but not git itself) you can also set the GITHUB_PAT_GITHUB_COM env var to the GitHub token you want to use.
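
A minimal sketch of the environment-variable route from within R, for setups where the token arrives through the environment rather than the git credential store. MY_GITHUB_TOKEN is a hypothetical variable name used only for illustration (pak does not read it). Setting GITHUB_PAT_GITHUB_COM before pak is first loaded is a cautious assumption here, since pak does its work in a subprocess.

```r
# Hypothetical source of the token: an environment variable named
# MY_GITHUB_TOKEN. Never hard-code a real token in a script.
token <- Sys.getenv("MY_GITHUB_TOKEN", unset = "")

if (nzchar(token)) {
  # GITHUB_PAT_GITHUB_COM is the variable named in this thread for
  # packages hosted on https://github.com.
  # Assumption: set it before pak is loaded for the first time.
  Sys.setenv(GITHUB_PAT_GITHUB_COM = token)
}

# Report whether the variable is set, without printing its value.
token_configured <- nzchar(Sys.getenv("GITHUB_PAT_GITHUB_COM"))
message("GITHUB_PAT_GITHUB_COM set: ", token_configured)

# Then install as usual, for example:
# pak::pkg_install("Rdatatable/data.table")
```

For interactive use, gitcreds::gitcreds_set("https://github.com") stores the token in the git credential store instead, so it does not have to be set again in every session.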

If you think that pak is using credentials that belong to another host, please open an issue with the details.

gaborcsardi commented 1 year ago

@aretaon Btw. you can also set the GITHUB_PAT_GITHUB_COM=FAIL env var to tell pak that you don't have any credentials for github.com.
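
As a rough sketch, the same setting can be made from R at the top of a script. Doing it before pak is loaded is a cautious assumption rather than a documented requirement. Without a token, the limit of 60 GitHub API queries per hour (per IP) mentioned earlier in this thread applies.

```r
# Tell pak that there are no credentials for github.com, so that stale
# or unrelated credentials in the git credential store are not used.
Sys.setenv(GITHUB_PAT_GITHUB_COM = "FAIL")

# Public repositories can then be installed without a token, for example:
# pak::pkg_install("Rdatatable/data.table")
```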

aretaon commented 1 year ago

Thanks for your swift reply. As I am running pak in a Gitea Actions workflow, I prefer setting the token via environment variables. However, this still results in the same "! Bad GitHub credentials, make sure that your GitHub token is valid." error. The token I use is a classic token with only public_repo permissions. Is there anything else I can do to debug the cause of the error?

gaborcsardi commented 1 year ago

@aretaon can you show your workflow?

aretaon commented 1 year ago

Sure:

name: CI Test

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - uses: https://github.com/actions/checkout@v3
      - name: Set up Python
        uses: https://github.com/actions/setup-python@v4
        with:
          python-version: '3.9'
          architecture: 'x64'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: Set up R
        uses: https://github.com/r-lib/actions/setup-r@v2
        with:
          r-version: '4.2.1' # The R version to download (if necessary) and use.
      - name: Build docs
        uses: https://github.com/uibcdf/action-sphinx-docs-to-gh-pages@v2.0.0
        with:
          branch: main
          dir_docs: docsrc
      - name: Set custom environment variables
        run: |
          export GITHUB_PAT_GITHUB_COM=${{ secrets.API_TOKEN_GITHUB }}
          printenv
      - name: Test with pytest
        run: |
          mkdir -p tests/badges
          pytest -s --verbose -p no:cacheprovider tests

gaborcsardi commented 1 year ago

You cannot set env vars like that, at least on GHA: a variable exported in one step is not visible in later steps. You need to use the env key of the step, or the env key of the whole job. E.g. for the step:

      - name: Test with pytest
        run: |
          mkdir -p tests/badges
          pytest -s --verbose -p no:cacheprovider tests
        env:
          GITHUB_PAT_GITHUB_COM: ${{ secrets.API_TOKEN_GITHUB }}

aretaon commented 1 year ago

Great, that was indeed the problem! With that, I am perfectly happy supplying pak with the access token. Thanks!