r-lib / gh

Minimalistic GitHub API client in R
https://gh.r-lib.org

Shorter response from gh than from direct API call. #128

Closed: llrs closed 4 years ago

llrs commented 4 years ago

I've found a discrepancy between gh and a raw call to the API:

API call: https://api.github.com/repos/Bioconductor/Contributions/issues/6/events?per_page=100 returns a response with 33 elements.

With gh:

library("gh")
g <- gh("/repos/Bioconductor/Contributions/issues/6/events", .accept = "application/vnd.github.v3+json", per_page = 100)
length(g)
#> [1] 15

Apparently the response is already truncated to just 15 elements before gh_process_response converts it to a list with fromJSON.

I don't understand the code well enough to know whether it needs something else. But the cause isn't the authentication or the number of results per page. I also tried parsing the response with other settings (content(res, as = "text", type = "application/json")), but that isn't the problem either.

Hope it can be solved soon. Thanks for making this package!

jennybc commented 4 years ago

I think you might be confused about per_page versus .limit:

x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
length(x)
#> [1] 33

Created on 2020-08-07 by the reprex package (v0.3.0.9001)

llrs commented 4 years ago

Mmh, no. I updated to the latest gh, and this code still doesn't return 33 values for me as it did for you:

x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
length(x)
#> [1] 15

Created on 2020-08-08 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.1 (2020-06-06)
#>  os       Ubuntu 20.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Madrid
#>  date     2020-08-08
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.1)
#>  backports     1.1.8   2020-06-17 [1] CRAN (R 4.0.1)
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 4.0.1)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.1)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.1)
#>  curl          4.3     2019-12-02 [1] CRAN (R 4.0.1)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.1)
#>  devtools      2.3.1   2020-07-21 [1] CRAN (R 4.0.1)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.1)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.1)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.1)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.1)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.1)
#>  gh            1.1.0   2020-01-24 [1] CRAN (R 4.0.1)
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.1)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.1)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.1)
#>  jsonlite      1.7.0   2020-06-25 [1] CRAN (R 4.0.1)
#>  knitr         1.29    2020-06-23 [1] CRAN (R 4.0.1)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.1)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.1)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.1)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.1)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.1)
#>  processx      3.4.3   2020-07-05 [1] CRAN (R 4.0.1)
#>  ps            1.3.3   2020-05-08 [1] CRAN (R 4.0.1)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.1)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.1)
#>  rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.1)
#>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.1)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.1)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.1)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.1)
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 4.0.1)
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 4.0.1)
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.1)
#>  xfun          0.16    2020-07-24 [1] CRAN (R 4.0.1)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.1)
#>
#> [1] /home/lluis/bin/R/4.0.1/lib/R/library
```

Maybe something is wrong with curl, or there is some other issue at the network level. Could you post your session info so we can see whether anything differs between our sessions?

jennybc commented 4 years ago

x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
length(x)
#> [1] 33

Created on 2020-08-08 by the reprex package (v0.3.0.9001)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.6
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Vancouver
#>  date     2020-08-08
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.1.8      2020-06-17 [1] CRAN (R 4.0.0)
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)
#>  curl          4.3        2019-12-02 [1] CRAN (R 4.0.0)
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  gh            1.1.0.9000 2020-08-07 [1] local
#>  glue          1.4.1      2020-05-13 [1] CRAN (R 4.0.0)
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.0)
#>  httr          1.4.2      2020-07-20 [1] CRAN (R 4.0.2)
#>  jsonlite      1.7.0      2020-06-25 [1] CRAN (R 4.0.2)
#>  knitr         1.29       2020-06-23 [1] CRAN (R 4.0.1)
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.0)
#>  pillar        1.4.6      2020-07-10 [1] CRAN (R 4.0.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)
#>  reprex        0.3.0.9001 2020-08-06 [1] local
#>  rlang         0.4.7      2020-07-09 [1] CRAN (R 4.0.0)
#>  rmarkdown     2.3.3      2020-08-06 [1] Github (rstudio/rmarkdown@204aa41)
#>  rstudioapi    0.11       2020-02-07 [1] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)
#>  styler        1.3.2      2020-02-23 [1] CRAN (R 4.0.0)
#>  tibble        3.0.3.9000 2020-07-16 [1] local
#>  vctrs         0.3.2      2020-07-15 [1] CRAN (R 4.0.0)
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun          0.16       2020-07-24 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Users/jenny/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

jennybc commented 4 years ago

Although I don't really see how it matters here, you should probably also consider whether you have a GitHub PAT configured and what its scopes are. Sometimes that has more of an effect than you think. But I seem to get the same result with and without a configured PAT.
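
For anyone checking this on their own machine, here is a minimal base-R sketch that reports which token variables are set without printing the secret. The helper name describe_github_tokens is hypothetical; it assumes the token comes from the GITHUB_PAT or GITHUB_TOKEN environment variables, which are the two that gh's documentation mentions.

```r
# Report which token-related environment variables are set, without
# revealing their values. Hypothetical helper, base R only.
describe_github_tokens <- function(vars = c("GITHUB_PAT", "GITHUB_TOKEN")) {
  # unset = "" makes missing variables come back as empty strings
  values <- unname(Sys.getenv(vars, unset = ""))
  data.frame(
    variable = vars,
    is_set = nzchar(values),   # TRUE when the variable holds a non-empty value
    n_chars = nchar(values),   # length only, never the token itself
    stringsAsFactors = FALSE
  )
}

describe_github_tokens()
```

If a token is set, gh::gh_whoami() should then show which account and scopes it maps to, assuming the token is still valid.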

llrs commented 4 years ago

The GitHub PAT is not relevant: as you said, the call should retrieve the same results with or without it. But I find it strange that I get all the events in the browser and not with gh. So there must be something wrong with the headers, the call, or the connection when going through R. I also tested with a Docker image, where I could retrieve all of them.

I'm closing the issue as it is not related to gh. I might post here again if I find something else relevant for future users or myself. Thanks

llrs commented 4 years ago

Apparently the issue is quite complex. In the Docker terminal (image used: bioconductor/bioconductor_docker:latest):

rstudio@4db06b7c17e7:/usr/local/lib/R/etc$ R --vanilla

R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
> length(x)
[1] 33
> Sys.setenv(GITHUB_PAT="MyTokenSetOnRenviron")
> x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
> length(x)
[1] 15

On my terminal:

llrs@linux ~ % R4 --vanilla
R version 4.0.1 (2020-06-06) -- "See Things Now"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
Error in gh_process_response(raw) : 
GitHub API error (401): 401 Unauthorized
Message: Bad credentials
Read more at https://docs.github.com/rest
> length(x)
Error: object 'x' not found
> Sys.setenv(GITHUB_PAT="MyTokenSetOnRenviron")
> x <- gh::gh("/repos/Bioconductor/Contributions/issues/6/events", .limit = 100)
> length(x)
[1] 15

The query without any authorization should have worked.

jennybc commented 4 years ago

It smells like a PAT and scopes problem.

llrs commented 4 years ago

I had two GITHUB_PAT entries set in my environment (one of them no longer valid). I deleted the incorrect one, and now I can reproduce the Docker behaviour on my computer. Reported privately via Support and also posted online here
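
A duplicate like this can be spotted mechanically. The sketch below is only an illustration: the helper find_duplicate_env_vars is hypothetical, and it assumes both entries live in the same .Renviron-style file, which the thread does not actually state (they could equally be split between a shell profile and .Renviron).

```r
# Find variables that are assigned more than once in .Renviron-style text.
# `lines` is a character vector, e.g. from readLines("~/.Renviron").
find_duplicate_env_vars <- function(lines) {
  lines <- trimws(lines)
  # Keep only NAME=value lines; blank lines and # comments do not match.
  is_assignment <- grepl("^[A-Za-z_][A-Za-z0-9_.]*[[:space:]]*=", lines)
  var_names <- trimws(sub("=.*$", "", lines[is_assignment]))
  # Names that occur a second (or later) time, reported once each.
  unique(var_names[duplicated(var_names)])
}

example <- c(
  "# tokens",
  "GITHUB_PAT=old_token",
  "EDITOR=vim",
  "",
  "GITHUB_PAT=new_token"
)
find_duplicate_env_vars(example)
```

On a real machine the input would come from something like readLines(path.expand("~/.Renviron")), if that file exists.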

llrs commented 4 years ago

So, with this issue I found two important things:

1) The API has a limit of 40000 events, and API calls do not return events or issues from users that the authenticated account has blocked.

2) There is a problem on the GitHub side when querying the API while a blocked user is a bot. See this post on the GitHub community site, where a GitHub employee reported the issue
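
To check whether blocked users explain a shorter response, one option is to compare the anonymous and authenticated counts side by side. This is a rough sketch, not something taken from gh itself: compare_event_counts and fetch_issue_events are hypothetical helpers, and the wrapper assumes that passing .token = "" makes gh send the request without an Authorization header.

```r
# Compare how many records a fetcher returns without and with a token.
# `fetch` is passed in so the comparison can be tried offline with a stub.
compare_event_counts <- function(fetch, token) {
  anonymous <- length(fetch(""))
  authenticated <- length(fetch(token))
  list(
    anonymous = anonymous,
    authenticated = authenticated,
    hidden = anonymous - authenticated  # records missing when authenticated
  )
}

# Wrapper for real use (needs network access and the gh package).
# Assumption: .token = "" results in an unauthenticated request.
fetch_issue_events <- function(token) {
  gh::gh(
    "/repos/Bioconductor/Contributions/issues/6/events",
    .limit = 100,
    .token = token
  )
}

# Offline stub that mimics the counts reported in this thread.
stub_fetch <- function(token) {
  if (nzchar(token)) as.list(seq_len(15)) else as.list(seq_len(33))
}
compare_event_counts(stub_fetch, "dummy-token")
```

With a real token, compare_event_counts(fetch_issue_events, Sys.getenv("GITHUB_PAT")) would run the same comparison against the live API; a positive hidden count would be consistent with the blocked-user behaviour described above.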

jennybc commented 4 years ago

Interesting!