r-lib / gh

Minimalistic GitHub API client in R
https://gh.r-lib.org
Other
223 stars 52 forks source link

Pagination merge issue #136

Closed rundel closed 3 years ago

rundel commented 3 years ago

This is related to #135, I was also seeing strange behavior with the search API and I think I understand the cause now.

Most of the GitHub api endpoints either return a single result as a single json object (e.g. GET /user) or multiple results as a json array (e.g. GET /orgs/:org/members). However, there are a few endpoints where the endpoint returns a single json object that then contains the paginated array (e.g. GET /repos/:owner/:repo/actions/runs and GET /search/repositories) - the problem arises for these latter cases.

This pull request has a proposed fix to deal with the merging of these types separately so that we get the "right" behavior for all three cases. The bug is a bit insidious because any results past the first page are concatenated to the result object, but because the attributes are overwritten the new values don't have a names attribute and the first results are present with the expected structure. See the example below:

z = gh::gh(endpoint = "GET /search/repositories", q = "org:r-lib", .limit = 200)

str(z, max.level=1, give.attr = FALSE)
## List of 6
##  $ total_count       : int 115
##  $ incomplete_results: logi FALSE
##  $ items             :List of 100
##  $ NA                : int 115
##  $ NA                : logi FALSE
##  $ NA                :List of 15

Additionally, I think some rethinking of the pagination may be necessary. For example length(res) < .limit in https://github.com/r-lib/gh/blob/b524a442b854e9acc45fed0b0f3e5b2666fc8783/R/gh.R#L177 does not work correctly for the named multiresult endpoints but is also doing the wrong thing for the single result endpoints (although this doesn't matter since these never have a next link). The name of the results array also changes between endpoints so it is difficult to workout what the current # of results is.

Another related problem is using a small .limit with one of the single result endpoints, e.g.

gh::gh(endpoint = "GET /user", .limit = 1)
Error in attributes(res) <- res_attr : 
  'names' attribute [39] must be the same length as the vector [1]

as this currently ends up incorrectly subsetting the results object values instead of the "results".

gaborcsardi commented 3 years ago

Thanks! This sort of heuristic is somewhat risky, because we don't know what it might break in the other parts of the API. OTOH it seems beneficial, so I'll merge it and we'll if it breaks anything.