Closed JensTimmerman closed 5 years ago
This would be nice to have, but if you retrieved a large-ish commit log (say, that of the Linux kernel), you'd instantly blow your rate limit. So you have to limit retrieval anyway, and work in chunks; 100 items per chunk sounds reasonable, so GitHub is already giving you what you need.
@toolforger The idea is not to grab all commits in one go but to do it in chunks, like the api suggests. This should however be transparent to the user, hence the PaginatedClient
class suggestion.
Follow the next
URL. It's often available in the request body. Where it is not, it is in the Link
header of the HTTP response; agithub does not expose it currently, but I happen to be working on exactly that.
Yes, that's the point of this issue. Have agithub do this for us so that it is transparent for the user.
If you require a tester, give shout :wink:
@toolforger iirc the problem is that the link isn't in the request body, but in one of the headers, which is not exposed by agithub at the moment. But if you're working on that, great, indeed, please let us know if you need some testing.
I'll just submit a PR :-)
Actually the headers are already available through g.getheaders()
.
Its semantics is questionable (and I'm going to submit a PR for that), but it works.
Meh. Turns out I can't change the API without breaking all callers, so getheaders
it is and remains.
So no PR is going to come forth.
BTW doing pagination support would mean multiple HTTP requests, but getheaders
would be able to return only the headers from the last request, with no way to recover earlier headers. And these can be quite relevant (e.g. etag
and date
, needed for saving on the rate limit by caching previous results).
@toolforger can you elaborate a bit more what the problem is exactly?
So, realistically, the only option is the parse the header ourself and make the 'next' request.
The problem is that I wanted to return the headers as an additional tuple element, so we could write status, response, headers = g.rest.path.get()
, but this would break existing status, response = g.rest.path.get()
code (Python adapts tuples for comparisons, but not for assignments it seems).
So... incompatible change, can't be done before 2.0.
The header doesn't need parsing, it's already a list of key-value pairs, which is nice because I can do dict(g.getheaders())
to turn this into something useful (the list form is better if you handle all items anyway and need performance, but that's not my use case).
Unfortunately, GitHub seems to return a single links
header with next
and prev
(and maybe more values) all lumped into a single string that still needs parsing. But at least it's just one parse :-)
(Update: See https://developer.github.com/v3/#link-header for details; letter case seems to be Link
, not link
.)
@toolforger We can start a 2.0 branch anytime. As far as returning the headers with each request: the reason I didn't do it this way was to simplify the case where the headers aren't needed. I'm still ambivalent, but wouldn't mind changing it.
Concerning links
parsing/Pagination: This is out of scope for the API proper, but putting it in a contrib
package or something would be okay. If it requires adding hooks to the main classes, that's fine, too; as long as they don't do anything by default, of course.
AGitHub was designed expressly to be agnostic to the idiosyncrasies of the upstream protocol, without introducing a new fuzz-layer atop it. AGitHub acts just like upstream, whether your upstream is GitHub or Facebook. That is its raison d'être, and its best feature, IMO.
(In fact, the name AGitHub is now something of a misnomer, since it supports other upstream servers equally well; I had planned to separate out the generic parts of it into a new library, but haven't gotten around to it.)
On accessing headers, I'd say that returning (status, response, headers)
is not really a problem for 2.0; callers can still say status, response, _ = g. ... .get()
and ignore the headers.
On making the library adaptable, the request results should be passed through a hookable function. (Subclassing would work too, but it's usually better to put adapter logic into a separate response transformer function, it's easier to combine multiple transformers that way.)
On being agnostic, you should probably add a mission statement what the library does. A library that's agnostic about everything is just http
after all... so, what does it doe above and beyond what http
offers?
One thing that comes to mind is the attribute-to-HTTP-path transformation; there's more code in the class but I didn't look at that too closely. And maybe defining the library's mission by enumerating the services is a too low-level perspective,
Here are some other projects that use agithub, solving pagination in their code. Dropping links here as possible inspiration for adding this functionality to the GitHub module in agithub
Though it sounds like what should really be done is not what these examples do of parsing the URLs to extract the page number and passing that as an argument but instead to just call the Hypermedia link relations directly.
This quick hack [1] works nicely for me. But it is a hack! Usage:
for repo in ag_get_all(gh.orgs.github.repos.get):
print(repo["full_name"])
[1] https://gist.github.com/e4384145127a915b96849537bef3b93d
I've got a branch now with this functionality and I'm just finishing the unit tests. As soon as it's done I'll put a PR up for it. I've been using the functionality for a few months now.
I have moved on to other tasks so this doesn't help me anymore, but thanks anyway :-)
GitHub returns a maximum of 100 items, the rest of the response can be gotten by using one of the links in the headers, especially the link to the next page, with the rest of the results. https://developer.github.com/guides/traversing-with-pagination/
It would be nice if there was some sort of general way of getting all results from a call to the github api (or an option to specifiy this) so that we could get all results instead of the first X results, and no headers on how to get to the rest.
I was thinking of a PaginatedClient class?