Synthetica9 opened 4 years ago
We really need to do this, my token just got rate limited!
I looked into this in the meantime, and there doesn't seem to be a good way to get multiple releases in a single request. However, it is my first time using GraphQL, so I might just be overlooking something.
@ryantm 5f9acc45763abbf4ce869b0ba6a0604bc158909e should alleviate this issue somewhat.
Yes, that definitely helped, I haven't had a rate limiting issue since then.
I also struggled at first to figure out how to do multiple queries in one request with GraphQL, but I finally figured it out:
```graphql
{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "some-other-user") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}
```
Both `a` and `b` are arbitrarily chosen aliases and can later be used when looking at the result:
```python
{'a': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'bd79477f2333510f2e4f6440983977e1c5a69ce8'}}]}}}},
 'b': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'f39fb799c516cb986945e8a6f8b6cbf5b9d5af2e'}}]}}}}}
```
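The aliases make it easy to walk the response; a small sketch (the `latest_commit` helper name is my own) of pulling each commit hash back out:

```python
# `result` is the decoded "data" object returned for the query above.
result = {
    "a": {"ref": {"target": {"history": {"edges": [{"node": {"oid": "bd79477f2333510f2e4f6440983977e1c5a69ce8"}}]}}}},
    "b": {"ref": {"target": {"history": {"edges": [{"node": {"oid": "f39fb799c516cb986945e8a6f8b6cbf5b9d5af2e"}}]}}}},
}

def latest_commit(repo_data):
    # history(first: 1) returns a single edge, so take the first node's oid.
    return repo_data["ref"]["target"]["history"]["edges"][0]["node"]["oid"]

for alias, repo_data in result.items():
    print(alias, latest_commit(repo_data))
```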
Here is a full Python snippet:
```python
from typing import Optional, Dict, Any
import urllib.parse
import urllib.request
import json
import sys
import os


class GithubClient:
    def __init__(self, api_token: Optional[str]) -> None:
        self.api_token = api_token

    def _request(
        self, path: str, method: str, data: Optional[Dict[str, Any]] = None
    ) -> Any:
        url = urllib.parse.urljoin("https://api.github.com/", path)
        headers = {"Content-Type": "application/json"}
        if self.api_token:
            headers["Authorization"] = f"token {self.api_token}"
        body = None
        if data:
            body = json.dumps(data).encode("ascii")
        req = urllib.request.Request(url, headers=headers, method=method, data=body)
        resp = urllib.request.urlopen(req)
        return json.loads(resp.read())

    def post(self, path: str, data: Dict[str, str]) -> Any:
        return self._request(path, "POST", data)

    def graphql(self, query: str) -> Dict[str, Any]:
        resp = self.post("/graphql", data=dict(query=query))
        if "errors" in resp:
            raise RuntimeError(f"Expected data from graphql api, got: {resp}")
        data: Dict[str, Any] = resp["data"]
        return data


query = """
{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "balsoft") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}
"""

token = os.environ.get("GITHUB_TOKEN")
if not token:
    print("GITHUB_TOKEN not set")
    sys.exit(1)
client = GithubClient(api_token=token)
d = client.graphql(query)
print(d)
```
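For more than two repositories, the aliased query could be generated programmatically instead of written by hand. A sketch (the `build_query` helper and the `r0`, `r1`, ... alias scheme are my own choices, not from the snippet above):

```python
import json


def build_query(repos):
    # repos: list of (owner, name) pairs; each entry gets an alias r0, r1, ...
    # json.dumps is used to safely quote the owner/name strings.
    parts = []
    for i, (owner, name) in enumerate(repos):
        parts.append(
            f"r{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            '  ref(qualifiedName: "master") {\n'
            "    target { ... on Commit { history(first: 1) { edges { node { oid } } } } }\n"
            "  }\n"
            "}"
        )
    return "{\n" + "\n".join(parts) + "\n}"


query = build_query([("Mic92", "nur-packages"), ("balsoft", "nur-packages")])
print(query)
```

The generated string can be passed straight to `GithubClient.graphql`, and the `rN` aliases map back to the input list by index.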
Is this repo still used? I was under the impression that this functionality had been ported to Haskell and merged into the main nixpkgs-update repo, but it's not?
It is still in use! nix-community/infra@5e0e53f/build01/nixpkgs-update.nix#L67
Oh, cool! I guess it's just low-maintenance code then...
I've looked into using the GraphQL API, and it definitely seems to be an improvement: we can easily hit multiple repos with a single "request point" (I think every repo only counts for a single request, so we should be able to hit 100 repos per request point). For reference, here is the query I used:
```graphql
{
  a: repository(owner: "microsoft", name: "vscode") {
    ...releaseInfo
  }
  b: repository(owner: "junegunn", name: "fzf") {
    ...releaseInfo
  }
  c: repository(owner: "foobar", name: "arsadf") {
    ...releaseInfo
  }
  d: repository(owner: "jgm", name: "pandoc") {
    ...releaseInfo
  }
  e: repository(owner: "swaywm", name: "sway") {
    ...releaseInfo
  }
  f: repository(owner: "sagemath", name: "sage") {
    ...releaseInfo
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}

fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}
```
Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.
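Picking the newest stable tag out of one alias's `releaseInfo` result could look like this sketch (the `latest_stable_tag` helper is my own, and it assumes `releases(first: 10)` returns newest-first; nonexistent repos like `foobar/arsadf` come back as `null`):

```python
def latest_stable_tag(repo_data):
    # repo_data is the value under one alias (a, b, ...) in the response,
    # or None for repositories that don't exist.
    if repo_data is None:
        return None
    for release in repo_data["releases"]["nodes"]:
        # Skip drafts and prereleases; return the first stable tag.
        if not release["isDraft"] and not release["isPrerelease"]:
            return release["tagName"]
    return None


sample = {
    "releases": {
        "nodes": [
            {"tagName": "v2.0-rc1", "isPrerelease": True, "isDraft": False},
            {"tagName": "v1.9", "isPrerelease": False, "isDraft": False},
        ]
    }
}
print(latest_stable_tag(sample))
```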
You would probably run into some timeout eventually but a higher batch size would be definitely more efficient than scraping each repo individually.
It seems like we should be able to request all releases in one go (or in batches) instead of one request per repo. This should be much faster.
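The batching itself is straightforward; a sketch (with made-up repo names) of splitting a repo list into groups of at most 100 aliases per request:

```python
def chunks(items, size):
    # Yield successive batches of at most `size` items.
    for i in range(0, len(items), size):
        yield items[i:i + size]


repos = [(f"owner{i}", f"repo{i}") for i in range(250)]
batches = list(chunks(repos, 100))
# 250 repos split into batches of 100, 100, and 50.
```

Each batch would then be turned into one aliased GraphQL query and sent as a single request.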