nix-community / nixpkgs-update-github-releases

Fetches releases from GitHub for https://github.com/ryantm/nixpkgs-update
Creative Commons Zero v1.0 Universal

Investigate using GraphQL as our backend #5

Open Synthetica9 opened 4 years ago

Synthetica9 commented 4 years ago

It seems like we should be able to request all releases in one go (or in batches) instead of making one request per repo.

This should be much faster.

ryantm commented 4 years ago

We really need to do this; my token just got rate limited!

Synthetica9 commented 4 years ago

I looked into this in the meantime, and there doesn't seem to be a good way to get multiple releases in a single request. However, it is my first time using GraphQL, so I might just be overlooking something.

Synthetica9 commented 4 years ago

@ryantm 5f9acc45763abbf4ce869b0ba6a0604bc158909e should alleviate this issue somewhat.

ryantm commented 3 years ago

Yes, that definitely helped, I haven't had a rate limiting issue since then.

Mic92 commented 3 years ago

I also struggled at first to figure out how to do multiple queries in one request with GraphQL, but I finally figured it out:

{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "some-other-user") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}

Both a and b are arbitrarily chosen aliases and can be used later when looking at the result:

{'a': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'bd79477f2333510f2e4f6440983977e1c5a69ce8'}}]}}}}, 'b': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'f39fb799c516cb986945e8a6f8b6cbf5b9d5af2e'}}]}}}}}
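
For illustration, here is a minimal sketch of how those aliases could be used to pull the commit ids back out of such a response; the result variable and the loop below are just an example, not part of the query:

# A minimal sketch, assuming the response shown above is stored in `result`
# (the variable name is arbitrary; the oids are the ones from the example).
result = {
    'a': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'bd79477f2333510f2e4f6440983977e1c5a69ce8'}}]}}}},
    'b': {'ref': {'target': {'history': {'edges': [{'node': {'oid': 'f39fb799c516cb986945e8a6f8b6cbf5b9d5af2e'}}]}}}},
}

for alias, repo in result.items():
    # history(first: 1) yields a single edge whose node carries the commit oid
    oid = repo["ref"]["target"]["history"]["edges"][0]["node"]["oid"]
    print(alias, oid)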

Here is a full Python snippet:

from typing import Optional, Dict, Any
import urllib.parse
import urllib.request
import json
import sys
import os

class GithubClient:
    """Minimal GitHub API client using only the standard library."""

    def __init__(self, api_token: Optional[str]) -> None:
        self.api_token = api_token

    def _request(
        self, path: str, method: str, data: Optional[Dict[str, Any]] = None
    ) -> Any:
        url = urllib.parse.urljoin("https://api.github.com/", path)
        headers = {"Content-Type": "application/json"}
        if self.api_token:
            headers["Authorization"] = f"token {self.api_token}"

        # GitHub expects the request body as JSON
        body = None
        if data:
            body = json.dumps(data).encode("ascii")

        req = urllib.request.Request(url, headers=headers, method=method, data=body)
        resp = urllib.request.urlopen(req)
        return json.loads(resp.read())

    def post(self, path: str, data: Dict[str, str]) -> Any:
        return self._request(path, "POST", data)

    def graphql(self, query: str) -> Dict[str, Any]:
        # GraphQL queries are POSTed to a single endpoint; errors are reported
        # in the response body rather than via HTTP status codes
        resp = self.post("/graphql", data=dict(query=query))
        if "errors" in resp:
            raise RuntimeError(f"Expected data from graphql api, got: {resp}")
        data: Dict[str, Any] = resp["data"]
        return data

# One aliased repository block per repo; both are fetched in a single request
query = """
{
  a: repository(name: "nur-packages", owner: "Mic92") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
  b: repository(name: "nur-packages", owner: "balsoft") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) { edges { node { oid } } }
        }
      }
    }
  }
}
"""

token = os.environ.get("GITHUB_TOKEN")
if not token:
    print("GITHUB_TOKEN not set")
    sys.exit(1)
client = GithubClient(api_token=token)
d = client.graphql(query)
print(d)

Synthetica9 commented 3 years ago

Is this repo still used? I was under the impression that this functionality had been ported to Haskell and merged into the main nixpkgs-update repo, but apparently it hasn't been?

ryantm commented 3 years ago

It is still in use! https://github.com/nix-community/infra/blob/5e0e53fbdd59826fb32d12826f9777c87100c597/build01/nixpkgs-update.nix#L67

Synthetica9 commented 3 years ago

Oh, cool! I guess it's just low-maintenance code then...

I've looked into using the GraphQL API, and it definitely seems to be an improvement: we can easily hit multiple repos with a single "request point" (I think every repo only counts for a single request, so we should be able to hit 100 repos per request point). For reference, here is the query I used:

{
  a: repository(owner: "microsoft", name: "vscode") {
    ...releaseInfo
  }
  b: repository(owner: "junegunn", name: "fzf") {
    ...releaseInfo
  }
  c: repository(owner: "foobar", name: "arsadf") {
    ...releaseInfo
  }
  d: repository(owner: "jgm", name: "pandoc") {
    ...releaseInfo
  }
  e: repository(owner: "swaywm", name: "sway") {
    ...releaseInfo
  }
  f: repository(owner: "sagemath", name: "sage") {
    ...releaseInfo
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}

fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes {
      tagName
      isPrerelease
      isDraft
      publishedAt
    }
  }
}

Perhaps we'll even be able to do a full query of all repos in one shot, but I think some batching is still in order.
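
As a rough, untested sketch, a batched query along those lines could be assembled programmatically. The repo_query helper and the r0, r1, ... aliases below are made up for illustration; it reuses the releaseInfo fragment from the query above and could be fed to the GithubClient.graphql method from the earlier comment:

RELEASE_FRAGMENT = """
fragment releaseInfo on Repository {
  releases(first: 10) {
    nodes { tagName isPrerelease isDraft publishedAt }
  }
}
"""

def repo_query(repos):
    # repos is a list of (owner, name) pairs; each gets a numbered alias (r0, r1, ...)
    parts = [
        f'  r{i}: repository(owner: "{owner}", name: "{name}") {{ ...releaseInfo }}'
        for i, (owner, name) in enumerate(repos)
    ]
    # ask for the rate-limit status in the same request, as in the query above
    rate_limit = "  rateLimit { limit cost remaining resetAt }"
    return "{\n" + "\n".join(parts + [rate_limit]) + "\n}\n" + RELEASE_FRAGMENT

# e.g. data = client.graphql(repo_query([("jgm", "pandoc"), ("swaywm", "sway")]))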

Mic92 commented 3 years ago

You would probably run into some timeout eventually, but a higher batch size would definitely be more efficient than scraping each repo individually.
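
A rough sketch of what that batching might look like, reusing the hypothetical repo_query helper and the GithubClient from the earlier comments; the fetch_releases name and the batch size of 100 (taken from the rate-limit guess above) are assumptions:

def fetch_releases(client, repos, batch_size=100):
    # Query the repos in fixed-size chunks so each request stays small enough
    # to avoid server-side timeouts, while still amortizing the rate-limit cost.
    results = {}
    for start in range(0, len(repos), batch_size):
        batch = repos[start:start + batch_size]
        data = client.graphql(repo_query(batch))
        for i, (owner, name) in enumerate(batch):
            repo = data.get(f"r{i}")
            if repo is not None:  # repositories that no longer exist come back as null
                results[(owner, name)] = repo["releases"]["nodes"]
    return results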