nix-community / buildbot-nix

A nixos module to make buildbot a proper Nix-CI [maintainer=@Mic92]
64 stars 18 forks source link

JWT token creation is unreliable #184

Closed MagicRB closed 2 weeks ago

MagicRB commented 2 weeks ago

I've observed random failures, where GitHub would deny the JWT or installation tokens, due to them being expired or being improperly generated. @Mic92 has ran into it too.

The code needs a checkup.

MagicRB commented 2 weeks ago

The error we saw, seems to be coming from the token generation code, not from renewal being late.

Mic92 commented 2 weeks ago

Thanks for investigating.

MagicRB commented 2 weeks ago

@Mic92 was the error ephemeral? did you observe it within the same 10 minute time span again? without a buildbot restart?

MagicRB commented 2 weeks ago

I found https://buildbot.thalheim.io/#/builders/42/builds/360 also oh wait that's the same failure lol

Mic92 commented 2 weeks ago

I haven't and https://buildbot.thalheim.io/#/builders/42/builds/360 might be also just before I deployed the fixed version.

MagicRB commented 2 weeks ago

you mean #179 ?

Mic92 commented 2 weeks ago

Yes.

MagicRB commented 2 weeks ago

That's a separate fix though, I don't think it is related to this issue sadly

Mic92 commented 2 weeks ago

Alright. Let's take it easy and mark it as experimental until we figured out what is causing it: https://github.com/nix-community/buildbot-nix/pull/185

MagicRB commented 2 weeks ago

I'm staring at the code and I don't see it. I propose we ship a debug change, which dumps the expiration information into the logfile if an exception gets thrown from GitHub. Because I'm beginning to think GitHub is the one who's not cooperating. I really do not see how these 5 lines of code could be producing an incorrect exp value

            jwt_iat_drift: timedelta = timedelta(seconds=60)
            now: datetime = datetime.now(tz=UTC)
            iat: datetime = now - jwt_iat_drift
            exp: datetime = iat + lifetime
            jwt_payload = {
                "iat": int(iat.timestamp()),
                "exp": int(exp.timestamp()),
                "iss": str(app_id),
            }
            return (jwt_payload, exp)
Mic92 commented 2 weeks ago

Do you need help reproducing this log, i.e. more logging?

MagicRB commented 2 weeks ago

Yeah, it would be nice to see what actually happens and why, I'm formulating a patch so that it's clearer what I'm looking for, give me a second

MagicRB commented 2 weeks ago

completely untested, but something along these lines, ill try to deploy it to my instance

diff --git a/buildbot_nix/github/installation_token.py b/buildbot_nix/github/installation_token.py
index da6f9f9..e05b6d6 100644
--- a/buildbot_nix/github/installation_token.py
+++ b/buildbot_nix/github/installation_token.py
@@ -3,6 +3,8 @@ from datetime import UTC, datetime, timedelta
 from pathlib import Path
 from typing import Any

+from twisted.logger import Logger
+
 from buildbot_nix.common import (
     HttpResponse,
     atomic_write_file,
@@ -12,6 +14,8 @@ from buildbot_nix.common import (
 from .jwt_token import JWTToken
 from .repo_token import RepoToken

+tlog = Logger()
+

 class InstallationToken(RepoToken):
     GITHUB_TOKEN_LIFETIME: timedelta = timedelta(minutes=60)
@@ -27,12 +31,24 @@ class InstallationToken(RepoToken):
     def _create_installation_access_token(
         jwt_token: JWTToken, installation_id: int
     ) -> HttpResponse:
-        return http_request(
-            f"https://api.github.com/app/installations/{installation_id}/access_tokens",
-            data={},
-            headers={"Authorization": f"Bearer {jwt_token.get()}"},
-            method="POST",
-        )
+        last_exception: Exception
+
+        for retry in range(3):
+            try:
+                return http_request(
+                    f"https://api.github.com/app/installations/{installation_id}/access_tokens",
+                    data={},
+                    headers={"Authorization": f"Bearer {jwt_token.get()}"},
+                    method="POST",
+                )
+                break
+            except Exception as exception:
+                tlog.warn(
+                    f"Getting an installation token from GitHub failed, exp: {jwt_token.exp}"
+                )
+                last_exception = exception
+
+        raise last_exception

     @staticmethod
     def _generate_token(
diff --git a/buildbot_nix/github/jwt_token.py b/buildbot_nix/github/jwt_token.py
index 79fe56b..658922b 100644
--- a/buildbot_nix/github/jwt_token.py
+++ b/buildbot_nix/github/jwt_token.py
@@ -26,14 +26,14 @@ class JWTToken(RepoToken):
         self.app_private_key = app_private_key
         self.lifetime = lifetime

-        self.token, self.expiration = JWTToken.generate_token(
+        self.token, self.expiration, self.exp = JWTToken.generate_token(
             self.app_id, self.app_private_key, lifetime
         )

     @staticmethod
     def generate_token(
         app_id: int, app_private_key: str, lifetime: timedelta
-    ) -> tuple[str, datetime]:
+    ) -> tuple[str, datetime, int]:
         def build_jwt_payload(
             app_id: int, lifetime: timedelta
         ) -> tuple[dict[str, Any], datetime]:
@@ -66,7 +66,7 @@ class JWTToken(RepoToken):
         json_headers = json.dumps({"alg": "RS256", "typ": "JWT"}).encode("utf-8")
         encoded_jwt_parts = f"{base64url(json_headers)}.{base64url(jwt_payload)}"
         encoded_mac = rs256_sign(encoded_jwt_parts, app_private_key)
-        return (f"{encoded_jwt_parts}.{encoded_mac}", expiration)
+        return (f"{encoded_jwt_parts}.{encoded_mac}", expiration, jwt["exp"])

         # installations = paginated_github_request("https://api.github.com/app/installations?per_page=100", generated_jwt)

@@ -74,7 +74,7 @@ class JWTToken(RepoToken):

     def get(self) -> str:
         if datetime.now(tz=UTC) - self.expiration > self.lifetime * 0.8:
-            self.token, self.expiration = JWTToken.generate_token(
+            self.token, self.expiration, self.exp = JWTToken.generate_token(
                 self.app_id, self.app_private_key, self.lifetime
             )
Mic92 commented 2 weeks ago

I copied it to a branch: https://github.com/nix-community/buildbot-nix/compare/debug-jwt?expand=1