woodpecker-ci / woodpecker

Woodpecker is a simple yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0

Unauthorized user tokens preventing builds from running when protected by SAML enforcement #3804

Open. fernandrone opened this issue 2 weeks ago

fernandrone commented 2 weeks ago

Component

agent

Describe the bug

This is a very intermittent bug that has been hard to track down. We run Woodpecker with GitHub, for an organization protected by organization SAML enforcement.

This has happened twice: once in December 2023 and again on 29 April 2024. When it does, most of our builds start failing with the following error:

{"level":"error","error":"GET https://api.github.com/repos/quintoandar/<redacted>/contents/.woodpecker?ref=ced8fe72e731630f9888d10f4aad083071f0b83d:  403 Resource protected by organization SAML enforcement. You must grant your OAuth token access to this organization. []","repo":"quintoandar/<redacted>","user":"<redacted>","time":"2024-04-29T18:17:15Z","message":"could not get folder from forge"}

In other words, the user who activated the repository (the user on whose behalf Woodpecker makes requests to clone the repository and retrieve the Woodpecker configuration) is authenticated, but their token has not been granted access to the organization.

We found that when this happens, the "fix" is apparently to log out and log back in as the indicated user. After that, builds work again, presumably because Woodpecker obtains a new, valid token.

To avoid problems when users leave the organization, we automate enabling repositories in Woodpecker so that they all authenticate on behalf of the same user. Initially we used a dedicated "bot" GitHub account for this. After hitting this bug, however, we moved to an administrator's account, since it's easier to log out of that account and back in to fix the problem. Either way, most repositories make their requests on behalf of the same user. That avoids situations like "a random user left the organization and broke random repository number 42", but it also makes this issue far more destructive: when it happens, our whole Woodpecker instance stops working.

This seems very similar to this issue https://github.com/woodpecker-ci/woodpecker/discussions/2482 except that:

  1. We have already configured Woodpecker as an OAuth App, which should not have expiring tokens.

  2. The error blocking access is always specifically about SAML enforcement; the token seems valid in the sense that it is authenticated, just not authorized to access the organization.

As a further data point, at "random" intervals (every several weeks or so), both Drone (which we used in the past and still keep a legacy instance of) and Woodpecker pop up a window like the one below, asking me and other developers to authorize the organization. I am not sure whether this is related to the problem.

[Screenshot: the pop-up asking to authorize the organization]

I assume it's possible that if, for example, the user/administrator on whose behalf the requests are made denied Woodpecker/Drone access in the pop-up above, the requests would then fail... however:

Steps to reproduce

Unfortunately we do not know how to reproduce this behavior.

Expected behavior

No response

System Info

Woodpecker 2.3.0
Kubernetes Backend
GitHub forge

Additional context

No response

fernandrone commented 2 weeks ago

Taking this as an opportunity to ask: is there an ongoing issue about supporting a GitHub App? From the issue history I understand that it has been tried but isn't there yet (as noted in the docs https://woodpecker-ci.org/docs/administration/forges/github), so we're stuck with an OAuth App making requests on behalf of users.

anbraten commented 2 weeks ago

I am not sure about the state of the GitHub App effort. I just checked the code again; we have several places where the token is refreshed if necessary:

So even if it's an admin account that doesn't log in regularly, Woodpecker should refresh the token as long as the repo / instance has some activity going on. Therefore, using a GitHub App could fix your issue. I started adding it in #3811. CI will build PR images for it, so it would be awesome if you could test it.

fernandrone commented 2 weeks ago

Thanks a lot! Unfortunately I'm leaving for vacation now, but I can try this when I come back, in about two and a half weeks.