alfred-stokespace opened 2 months ago
As a mitigation for this, I'm going to do the following in my private fork.
I'm creating a new uncached client method
```go
func NewUncachedClient(token string) (*github.Client, error) {
	oauth2Transport := &oauth2.Transport{
		Source: oauth2.StaticTokenSource(
			&oauth2.Token{AccessToken: token},
		),
	}
	if !config.Config.IsGHES() {
		return github.NewClient(&http.Client{Transport: oauth2Transport}), nil
	}
	return github.NewEnterpriseClient(config.Config.GitHubURL, config.Config.GitHubURL, &http.Client{Transport: oauth2Transport})
}
```
which will get called from starter.go's checkRegisteredRunner function:
```go
func (s *Starter) checkRegisteredRunner(ctx context.Context, runnerName string, target datastore.Target) error {
	client, err := gh.NewUncachedClient(target.GitHubToken)
```
which will then be used lower down ...
```go
	if _, err := gh.ExistGitHubRunner(cctx, client, owner, repo, runnerName); err == nil {
		// success to register runner to GitHub
		return nil
	} else if !errors.Is(err, gh.ErrNotFound) {
```
and we'll see whether that blows everything up or not :)
Took quite a bit of digging to get a reliable test case that got me to a smoking gun...
First, the symptom....
For very short jobs, i.e. jobs that ran for only 20-50 seconds (for example a job that errors out almost immediately on a tag check), a runner would be produced, great! The job would complete, good! But... I'd eventually end up getting another runner that would replace that one. When I checked the db, I'd find a row in the jobs table for that same job, which had already run.
As I turned debug on I kept finding these over and over...
and these were getting produced despite the fact that I was watching, with a

```
gh api ... /repos/$org/$repo/actions/runners
```

CLI script, the runner get registered, come online, then be removed. Yet I still kept seeing that message saying `myshoes-e06dbeee-2566-46af-86f1-6ea1f933aca3 is not found in GitHub, will retry...`
So I threw a bunch of logging in and finally nailed it down to a minimal test of this call, running against my shell script...
and sure enough! I saw that the bash process was accurate, but the client you are producing was reporting stale information.
So, I dug deeper! and found this...
So I decided, hey, let's deactivate that and see what happens...
this
became this
The difference is, I'm not using the cache transport anymore; I'm just using the oauth2Transport directly.
and when I put the bash output up against the Go test, they matched! Good.
So what I'm thinking is happening is the following...
So eventually, after paying for a runner for 6 hours, this gets cleaned up.