Closed sentry-io[bot] closed 4 years ago
I'll look into this.
With the PR above this should happen less often, but we'll need to wait for a time when more builds are happening at once.
I plan to revisit this issue after the architecture discussions next week.
Sentry issue: RED-HAT-0P-2J0
This issue has been marked as stale because it hasn't seen any activity for the last 60 days.
Stale issues are closed after 14 days, unless the label is removed by a maintainer or someone comments on it.
This is done in order to ensure that open issues are still relevant.
Thank you for your contribution! :unicorn: :rocket: :robot:
(Note: issues labeled with pinned, security, bug or EPIC are never marked as stale.)
@TomasTomecek I moved this for us to discuss the possible solutions next Tue and to tackle it next sprint. Breaking up the tasks led to the access token being requested more frequently, and so these errors became more frequent. fyi @dhodovsk
@packit-service/the-packit-team This issue would need some attention and ideas for how to solve it.
oh, so this is the reason for so many 401s in sentry
yup, we should absolutely prioritize this
Some thoughts on this:
Tokens to access repositories are retrieved in ogr. This happens once for every GithubProject object. Most (all?) of the Celery tasks create their GithubProject object to interact with the repositories.
There is one token per repository.
With multiple workers picking tasks from the queue, there can be multiple tasks interacting with the same repository (not necessarily with the same PR).
Whenever a new token is retrieved for a repository the previous token is invalidated. If not made invalid, tokens are valid for an hour.
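To make the invalidation behaviour described above concrete, here is a toy model, not the real GitHub or ogr API. `FakeIssuer`, `BadCredentials` and the repository name are all made up for illustration. The only assumption is the one stated in this comment: a new token for a repository makes the previous one stop working.

```python
class BadCredentials(Exception):
    """Stand-in for the BadCredentialsException seen in Sentry (hypothetical)."""


class FakeIssuer:
    """Toy token issuer: one valid token per repository, the newest one wins."""

    def __init__(self):
        self._current = {}  # repo name -> currently valid token
        self._counter = 0

    def new_token(self, repo):
        self._counter += 1
        token = f"token-{self._counter}"
        self._current[repo] = token  # the previous token stops working here
        return token

    def call_api(self, repo, token):
        if self._current.get(repo) != token:
            raise BadCredentials(f"stale token for {repo}")
        return "ok"


issuer = FakeIssuer()
token_a = issuer.new_token("org/repo")  # task A creates its project object
token_b = issuer.new_token("org/repo")  # task B does the same a moment later
result_b = issuer.call_api("org/repo", token_b)  # B works fine

try:
    issuer.call_api("org/repo", token_a)  # A still holds the old token
    a_failed = False
except BadCredentials:
    a_failed = True
```

In this model task A fails only because task B happened to request a token for the same repository in between, which is what makes the failure timing-dependent.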
The race condition happens when:
BadCredentialsException.
To solve this we have the following options:
Create a central place to retrieve, store and renew tokens. Tasks should get the tokens for repositories they work with from this place, and ogr should be able to use this token when present instead of requesting a new one.
any other ideas? :thinking:
+1 I'd store them in DB (redis/psql, doesn't matter) and let p-s handle the token management. I'd say this is secure-ish since they last only an hour as you say.
tokman is deployed to prod, so this should happen less.
Sentry Issue: RED-HAT-0P-2FB
This can be reliably reproduced by running the following script in 4 parallel processes: