jenkins-infra / helpdesk

Open your Infrastructure related issues here for the Jenkins project
https://github.com/jenkins-infra/helpdesk/issues/new/choose

crawler build fails because `azcopy sync` returns a 403 #3875

Closed. NotMyFault closed this issue 8 months ago

NotMyFault commented 8 months ago

Service(s)

trusted.ci.jenkins.io

Summary

The crawler job is failing again: this time `azcopy sync` returns a 403, which makes the job fail.

Reproduction steps

No response

dduportal commented 8 months ago

Audit:

Checking...

dduportal commented 8 months ago

~As I don't know what @lemeurherve did in #3414 except the linked terraform PRs~ (see https://github.com/jenkins-infra/helpdesk/issues/3875#issuecomment-1868290946)

As we're on holiday for the next week, I've commented out the azcopy command (https://github.com/jenkins-infra/crawler/commit/d0efde1404c2e45cc71737fc532025b8b325b061) and a build is currently running (I'll watch it).

We'll work on this in January then.

dduportal commented 8 months ago

And now

$ aws s3 sync ./updates/ s3://westeurope-updates-jenkins-io/updates/ --no-progress --no-follow-symlinks --size-only --exclude .svn --endpoint-url https://8d1838a43923148c5cee18ccc356a594.r2.cloudflarestorage.com/
09:50:50  fatal error: An error occurred (Unauthorized) when calling the ListObjectsV2 operation: Unauthorized

New hotfix incoming
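
For anyone triaging similar failures: both errors in this thread (the 403 from azcopy and the `Unauthorized` from `aws s3 sync`) are credential problems rather than data problems. A minimal sketch of a log check that tells the two kinds apart, assuming the job log is available as text. The function name and the pattern list are hypothetical; only the `(Unauthorized)` string is taken from the output above, and the other two patterns are assumptions about what the tools print.

```python
import re

# Patterns treated as authentication/authorization failures.
# "(Unauthorized)" comes from the aws s3 sync output in this thread;
# the "403" and "AuthenticationFailed" patterns are assumptions.
AUTH_PATTERNS = [
    re.compile(r"\(Unauthorized\)"),
    re.compile(r"\b403\b"),
    re.compile(r"AuthenticationFailed"),
]


def is_auth_failure(log_text: str) -> bool:
    """Return True if the log text looks like a credential problem."""
    return any(pattern.search(log_text) for pattern in AUTH_PATTERNS)


sample = (
    "09:50:50  fatal error: An error occurred (Unauthorized) when calling "
    "the ListObjectsV2 operation: Unauthorized"
)
print(is_auth_failure(sample))
```

The bare `403` pattern can produce false positives (a file size, for instance), so this is only a first-pass filter that points towards "check the credentials" before anything else.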

dduportal commented 8 months ago

https://github.com/jenkins-infra/crawler/commit/e196bb7f25a92e2168f81f1130653d61f0c057b6 (watching build on trusted)

dduportal commented 8 months ago

Finished: SUCCESS. We can finish 2023 in this state \o/

NotMyFault commented 8 months ago

Thanks for checking 🎄

dduportal commented 8 months ago

As I don't know what @lemeurherve did in #3414 except the linked terraform PRs, and because we're on holiday for the next week, I've commented out the azcopy command jenkins-infra/crawler@d0efde1 and a build is currently running (I'll watch it).

We'll work on this in January then.

Nothing done in #3414 explains the HTTP/403. I've reopened #3818 and I'll check whether I forgot a firewall rule somewhere. That could also explain the aws s3 errors after the first hotfix.

dduportal commented 8 months ago

Resuming analysis on this topic.

dduportal commented 8 months ago

Retried and got the following detailed error message for the azcopy operation:

Time:2024-01-02T13:25:44.6252585Z</Message><AuthenticationErrorDetail>Signature not valid in the specified time frame: Start [Fri, 06 Oct 2023 00:00:00 GMT] - Expiry [Fri, 22 Dec 2023 00:00:00 GMT] - Current [Tue, 02 Jan 2024 13:25:44 GMT]</AuthenticationErrorDetail></Error>
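
The error detail already contains everything needed to confirm that the SAS token expired. A minimal sketch that extracts the three timestamps and compares them, assuming the `Start [...] - Expiry [...] - Current [...]` layout shown above. The function name is hypothetical, and the date parsing relies on English day and month names (the default C locale).

```python
import re
from datetime import datetime, timezone

# Matches the validity window in the AuthenticationErrorDetail text above.
DETAIL_RE = re.compile(
    r"Start \[(?P<start>[^\]]+)\] - "
    r"Expiry \[(?P<expiry>[^\]]+)\] - "
    r"Current \[(?P<current>[^\]]+)\]"
)
HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"


def parse_sas_window(detail: str) -> dict:
    """Return the start, expiry and current timestamps as UTC datetimes."""
    match = DETAIL_RE.search(detail)
    if match is None:
        raise ValueError("no Start/Expiry/Current window found")
    return {
        name: datetime.strptime(value, HTTP_DATE).replace(tzinfo=timezone.utc)
        for name, value in match.groupdict().items()
    }


detail = (
    "Signature not valid in the specified time frame: "
    "Start [Fri, 06 Oct 2023 00:00:00 GMT] - "
    "Expiry [Fri, 22 Dec 2023 00:00:00 GMT] - "
    "Current [Tue, 02 Jan 2024 13:25:44 GMT]"
)
window = parse_sas_window(detail)
expired_for = window["current"] - window["expiry"]
print(f"expired: {window['current'] > window['expiry']}, "
      f"{expired_for.days} days past expiry")
```

Here the token had been expired for 11 days at the time of the retry, which matches the first failures appearing just before the holidays.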

dduportal commented 8 months ago

https://github.com/jenkins-infra/azure/blob/240beb8f6f96fdf3bb114f66091120a971938481/updates.jenkins.io.tf#L44 => gotcha.

dduportal commented 8 months ago

https://github.com/jenkins-infra/azure/pull/565 to rotate expiry

dduportal commented 8 months ago

Update:

WiP:

dduportal commented 8 months ago

WiP: check the error from `aws s3 sync` and fix it if it is still present.

dduportal commented 8 months ago

Gotcha: all API tokens have reached their TTL in Cloudflare. This has also been breaking the jenkins-infra/cloudflare project since last month (at least).

See https://github.com/jenkins-infra/helpdesk/issues/2649#issuecomment-1874328160
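
Both root causes here were credentials silently reaching their expiry date, so a periodic check could flag them before the job breaks. A minimal sketch, assuming the expiry dates are tracked somewhere readable. The inventory, the credential names and the Cloudflare date are hypothetical; only the 22 Dec 2023 SAS expiry comes from the azcopy error earlier in this thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical inventory: names and dates are illustrative, except the SAS
# expiry, which is the one reported in the azcopy error in this thread.
CREDENTIALS = {
    "azure-sas-updates": datetime(2023, 12, 22, tzinfo=timezone.utc),
    "cloudflare-r2-token": datetime(2023, 12, 1, tzinfo=timezone.utc),
    "some-long-lived-token": datetime(2025, 1, 1, tzinfo=timezone.utc),
}


def expiring_within(credentials, now, warn_days=30):
    """Return (name, days_left) for credentials that have expired or will
    expire within warn_days, most urgent first."""
    horizon = now + timedelta(days=warn_days)
    due = [
        (name, (expiry - now).days)
        for name, expiry in credentials.items()
        if expiry <= horizon
    ]
    return sorted(due, key=lambda item: item[1])


now = datetime(2023, 12, 15, tzinfo=timezone.utc)
report = expiring_within(CREDENTIALS, now)
for name, days_left in report:
    state = "EXPIRED" if days_left < 0 else f"expires in {days_left} days"
    print(f"{name}: {state}")
```

Run a week before the first failure, this would have listed both the already expired token and the SAS token about to expire.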

dduportal commented 8 months ago

Update:

dduportal commented 8 months ago

Update:

Todo:

dduportal commented 8 months ago

Job still failing: https://github.com/jenkins-infra/helpdesk/issues/2649#issuecomment-1879629619

dduportal commented 8 months ago

Last steps: https://github.com/jenkins-infra/helpdesk/issues/2649#issuecomment-1879711494

The last crawler build completed without any problem.