sigstore / cosign

Code signing and transparency for containers and binaries
Apache License 2.0

Cosign should retry Fulcio and Rekor RPCs #2198

Open patflynn opened 2 years ago

patflynn commented 2 years ago

Description

@vaikas mentioned to me that cosign commands occasionally fail with an unexpected error from either the network or the Sigstore backends. These errors typically come at the very end of a CI job. To avoid failing expensive, long-running jobs because of a transient cosign error, we should retry failed Sigstore calls.

We should do this in all Sigstore clients.

Ideally this would be implemented with configurable exponential backoff and timeout.
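To make the idea concrete, here is a minimal sketch of a retry loop with configurable exponential backoff and an overall timeout. It is illustrative only and is not cosign's implementation: `TransientError`, `retry_with_backoff`, and every parameter name and default are hypothetical, and the choice of random jitter is my own assumption.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


def retry_with_backoff(call, *, initial_delay=0.5, multiplier=2.0, max_delay=30.0,
                       timeout=120.0, sleep=time.sleep, clock=time.monotonic):
    """Call `call()` until it succeeds or the overall `timeout` (seconds) would be exceeded.

    Only TransientError is retried; any other exception propagates immediately.
    All defaults are illustrative assumptions, not values used by any Sigstore client.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            return call()
        except TransientError:
            # Jitter: wait a random time between 0 and the current backoff ceiling,
            # so many CI jobs failing at once do not retry in lockstep.
            wait = random.uniform(0, delay)
            if clock() + wait > deadline:
                raise  # out of time: surface the last transient error
            sleep(wait)
            # Grow the ceiling exponentially, capped at max_delay.
            delay = min(delay * multiplier, max_delay)
```

A caller would wrap a single backend request, for example `retry_with_backoff(lambda: upload_entry(payload), timeout=60)`, where `upload_entry` stands in for whatever function performs the RPC and raises `TransientError` on a retriable failure.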

Version: all versions

var-sdk commented 2 years ago

For anyone picking this up (especially "good first issue" folk): loop in the Fulcio and Rekor people for a review, to make sure cosign only retries retriable errors. Errors that are not retriable should not be retried; for example, 403s are not retriable, while 429s are retriable if you respect the "Retry-After" header.
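As a rough sketch of that distinction, the helper below decides whether a response should be retried and, if so, how long to wait, honoring "Retry-After" when present. The function name, the default delay, and the set of retriable status codes are assumptions for illustration. Only the 403 and 429 cases come from this thread; the 5xx entries are a guess that the Fulcio and Rekor maintainers would need to confirm. It also assumes `headers` is a plain dict keyed exactly as "Retry-After".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed set: 429 is from the discussion above; 502/503/504 are a guess.
RETRIABLE_STATUSES = {429, 502, 503, 504}


def retry_delay(status, headers, default_delay=1.0, now=None):
    """Return seconds to wait before retrying, or None if the status is not retriable."""
    if status not in RETRIABLE_STATUSES:
        return None  # e.g. 403: retrying cannot succeed
    value = headers.get("Retry-After")
    if value is None:
        return default_delay
    value = value.strip()
    # Retry-After is either a number of seconds...
    if value.isdigit():
        return float(value)
    # ...or an HTTP date.
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default_delay  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the caller may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

For instance, `retry_delay(403, {})` returns `None`, while `retry_delay(429, {"Retry-After": "7"})` returns `7.0`.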

asraa commented 2 years ago

Reopening for the Fulcio one