osbuild / osbuild

Build-Pipelines for Operating System Artifacts
https://www.osbuild.org
Apache License 2.0

Implement a retry mechanism for ostree pull? #892

Open croissanne opened 2 years ago

croissanne commented 2 years ago

libostree HTTP error from remote 7f232153-9f17-452a-a92e-16a9a60c028b for <....filez>: Server returned HTTP 500
libostree HTTP error from remote b15f86a4-0913-4dc4-aa36-60a972772077 for <....filez>: Server returned HTTP 500
libostree HTTP error from remote 5ae4323a-92a2-4d39-9983-142444f52775 for <....filez>: Server returned HTTP 500
osbuild.host.RemoteError: CalledProcessError: Command '['ostree', 'pull', 'dcecf880-f512-499a-a6eb-e4f1e4280429', '065cf2caa6b3c50089d810dc9d3aabf9c958a46308028a8c37c09e3ad9a0ae1c', '--repo=/var/cache/osbuild-worker/osbuild-store/sources/org.osbuild.ostree/repo']' returned non-zero exit status 1.
libostree HTTP error from remote dcecf880-f512-499a-a6eb-e4f1e4280429 for <....filez>: Server returned HTTP 500

These are some errors we're seeing in the service when putting the object in AWS S3. I'm unsure whether we want retry logic in osbuild, or whether it would be better to try alternative storage solutions.
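To make the first option concrete, here is a minimal sketch of what retry logic on the osbuild side could look like. It is not existing osbuild code: the function name `pull_with_retry` and the parameters `attempts`, `base_delay`, `run` and `sleep` are hypothetical, and only the `ostree pull <remote> <commit> --repo=<path>` command line is taken from the error above.

```python
import subprocess
import time


def pull_with_retry(remote, commit, repo, attempts=3, base_delay=1.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `ostree pull`, retrying the whole command when it fails.

    `attempts`, `base_delay`, `run` and `sleep` are hypothetical knobs for
    this sketch; `run` and `sleep` are injectable so the function can be
    exercised without ostree installed and without real waiting.
    """
    cmd = ["ostree", "pull", remote, commit, f"--repo={repo}"]
    for attempt in range(1, attempts + 1):
        try:
            return run(cmd, check=True)
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Note that this retries the entire pull on any non-zero exit status, not only on HTTP 500, because the exit status alone does not say why the command failed.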

teg commented 2 years ago

Discussed this with @gicmo and @achilleas-k, these are some findings:

@cgwalters have we missed anything here? Any thoughts on how to mitigate the problem?

gicmo commented 2 years ago

I think we would need to translate 5xx in _ostree_fetcher_http_status_code_to_io_error to something we can match in _ostree_fetcher_should_retry_request.
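The idea, sketched in Python rather than in libostree's C: map the HTTP status to an error that records it, and let the retry check treat 5xx as transient. Everything here (`HttpError`, `should_retry`, `fetch_with_retry`, the `retries` default) is hypothetical and only illustrates the shape of the change; it does not mirror the actual libostree functions or their signatures.

```python
class HttpError(Exception):
    """Hypothetical error type that keeps the HTTP status code around."""

    def __init__(self, status):
        super().__init__(f"Server returned HTTP {status}")
        self.status = status


def should_retry(error):
    # Treat only server-side failures (5xx) as transient. A 4xx such as
    # 404 will not get better by asking again, so it is not retried.
    return isinstance(error, HttpError) and 500 <= error.status <= 599


def fetch_with_retry(fetch, url, retries=5):
    """Call `fetch(url)`, retrying that single request on transient errors.

    `fetch` is any callable that returns the body or raises HttpError.
    """
    for remaining in range(retries, -1, -1):
        try:
            return fetch(url)
        except HttpError as err:
            if remaining == 0 or not should_retry(err):
                raise
```

With this shape, only the failing request is repeated, and a permanent error still fails on the first attempt.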

cgwalters commented 2 years ago

Previously https://github.com/ostreedev/ostree/issues/2022

lavocatt commented 1 year ago

I don't see any related change in stages/org.osbuild.ostree.pull. @croissanne, is this still something to fix?

croissanne commented 1 year ago

Yes, but ideally we could offload this to ostree itself. https://github.com/ostreedev/ostree/pull/1847 existed, but is closed now. I want to avoid having to retry the entire stage/pull; only the fetching of individual files should be retried.

lavocatt commented 1 year ago

I'm gonna write a proposal for a fix and we can discuss it in the PR when it's there.