yannh / kubeconform

A FAST Kubernetes manifests validator, with support for Custom Resources!
Apache License 2.0

fix: retry on download errors #274

Closed carlossg closed 4 months ago

carlossg commented 5 months ago

Retry schema downloads to avoid failing on transient errors such as a connection reset.

yannh commented 5 months ago

Good idea, though this never gives up and doesn't back off incrementally. Is there a library we might be able to use? :thinking:

carlossg commented 5 months ago

It just retries once, one second later, but I have now switched to hashicorp/go-retryablehttp. If this looks good, the tests will need to use a real local server to exercise the retries.

carlossg commented 4 months ago

@yannh I added a test with a real HTTP server to test the retries.

Test output:

2024/07/26 13:34:55 [DEBUG] GET http://localhost:9163/simulate-reset
2024/07/26 13:34:55 [ERR] GET http://localhost:9163/simulate-reset request failed: Get "http://localhost:9163/simulate-reset": EOF
2024/07/26 13:34:55 [DEBUG] GET http://localhost:9163/simulate-reset: retrying in 1s (2 left)
2024/07/26 13:34:56 using schema found at http://localhost:9163/simulate-reset
2024/07/26 13:34:56 [DEBUG] GET http://localhost:9163/404
2024/07/26 13:34:56 could not find schema at http://localhost:9163/404
2024/07/26 13:34:56 [DEBUG] GET http://localhost:9163/500
2024/07/26 13:34:56 [DEBUG] GET http://localhost:9163/500 (status: 500): retrying in 1s (2 left)
2024/07/26 13:34:57 [DEBUG] GET http://localhost:9163/500 (status: 500): retrying in 2s (1 left)
2024/07/26 13:34:59 failed downloading schema at http://localhost:9163/500: Get "http://localhost:9163/500": GET http://localhost:9163/500 giving up after 3 attempt(s)
2024/07/26 13:34:59 [DEBUG] GET http://localhost:9163/503
2024/07/26 13:34:59 [DEBUG] GET http://localhost:9163/503 (status: 503): retrying in 1s (2 left)
2024/07/26 13:35:00 using schema found at http://localhost:9163/503
2024/07/26 13:35:00 [DEBUG] GET http://localhost:9163
2024/07/26 13:35:00 using schema found at http://localhost:9163

yannh commented 4 months ago

Hi @carlossg, thanks a lot for the contribution. Your tests are quite a bit more complex than what we had before, but they look good. I iterated on them a little bit; can you let me know what you think? Thanks!

carlossg commented 4 months ago

Much better, thanks.

yannh commented 4 months ago

Acceptance tests are failing, and I'm not sure why, since they run locally just fine. I'm unsure whether it's related to this PR :thinking: It'll take me some time to troubleshoot.

yannh commented 4 months ago

I reverted it; it seems to print debug logs in some cases. I'll review this later today or this week :bow: