moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

flightcontrol: protect contention timeouts #5010

Closed tonistiigi closed 3 weeks ago

tonistiigi commented 3 weeks ago

We had a report of the https://github.com/moby/buildkit/issues/1822 error showing up in some logs.

A possibly important aspect of this race is that callbacks returning errors are handled differently: when the previous callback has errored, the retry happens right away, at the beginning of wait(). The previous contention test did not cover this case. With the new test it is easy to reach the timeout with ~100 goroutines (and, with lower probability, with fewer).

This patch reduces the backoff factor so that the backoff does not grow too quickly, and adds a random factor to the initial timeout so that when lots of goroutines hit the error at the same time, they do not all retry at the same time as well. With the new test case generating contention, this seems to dramatically reduce the maximum backoff that can be reached.