Open dnephin opened 3 months ago
I guess it could be hitting this case https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L424 unexpectedly. Maybe because the ref has been deleted before Snapshot
gets called (or maybe it was already deleted and metadata DB has stale data). Maybe there are some GC logs that confirm that?
It should be that CacheKey
can return digest directly if user set one manually (pin), or data can be pulled and digest calculated or digest from previously downloaded ref can be reused if the etag confirms that it has not changed since the last pull. In the latter two cases the data is already pulled and should be accessed by refID
. If hs.cacheKey
is empty that should mean that it expected early return in L424.
To fix this case, need to make sure ref exists in all possible returns for CacheKey
. Eg. it does in https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L283-L291 but not sure if guaranteed in https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L273-L280 and https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L235-L240, and then make sure there is no way it can get released between CacheKey
and Snapshot
.
Buildkit version: v0.12.3
From a Dockerfile with:
Occasionally we see build failures that look like this:
The failures are intermittent, but I'm not sure if that's because we round robin between 4 build nodes, or if there is some other problem.
The relevant code seems to be: https://github.com/moby/buildkit/blob/4d9a4e5df9e11596a3261c1952cde3c6346be762/source/http/source.go#L449-L451
I notice that
hs.cacheKey
is an empty string, which seems strange. I guess it's expected thaths.cacheKey
has a non-zero value here?Anything else I can provide to help debug this problem?