moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.09k stars 1.14k forks source link

ADD of remote file in Dockerfile causes occasional "digest mismatch" #5020

Open dnephin opened 3 months ago

dnephin commented 3 months ago

Buildkit version: v0.12.3

From a Dockerfile with:

ADD https://host/file.crt

Occasionally we see build failures that look like this:

#9 https://host/file.crt
#9 ERROR: digest mismatch sha256:b155ec88e83ea2a01ff5a72ff270e3bbcb500ecf09a1ef1da80b47478a563bb1: 

The failures are intermittent, but I'm not sure if that's because we round robin between 4 build nodes, or if there is some other problem.

The relevant code seems to be: https://github.com/moby/buildkit/blob/4d9a4e5df9e11596a3261c1952cde3c6346be762/source/http/source.go#L449-L451

I notice that hs.cacheKey is an empty string, which seems strange. I guess it's expected that hs.cacheKey has a non-zero value here?

Anything else I can provide to help debug this problem?

tonistiigi commented 3 months ago

I guess it could be hitting this case https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L424 unexpectedly. Maybe because the ref has been deleted before Snapshot gets called (or maybe it was already deleted and metadata DB has stale data). Maybe there are some GC logs that confirm that?

It should be that CacheKey can return digest directly if user set one manually (pin), or data can be pulled and digest calculated or digest from previously downloaded ref can be reused if the etag confirms that it has not changed since the last pull. In the latter two cases the data is already pulled and should be accessed by refID. If hs.cacheKey is empty that should mean that it expected early return in L424.

To fix this case, need to make sure ref exists in all possible returns for CacheKey. Eg. it does in https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L283-L291 but not sure if guaranteed in https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L273-L280 and https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L235-L240, and then make sure there is no way it can get released between CacheKey and Snapshot.