moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

random error failed to get dead record #3959

Open coryb opened 1 year ago

coryb commented 1 year ago

It seems like this error has been happening for a while: https://github.com/moby/buildkit/issues/1468#issuecomment-625292498

I am now seeing it on master, at random, during load testing. It is very hard to reproduce and seems to be related to a race condition somewhere.

The error is coming from this line: https://github.com/moby/buildkit/blob/a45297b3411c90ed63b800491635297fdda09080/cache/manager.go#L403
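For readers unfamiliar with this kind of failure, here is a toy model of how a lookup can hit a "dead" record. It is not BuildKit's code: `toyCache`, `markDead`, and `remove` are hypothetical names, and the two-step removal (mark dead, then delete) is an assumption made only for illustration. The sketch replays one possible interleaving sequentially instead of relying on goroutine timing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotFound is a stand-in sentinel error for this toy model.
var errNotFound = errors.New("not found")

// record is a hypothetical cache entry; dead means removal has started.
type record struct{ dead bool }

// toyCache is a hypothetical, mutex-guarded record map.
type toyCache struct {
	mu      sync.Mutex
	records map[string]*record
}

// get returns the record, or an error if it is missing or marked dead.
func (c *toyCache) get(id string) (*record, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	rec, ok := c.records[id]
	if !ok {
		return nil, fmt.Errorf("record %s: %w", id, errNotFound)
	}
	if rec.dead {
		return nil, fmt.Errorf("failed to get dead record %s: %w", id, errNotFound)
	}
	return rec, nil
}

// markDead is the first half of the assumed two-step removal.
func (c *toyCache) markDead(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if rec, ok := c.records[id]; ok {
		rec.dead = true
	}
}

// remove is the second half: the entry leaves the map.
func (c *toyCache) remove(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.records, id)
}

func main() {
	id := "s3i2hx4q7gxakxxu7xres4wyt"
	c := &toyCache{records: map[string]*record{id: {}}}

	_, err := c.get(id)
	fmt.Println("before removal:", err)

	c.markDead(id) // another caller begins removing the record
	_, err = c.get(id)
	fmt.Println("between mark and delete:", err)

	c.remove(id)
	_, err = c.get(id)
	fmt.Println("after removal:", err)
}
```

In this model each method is individually safe under the mutex, yet a caller that obtained the ID earlier can still observe the record in its dead state. Whether the real failure follows this pattern is not established here.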

Full stack here:

failed to load ref: failed to get dead record s3i2hx4q7gxakxxu7xres4wyt: not found
49583 v0.0.0+unknown buildkitd
github.com/moby/buildkit/cache.init
        /src/cache/manager.go:35
runtime.doInit
        /usr/local/src/runtime/proc.go:6507
runtime.doInit
        /usr/local/src/runtime/proc.go:6484
runtime.doInit
        /usr/local/src/runtime/proc.go:6484
runtime.doInit
        /usr/local/src/runtime/proc.go:6484
runtime.doInit
        /usr/local/src/runtime/proc.go:6484
runtime.main
        /usr/local/src/runtime/proc.go:233
runtime.goexit
        /usr/local/src/runtime/asm_amd64.s:1598

49583 v0.0.0+unknown buildkitd
github.com/moby/buildkit/cache.(*cacheManager).getRecord
        /src/cache/manager.go:403
github.com/moby/buildkit/cache.(*cacheManager).get
        /src/cache/manager.go:350
github.com/moby/buildkit/cache.(*cacheManager).Get
        /src/cache/manager.go:345
github.com/moby/buildkit/worker/base.(*Worker).LoadRef
        /src/worker/base/worker.go:274
github.com/moby/buildkit/worker.(*cacheResultStorage).LoadRemotes
        /src/worker/cacheresult.go:75
github.com/moby/buildkit/solver.(*exporter).ExportTo
        /src/solver/exporter.go:119
github.com/moby/buildkit/solver.(*mergedExporter).ExportTo
        /src/solver/exporter.go:243
github.com/moby/buildkit/solver/llbsolver.NewProvenanceCreator.func1
        /src/solver/llbsolver/provenance.go:450
github.com/moby/buildkit/solver/llbsolver.(*ProvenanceCreator).Predicate
        /src/solver/llbsolver/provenance.go:495
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.1
        /src/solver/llbsolver/solver.go:205
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.2
        /src/solver/llbsolver/solver.go:243
golang.org/x/sync/errgroup.(*Group).Go.func1
        /src/vendor/golang.org/x/sync/errgroup/errgroup.go:75
runtime.goexit
        /usr/local/src/runtime/asm_amd64.s:1598

49583 v0.0.0+unknown buildkitd
github.com/moby/buildkit/worker/base.(*Worker).LoadRef
        /src/worker/base/worker.go:295
github.com/moby/buildkit/worker.(*cacheResultStorage).LoadRemotes
        /src/worker/cacheresult.go:75
github.com/moby/buildkit/solver.(*exporter).ExportTo
        /src/solver/exporter.go:119
github.com/moby/buildkit/solver.(*mergedExporter).ExportTo
        /src/solver/exporter.go:243
github.com/moby/buildkit/solver/llbsolver.NewProvenanceCreator.func1
        /src/solver/llbsolver/provenance.go:450
github.com/moby/buildkit/solver/llbsolver.(*ProvenanceCreator).Predicate
        /src/solver/llbsolver/provenance.go:495
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.1
        /src/solver/llbsolver/solver.go:205
github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.2
        /src/solver/llbsolver/solver.go:243
golang.org/x/sync/errgroup.(*Group).Go.func1
        /src/vendor/golang.org/x/sync/errgroup/errgroup.go:75
runtime.goexit
        /usr/local/src/runtime/asm_amd64.s:1598

49583 v0.0.0+unknown buildkitd
main.unaryInterceptor.func1
        /src/cmd/buildkitd/main.go:607
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
        /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
        /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
github.com/moby/buildkit/api/services/control._Control_Solve_Handler
        /src/api/services/control/control.pb.go:2440
google.golang.org/grpc.(*Server).processUnaryRPC
        /src/vendor/google.golang.org/grpc/server.go:1336
google.golang.org/grpc.(*Server).handleStream
        /src/vendor/google.golang.org/grpc/server.go:1704
google.golang.org/grpc.(*Server).serveStreams.func1.2
        /src/vendor/google.golang.org/grpc/server.go:965
runtime.goexit
        /usr/local/src/runtime/asm_amd64.s:1598

coryb commented 1 year ago

Here are some log lines that mention the record ID. The error does not seem to be related to snapshot GC, but it is hard to tell from these lines alone:

{"level":"error","msg":"/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to load ref: failed to get dead record s3i2hx4q7gxakxxu7xres4wyt: not found","time":"2023-06-21T06:05:39.069Z"}

{"key":"s3i2hx4q7gxakxxu7xres4wyt-view","level":"debug","msg":"remove snapshot","snapshotter":"overlayfs","time":"2023-06-21T06:06:39.951Z"}

{"key":"buildkit/504474/s3i2hx4q7gxakxxu7xres4wyt-view","level":"debug","msg":"removed snapshot","snapshotter":"overlayfs","time":"2023-06-21T06:06:40.567Z"}

{"key":"s3i2hx4q7gxakxxu7xres4wyt","level":"debug","msg":"remove snapshot","snapshotter":"overlayfs","time":"2023-06-21T07:25:54.1Z"}

{"key":"buildkit/504472/s3i2hx4q7gxakxxu7xres4wyt","level":"debug","msg":"removed snapshot","snapshotter":"overlayfs","time":"2023-06-21T07:25:54.725Z"}
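To make the ordering of these events easier to see, here is a small sketch that parses the five log lines above, keeps those mentioning the record ID, sorts them by timestamp, and reports the gap between the error and the first "remove snapshot" entry. The names `logLine`, `timeline`, and `gapErrorToRemoval` are hypothetical helpers, not part of BuildKit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"time"
)

const recordID = "s3i2hx4q7gxakxxu7xres4wyt"

// lines holds the log excerpts from this comment, copied verbatim.
var lines = []string{
	`{"level":"error","msg":"/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to load ref: failed to get dead record s3i2hx4q7gxakxxu7xres4wyt: not found","time":"2023-06-21T06:05:39.069Z"}`,
	`{"key":"s3i2hx4q7gxakxxu7xres4wyt-view","level":"debug","msg":"remove snapshot","snapshotter":"overlayfs","time":"2023-06-21T06:06:39.951Z"}`,
	`{"key":"buildkit/504474/s3i2hx4q7gxakxxu7xres4wyt-view","level":"debug","msg":"removed snapshot","snapshotter":"overlayfs","time":"2023-06-21T06:06:40.567Z"}`,
	`{"key":"s3i2hx4q7gxakxxu7xres4wyt","level":"debug","msg":"remove snapshot","snapshotter":"overlayfs","time":"2023-06-21T07:25:54.1Z"}`,
	`{"key":"buildkit/504472/s3i2hx4q7gxakxxu7xres4wyt","level":"debug","msg":"removed snapshot","snapshotter":"overlayfs","time":"2023-06-21T07:25:54.725Z"}`,
}

// logLine holds the fields used from each JSON log entry.
type logLine struct {
	Level string    `json:"level"`
	Msg   string    `json:"msg"`
	Key   string    `json:"key"`
	Time  time.Time `json:"time"`
}

// timeline returns the entries that mention id, sorted by timestamp.
func timeline(raw []string, id string) ([]logLine, error) {
	var out []logLine
	for _, r := range raw {
		var l logLine
		if err := json.Unmarshal([]byte(r), &l); err != nil {
			return nil, err
		}
		if strings.Contains(l.Key, id) || strings.Contains(l.Msg, id) {
			out = append(out, l)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Time.Before(out[j].Time) })
	return out, nil
}

// gapErrorToRemoval returns the time from the first error entry to the
// first "remove snapshot" entry; ok is false if either is missing.
func gapErrorToRemoval(events []logLine) (gap time.Duration, ok bool) {
	var errAt, rmAt time.Time
	for _, e := range events {
		if errAt.IsZero() && e.Level == "error" {
			errAt = e.Time
		}
		if rmAt.IsZero() && e.Msg == "remove snapshot" {
			rmAt = e.Time
		}
	}
	if errAt.IsZero() || rmAt.IsZero() {
		return 0, false
	}
	return rmAt.Sub(errAt), true
}

func main() {
	events, err := timeline(lines, recordID)
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		what := e.Msg
		if e.Key != "" {
			what += " " + e.Key
		}
		fmt.Printf("%s %-5s %s\n", e.Time.Format(time.RFC3339Nano), e.Level, what)
	}
	if gap, ok := gapErrorToRemoval(events); ok {
		fmt.Println("error precedes first snapshot removal by", gap)
	}
}
```

With these five lines, the error is logged about a minute before the first removal of the `-view` snapshot. That fits the reading that snapshot GC did not directly trigger the error, but it does not rule out an earlier step of the same cleanup.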

We are running buildkit with the OCI worker and GC enabled. We set --oci-worker-gc-keepstorage to 70% of the available disk for our usage. The config looks like:

[worker.oci]
  enabled = true
  gc = true

[worker.containerd]
  enabled = false