Open coryb opened 1 year ago
Here are some log lines related to the record ID. The error does not seem to be related to snapshot GC, but it is hard to tell from these lines alone:
{"level":"error","msg":"/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to load ref: failed to get dead record s3i2hx4q7gxakxxu7xres4wyt: not found","time":"2023-06-21T06:05:39.069Z"}
{"key":"s3i2hx4q7gxakxxu7xres4wyt-view","level":"debug","msg":"remove snapshot","snapshotter":"overlayfs","time":"2023-06-21T06:06:39.951Z"}
{"key":"buildkit/504474/s3i2hx4q7gxakxxu7xres4wyt-view","level":"debug","msg":"removed snapshot","snapshotter":"overlayfs","time":"2023-06-21T06:06:40.567Z"}
{"key":"s3i2hx4q7gxakxxu7xres4wyt","level":"debug","msg":"remove snapshot","snapshotter":"overlayfs","time":"2023-06-21T07:25:54.1Z"}
{"key":"buildkit/504472/s3i2hx4q7gxakxxu7xres4wyt","level":"debug","msg":"removed snapshot","snapshotter":"overlayfs","time":"2023-06-21T07:25:54.725Z"}
We are running buildkit with the oci worker and gc enabled. We set --oci-worker-gc-keepstorage to 70% of the available disk for our usage. The config looks like:
[worker.oci]
enabled = true
gc = true
[worker.containerd]
enabled = false
It seems like this error has been happening for a while: https://github.com/moby/buildkit/issues/1468#issuecomment-625292498
I am now seeing it on master, at random, during load testing. It is very hard to reproduce and seems related to a race condition somewhere.
The error is coming from this line: https://github.com/moby/buildkit/blob/a45297b3411c90ed63b800491635297fdda09080/cache/manager.go#L403
Full stack here: