moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.09k stars 1.14k forks source link

panic: failed to get edge #2303

Open jamescook opened 3 years ago

jamescook commented 3 years ago

I'm not sure which buildctl command triggered the panic. We are building a few different images in parallel. The general command format is:

buildctl build --progress=plain --frontend=dockerfile.v0 --local context="." --local dockerfile="." --opt target="production" --opt build-arg:GIT_BRANCH="$BUILDKITE_BRANCH" --opt build-arg:GIT_COMMIT_SHA="$BUILDKITE_COMMIT" --opt build-arg:GIT_COMMIT_MESSAGE="$BUILDKITE_MESSAGE" --import-cache ref=zzzzz/cache:production-zzzz-master,type=registry --import-cache ref=zzzz/cache:production-zzzz-master,type=registry --export-cache ref=zzzz/cache:production-zzzz-master,mode=max,type=registry --output \"name=zzzz/zzzz:production-bc6e10840eb7d7652c2cf120199d15e3623f65ba-amd64\",\"push=true\",\"type=image\"

Version:

buildkitd --version
buildkitd github.com/moby/buildkit v0.9.0 c8bb937807d405d92be91f06ce2629e6202ac7a9

Config:

cat /etc/buildkit/buildkitd.toml 
[worker.oci]
  max-parallelism = 3

[worker.containerd]
  max-parallelism = 3

Error:

panic: failed to get edge

goroutine 81 [running]:
github.com/moby/buildkit/solver.(*pipeFactory).NewInputRequest(0x4004a17f00, 0x0, 0x1095e98, 0x4127e9ce40, 0x4081d9c180, 0x41259347a8, 0x1)
        /src/solver/scheduler.go:354 +0x1e0
github.com/moby/buildkit/solver.(*edge).createInputRequests(0x4115bec280, 0x2, 0x4004a17f00, 0x0, 0x18c91d0)
        /src/solver/edge.go:809 +0x280
github.com/moby/buildkit/solver.(*edge).unpark(0x4115bec280, 0x411272a020, 0x1, 0x1, 0x4004a17e30, 0x0, 0x0, 0x18c91d0, 0x0, 0x0, ...)
        /src/solver/edge.go:360 +0x12c
github.com/moby/buildkit/solver.(*scheduler).dispatch(0x400024f030, 0x4115bec280)
        /src/solver/scheduler.go:136 +0x3a0
github.com/moby/buildkit/solver.(*scheduler).loop(0x400024f030)
        /src/solver/scheduler.go:104 +0x1ac
created by github.com/moby/buildkit/solver.newScheduler
        /src/solver/scheduler.go:35 +0x1a0
jamesalucas commented 3 years ago

Hello,

I am experiencing this same panic when using buildkit via Earthly where it is causing a lot of instability in our CI builds which run 10 jobs concurrently.

In order to try and identify what the issue is I have built buildkitd locally and have been experimenting with different log messages in our CI environment. So far I have established that the error happens because...

NewInputRequest makes a call to getEdge(Edge) and within getEdge it checks to see if the digest for the edge is active in the following bit of code:

func (jl *Solver) getEdge(e Edge) *edge {
    jl.mu.RLock()
    defer jl.mu.RUnlock()

    st, ok := jl.actives[e.Vertex.Digest()]
    if !ok {
        return nil
    }
    return st.getEdge(e.Index)
}

In the error scenario it is failing to find an entry in jl.actives for the edge's digest, returns nil which then causes the panic in NewInputRequest.

I have removed the panic and whilst the jobs fail when this (race condition?) happens, the buildkitd process doesn't die and therefore it's only a small fraction of jobs that error rather than all of the jobs that were in progress at the time. This is a vast improvement for us, but it would be great to determine why there isn't an entry in jl.actives that is expected.

Therefore I am currently looking to see if entries in jl.actives are being removed prematurely by logging each one that is deleted (in deleteIfUnreferenced), but even if I were to see that happening, I wouldn't have a clue how to resolve it!

If anyone has any suggestions of things to try then I'd love to hear them!

Thanks,

James

jamesalucas commented 3 years ago

I think I have managed to confirm that the entries in jl.actives are being removed prematurely as I can see the following from my logging:

grep 2c4ac4dfeeeb84e66ab8348046fda65f600b93195ee19b87de914268c08a1058 711f69ac3cde_2.log
time="2021-09-29T12:39:20Z" level=info msg="> Deleting digest sha256:2c4ac4dfeeeb84e66ab8348046fda65f600b93195ee19b87de914268c08a1058 "
time="2021-09-29T12:40:08Z" level=info msg=">>> getEdge(Edge) edge sha256:2c4ac4dfeeeb84e66ab8348046fda65f600b93195ee19b87de914268c08a1058 [./packages/nest+yarn-files 17a2fd41-6ca8-42b3-8d05-166d45042252] COPY package.json ./packages/nest/ %!s(solver.Index=0). No active vertex. There are 548 active vertexes"
time="2021-09-29T12:40:10Z" level=info msg="> failed to get edge sha256:2c4ac4dfeeeb84e66ab8348046fda65f600b93195ee19b87de914268c08a1058 [./packages/nest+yarn-files 17a2fd41-6ca8-42b3-8d05-166d45042252] COPY package.json ./packages/nest/ currentState={%!s(solver.edgeStatusType=2) %!s(*solver.SharedCachedResult=<nil>) %!s(*solver.CacheMap=&{sha256:5da32dcaee25519f6a4ec51b2fa169515837349a513709edd66186fddcee87c3 [{ <nil> 0xe41e00} {sha256:545332ac15012117358e30c0f45950b6426a9be87463f2e0c82e615033e511b8 0xe49260 0xe41e00}] map[]}) [{%!s(*solver.CacheKey=&{{{0 0} 0 0 0 0} n6m72p99voqixllzyt5n6kujo [[{ {0xc001b7e480 0xc011e24870}}] [{sha256:545332ac15012117358e30c0f45950b6426a9be87463f2e0c82e615033e511b8 {0xc001b7e880 0xc00d30c8d0}} { {0xc00d1b6980 0xc007fbae60}}]] sha256:5da32dcaee25519f6a4ec51b2fa169515837349a513709edd66186fddcee87c3 0 map[0xc0032a3270:n6m72p99voqixllzyt5n6kujo] [ukfwfafw90v4t3wvocx6j5abc]}) %!s(*solver.exporter=&{0xc00d1b6b00 [0xc0260305f0] <nil> [] <nil> <nil>})}]} desiredState=complete"

So the digest that eventually fails is deleted just before it is used.

Can anyone suggest the best approach to fixing this? If I can, I'd like to try.

Thanks!

tonistiigi commented 3 years ago

@jamesalucas We would need to understand what triggers the Deleting digest and supposedly why is the internal reference count not correct as it still seems to be used.

tonistiigi commented 3 years ago

@jamesalucas @maxlaverse

Is any of the following likely in your case: 1) parallel requests of same vertexes when some builds use --no-cache and some don't 2) your build reach this "merging edge" line https://github.com/moby/buildkit/blob/e839ff729b2d071b9f45ff4e1a63535dd539df75/solver/scheduler.go#L176 before the condition happens

maxlaverse commented 3 years ago

We have a lot of concurrent builds which:

I'll check if this line you mentioned appears in the logs.

tonistiigi commented 3 years ago

@maxlaverse but no --no-cache? The reason I'm asking is that these 2 features mess with edge digests. --no-cache will modify the digest so that it doesn't collide with the existing cached vertex. And merging takes 2 digests and merges them into one when 2 different LLBs generate the same cache key.

maxlaverse commented 3 years ago

We run a few different sort of build, but the most common and for which the concurrency is the highest, the invocation looks like this:

buildctl build --progress=plain --frontend=dockerfile.v0 
   --local context="." \
   --local dockerfile="frontend/" \
   --opt target="test" \
   --import-cache ref=tenant/cache:test-app-branch,type=registry \
   --import-cache ref=tenant/cache:test-app-master,type=registry \
   --export-cache ref=tenant/cache:test-app-branch,mode=max,type=registry \
   --output \"name=tenant/app:test-commit\",\"push=true\",\"type=image\"
jamesalucas commented 3 years ago

@jamesalucas @maxlaverse

Is any of the following likely in your case:

  1. parallel requests of same vertexes when some builds use --no-cache and some don't
  2. your build reach this "merging edge" line https://github.com/moby/buildkit/blob/e839ff729b2d071b9f45ff4e1a63535dd539df75/solver/scheduler.go#L176 before the condition happens

We are using buildkit via Earthly but I assume Earthly's --no-cache argument is passed through to buildkit and we are not using it in our CI environment where this happens.

However, I have changed the merging edge debug message to print out the digest and I can see the edge digest that is deleted and subsequently errors is merged to a different edge digest just before:

grep c6cab7679d76597d6232b846997f33d791e159bbc09a875ed4913693a774b5e7 *
time="2021-09-29T21:55:34Z" level=info msg="merging edge sha256:c6cab7679d76597d6232b846997f33d791e159bbc09a875ed4913693a774b5e7 to sha256:7e2cd08add2ecf2b6672f121fe0a9fbd6639fe055591e09008795baa9926aeba\n"
time="2021-09-29T21:56:04Z" level=info msg="> Deleting digest sha256:c6cab7679d76597d6232b846997f33d791e159bbc09a875ed4913693a774b5e7 "
time="2021-09-29T21:56:12Z" level=info msg=">>> getEdge(Edge) edge sha256:c6cab7679d76597d6232b846997f33d791e159bbc09a875ed4913693a774b5e7 [+yarn-files 2ee02759-1268-4d3f-8f14-978a463dff00] SAVE ARTIFACT ./package.json +yarn-files/package.json %!s(solver.Index=0). No active vertex.%!(EXTRA int=1063)"
time="2021-09-29T21:56:12Z" level=info msg="> failed to get edge sha256:c6cab7679d76597d6232b846997f33d791e159bbc09a875ed4913693a774b5e7 [+yarn-files 2ee02759-1268-4d3f-8f14-978a463dff00] SAVE ARTIFACT ./package.json +yarn-files/package.json currentState={%!s(solver.edgeStatusType=2) %!s(*solver.SharedCachedResult=<nil>) %!s(*solver.CacheMap=&{sha256:112ae4533f0019a3e095d76b482b0beb209e7cc87f132cae6b7acd3c61e75096 [{ <nil> 0xe41dc0} {sha256:ba47d017b8707c1ce4bfd6a54c7841c67df1eca9b4fcdb53bf15aaddc7733d82 0xe49220 0xe41dc0}] map[]}) [{%!s(*solver.CacheKey=&{{{0 0} 0 0 0 0} iifmbpr0zpw8bgaranjftebnb [[{ {0xc001321880 0xc00081a0a0}}] [{sha256:ba47d017b8707c1ce4bfd6a54c7841c67df1eca9b4fcdb53bf15aaddc7733d82 {0xc00f245a80 0xc001ef6eb0}}]] sha256:112ae4533f0019a3e095d76b482b0beb209e7cc87f132cae6b7acd3c61e75096 0 map[0xc000e1adc0:iifmbpr0zpw8bgaranjftebnb] [pxnzy38nc5xmjrg71s8v7uvx4]}) %!s(*solver.exporter=&{0xc0042ba380 [0xc00081b3b0] <nil> [] <nil> <nil>})}]} desiredState=complete"
jamesalucas commented 2 years ago

I've added a bit more logging and I can see that there are 2 scenarios in which I am seeing errors:

1) A source edge is merged to a target edge, the source edge is deleted and then a getEdge is requested for the source edge 2) A source edge is merged to a target edge, the target edge is deleted and then a getEdge is requested for the target edge

Example output for case 1:

time="2021-10-05T15:54:04Z" level=info msg="merging edge sha256:dfdfdefe6ce1e0ee4e436270dfe6a52f0cf453adc7ce328912f4200016caa21f ([+yarn-files 7e8cbea8-3f05-4d31-92a5-7afb01151126] SAVE ARTIFACT ./package.json +yarn-files/package.json) to sha256:7e040d3815cd685e800bf57e7320e88252a65574a69454212278dd62a91b879f ([+yarn-files 7e8cbea8-3f05-4d31-92a5-7afb01151126] SAVE ARTIFACT ./package.json +yarn-files/package.json)\n"
time="2021-10-05T15:54:25Z" level=info msg="> Deleting digest sha256:dfdfdefe6ce1e0ee4e436270dfe6a52f0cf453adc7ce328912f4200016caa21f "
time="2021-10-05T15:55:42Z" level=info msg=">>> getEdge(Edge) edge sha256:dfdfdefe6ce1e0ee4e436270dfe6a52f0cf453adc7ce328912f4200016caa21f [+yarn-files 7e8cbea8-3f05-4d31-92a5-7afb01151126] SAVE ARTIFACT ./package.json +yarn-files/package.json %!s(solver.Index=0) 744. No active vertex."
time="2021-10-05T15:55:42Z" level=info msg="> failed to get edge sha256:dfdfdefe6ce1e0ee4e436270dfe6a52f0cf453adc7ce328912f4200016caa21f [+yarn-files 7e8cbea8-3f05-4d31-92a5-7afb01151126] SAVE ARTIFACT ./package.json +yarn-files/package.json currentState={%!s(solver.edgeStatusType=2) %!s(*solver.SharedCachedResult=<nil>) %!s(*solver.CacheMap=&{sha256:112ae4533f0019a3e095d76b482b0beb209e7cc87f132cae6b7acd3c61e75096 [{ <nil> 0xe429e0} {sha256:ba47d017b8707c1ce4bfd6a54c7841c67df1eca9b4fcdb53bf15aaddc7733d82 0xe49e40 0xe429e0}] map[]}) [{%!s(*solver.CacheKey=&{{{0 0} 0 0 0 0} xaj1z3yuc70xdr3y2qbcxyi5b [[{ {0xc006cd8800 0xc006b074a0}}] [{sha256:ba47d017b8707c1ce4bfd6a54c7841c67df1eca9b4fcdb53bf15aaddc7733d82 {0xc0068d0700 0xc000dfb4a0}}]] sha256:112ae4533f0019a3e095d76b482b0beb209e7cc87f132cae6b7acd3c61e75096 0 map[0xc007a486e0:xaj1z3yuc70xdr3y2qbcxyi5b] [l1vyvlwir5al1jspftup60v0y]}) %!s(*solver.exporter=&{0xc006cea500 [0xc006b079a0] <nil> [] <nil> <nil>})}]} desiredState=complete"

Example output for case 2:

time="2021-10-05T15:54:09Z" level=info msg="merging edge sha256:ba599a58a0e193804b9e28527e03c12c9dfbccd9881b4e22b269ddb54b4e3d84 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 28d1d4a9-4e00-4e66-8e00-bca09d6d16e2] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 28d1d4a9-4e00-4e66-8e00-bca09d6d16e2] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:54:09Z" level=info msg="merging edge sha256:f08910304d5d2c2c5132b49a8c291b8183d866411a124e3891f8e377bcaf275e ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 75178605-270c-4fcb-9fc8-ad2ebc911e3f] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 75178605-270c-4fcb-9fc8-ad2ebc911e3f] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:54:10Z" level=info msg="> Deleting digest sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba "
time="2021-10-05T15:54:41Z" level=info msg="merging edge sha256:4895dacc32ecd7c4154edabac471094d2bb5a50ab185e01b58183e394330dcf0 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 01d39e5d-9781-48b5-8a14-5cf959f6981c] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 01d39e5d-9781-48b5-8a14-5cf959f6981c] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:da5d534ed33f1ca7719441b18288f823ba1113509a0b5df8f84dbef8b8b56020 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) ca56e7ea-6bcc-4da6-ac13-2a322aecb85c] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) ca56e7ea-6bcc-4da6-ac13-2a322aecb85c] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:74c7c5ead5845ef500463e5ce760a79eaf1121b19dd145dfeee415f89a14582a ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) f50f1d88-239e-4858-aafc-3b9c6584f7c2] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) f50f1d88-239e-4858-aafc-3b9c6584f7c2] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:9e6070d6b1b69c173ff37d0e5452eacd0154cffef2aab04b8266d317205486b1 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) b4376bc8-00ad-4ad8-aaa7-8eafe02013a3] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) b4376bc8-00ad-4ad8-aaa7-8eafe02013a3] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:8c2a416f3ccd69c1f975421590dd0c3031c72b82c2b70e856f7248824865cf66 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) b85d93cd-705b-4eaa-9056-285299d42aa3] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) b85d93cd-705b-4eaa-9056-285299d42aa3] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:0fcf76c2cc739ac7bb4d56b4f4900748b24a2886dec1d885178a425501bfa09f ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 34f1a0c2-abeb-4b0e-a64e-94c012f67e3b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 34f1a0c2-abeb-4b0e-a64e-94c012f67e3b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:9258208b4e841af1455e02593b663a0f0f57d78456f0b94ea8db4825ac1114e6 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) cd4df4ef-88bc-42b1-ad45-8cf92ac8260e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) cd4df4ef-88bc-42b1-ad45-8cf92ac8260e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg="merging edge sha256:5ec70d35ea9c011b40eb55592f668627fc739ee0b7ecf8df86b71d74a8b8d015 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 44b4afa9-df4a-4d26-b7c5-a16b75aa821f] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 44b4afa9-df4a-4d26-b7c5-a16b75aa821f] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:18Z" level=info msg=">>> getEdge(Edge) edge sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba [./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 683829b7-9dcc-415e-bcf7-35fe0649c0e8] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json %!s(solver.Index=0) 1658. No active vertex."
time="2021-10-05T15:55:18Z" level=info msg="> failed to get edge sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba [./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 683829b7-9dcc-415e-bcf7-35fe0649c0e8] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json currentState={%!s(solver.edgeStatusType=2) %!s(*solver.SharedCachedResult=<nil>) %!s(*solver.CacheMap=&{sha256:68d83221625a5a4e49c24cd4b334087efba88e40416440a493f350033cdd6422 [{ <nil> 0xe429e0} {sha256:bf4fa1ac17bdacdfdc0fb21e2149de114fbce7b88cda6b4773499adfcd9d30f1 0xe49e40 0xe429e0}] map[]}) [{%!s(*solver.CacheKey=&{{{0 0} 0 0 0 0} 8mqyeyauhf5zyheedq6f314wy [[{ {0xc003e61380 0xc006d4fe00}}] [{sha256:bf4fa1ac17bdacdfdc0fb21e2149de114fbce7b88cda6b4773499adfcd9d30f1 {0xc003e60700 0xc006d4f680}}]] sha256:68d83221625a5a4e49c24cd4b334087efba88e40416440a493f350033cdd6422 0 map[0xc0090574a0:8mqyeyauhf5zyheedq6f314wy] [9lbtfdi2rq0v819x5nijzj033]}) %!s(*solver.exporter=&{0xc003e64680 [0xc0047671d0] <nil> [] <nil> <nil>})}]} desiredState=complete"
time="2021-10-05T15:55:48Z" level=info msg="merging edge sha256:23a670f7d58ab0d025ce48e2d0a8d57ce1f461444a7436205048777867b3005c ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 852af9b0-23e5-4d56-bfbf-5713136bb5ca] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 852af9b0-23e5-4d56-bfbf-5713136bb5ca] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:55:48Z" level=info msg="merging edge sha256:bc7db25882ca087853c05a5c1771cf7eb1a22c29ea73db20151c38f061595b9f ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) defd0f20-11d2-4521-9190-615865d066da] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) defd0f20-11d2-4521-9190-615865d066da] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:56:30Z" level=info msg="merging edge sha256:23a670f7d58ab0d025ce48e2d0a8d57ce1f461444a7436205048777867b3005c ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 852af9b0-23e5-4d56-bfbf-5713136bb5ca] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 852af9b0-23e5-4d56-bfbf-5713136bb5ca] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:04Z" level=info msg="merging edge sha256:4b1f08c6693095b39ae0a3b4d96c63cb09f1408cf35431266f198608c3d4da39 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 1b78f028-d386-4bba-aad8-35a835bf0b77] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 1b78f028-d386-4bba-aad8-35a835bf0b77] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:04Z" level=info msg="merging edge sha256:a0c244e6eea8a692a6640ff4625106567a04bbf60d7d1b12a3420c8292317cb3 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 316b55a0-3396-4e7a-a755-e447d8ed9f3b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 316b55a0-3396-4e7a-a755-e447d8ed9f3b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:04Z" level=info msg="merging edge sha256:bdff2c3c86844d2cac606f347b47627f7cacc969064472c7d467ddfba747fb83 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 82175f06-8874-421a-83d8-77fef3b9ca61] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 82175f06-8874-421a-83d8-77fef3b9ca61] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:04Z" level=info msg="merging edge sha256:413928e84efd9a3f0f7ae0fe006ffb66f36f9c5ee61e0e33d33df0445345db61 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 6192e5b0-c02b-4ec6-9afb-ddd636319da5] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 6192e5b0-c02b-4ec6-9afb-ddd636319da5] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:04Z" level=info msg="merging edge sha256:7bcd4d0ea98ca0fb22d86785e54d47ec10980d82b01c374e5ef4d8ef97993f27 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) dbe05e3b-c2b6-4d04-955c-e8e14be53839] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) dbe05e3b-c2b6-4d04-955c-e8e14be53839] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:04Z" level=info msg="merging edge sha256:1ca436597f00b4d1684f5cd21e128c3c0116513ff6b2f4ecc83f8272ef54f56f ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 5be1b080-3fb0-46b0-aaca-da4e82c57582] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 5be1b080-3fb0-46b0-aaca-da4e82c57582] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:58:37Z" level=info msg="merging edge sha256:5a2bea55346c0f2ed8b3ace64961ea866e7ca92f411b3330c36987ee326e161a ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 16f4c3f5-9340-49ba-a480-a4c395303210] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 16f4c3f5-9340-49ba-a480-a4c395303210] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:59:25Z" level=info msg="merging edge sha256:18922709b200ce4e10449a1bfb4d2044a5184ba61dae138017e9815abf9e3172 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) ca40aaba-e744-4ed7-8307-476ba7934bbc] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) ca40aaba-e744-4ed7-8307-476ba7934bbc] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:59:50Z" level=info msg="merging edge sha256:19ef07e7c40ca329b8c79ca896db82224c54e1deb50695c9e7f5353f9a46d39c ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 270df134-ef0e-4902-bc02-013559b93a81] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 270df134-ef0e-4902-bc02-013559b93a81] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:59:50Z" level=info msg="merging edge sha256:5c68de6cc84f4dbff42aebf3c492a2a8519f1f9a8ad16431ee6add2e8c576057 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 71d97448-023a-43c3-ad26-5ac6ec9bcb55] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 71d97448-023a-43c3-ad26-5ac6ec9bcb55] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:59:51Z" level=info msg="merging edge sha256:18922709b200ce4e10449a1bfb4d2044a5184ba61dae138017e9815abf9e3172 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 41c89c62-f04b-44fb-b980-d219638aaa2e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 41c89c62-f04b-44fb-b980-d219638aaa2e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T15:59:51Z" level=info msg="merging edge sha256:19ef07e7c40ca329b8c79ca896db82224c54e1deb50695c9e7f5353f9a46d39c ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 270df134-ef0e-4902-bc02-013559b93a81] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 270df134-ef0e-4902-bc02-013559b93a81] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:00:25Z" level=info msg="merging edge sha256:de75248aa537acc744e805e06ff92742edb3800657b212bfd0569c2922117710 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 23af9b35-1841-4e13-8cea-32d39bb014a0] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 23af9b35-1841-4e13-8cea-32d39bb014a0] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:00:25Z" level=info msg="merging edge sha256:e7bb62ea6f5111f271ba81aa9340d4e0fbb7e7b0f5b17c6d933ef8a922b534e2 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) f872074b-87dc-4ff6-abf0-23c95e188126] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) f872074b-87dc-4ff6-abf0-23c95e188126] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:00:41Z" level=info msg="merging edge sha256:10c74413a99f201a252e8c2cdd26e88e81ef2b2bb6f1c81dd5a8e93afbc88da6 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) cf8c40e7-6430-44e9-ad3c-a7fab179e777] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) cf8c40e7-6430-44e9-ad3c-a7fab179e777] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:00:42Z" level=info msg="merging edge sha256:10c74413a99f201a252e8c2cdd26e88e81ef2b2bb6f1c81dd5a8e93afbc88da6 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 8e87f0a3-ae68-4477-a36d-b61160b25798] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 8e87f0a3-ae68-4477-a36d-b61160b25798] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:01:55Z" level=info msg="merging edge sha256:5a0e3ee8a682caf5bbe68c4e99d01400b86a04c39c42e95e79aa027ea7678aff ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) b0610034-6f83-453a-9dd6-9c5a4bc1b8cc] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) b0610034-6f83-453a-9dd6-9c5a4bc1b8cc] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:02:22Z" level=info msg="merging edge sha256:bbd11e947ae9cce7ca5e9baa9734bf5f81ddaa9f8cb8e776714bf657302dd6c8 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 31a02856-fba4-4759-90c0-2779b9589984] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 31a02856-fba4-4759-90c0-2779b9589984] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:02:40Z" level=info msg="merging edge sha256:4e27ac344cd475fda85155a5e5803901bb01ce8ca4789c8168ca25762e03dfde ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 562463f2-2565-401b-a0e4-35c2fb05938b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 562463f2-2565-401b-a0e4-35c2fb05938b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:02:42Z" level=info msg="merging edge sha256:4e27ac344cd475fda85155a5e5803901bb01ce8ca4789c8168ca25762e03dfde ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 562463f2-2565-401b-a0e4-35c2fb05938b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 562463f2-2565-401b-a0e4-35c2fb05938b] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:06:59Z" level=info msg="merging edge sha256:f784f4e60c3bd34e6aac050f9ce2e53aa77323fcb9d7deb4765188a51a9a6691 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 3e96c47e-0507-4cce-8954-a4e24a68bd80] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 3e96c47e-0507-4cce-8954-a4e24a68bd80] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:06:59Z" level=info msg="merging edge sha256:0108dc333568ad44646a7ffe87a5a3767c54445c3426a4570df9c0b03ec88555 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 9ce90c86-a756-40c5-a97a-3050ca89d124] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 9ce90c86-a756-40c5-a97a-3050ca89d124] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:06:59Z" level=info msg="merging edge sha256:92733dcbba8f7a8d7de488033f370f2338b31e3dfaa4a75c09872f93682ffde4 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) d9c45476-7fd5-410b-ac59-a5d83893fd5d] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) d9c45476-7fd5-410b-ac59-a5d83893fd5d] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:08:48Z" level=info msg="merging edge sha256:7116e4e0d477de6bdc2140753b0a8ff339e4ead5a85e4af9df335beac6c73205 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 0f6e8490-8cde-4137-8e5e-670ae884e31e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 0f6e8490-8cde-4137-8e5e-670ae884e31e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:08:48Z" level=info msg="merging edge sha256:c88b695851d0ef39fc5d927e48559b71fd69755af5970217d24bda5fbafc54d4 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 0c841fee-7b12-49e2-84c8-0a553ac43371] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 0c841fee-7b12-49e2-84c8-0a553ac43371] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:08:49Z" level=info msg="merging edge sha256:7116e4e0d477de6bdc2140753b0a8ff339e4ead5a85e4af9df335beac6c73205 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 0f6e8490-8cde-4137-8e5e-670ae884e31e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 0f6e8490-8cde-4137-8e5e-670ae884e31e] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"
time="2021-10-05T16:08:49Z" level=info msg="merging edge sha256:c88b695851d0ef39fc5d927e48559b71fd69755af5970217d24bda5fbafc54d4 ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 17727c07-2a42-4866-95b9-714443540f63] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json) to sha256:46ff3abab91cea0af60e7aaf08caa1c83c7613b7b37a970aedf892ede6d552ba ([./packages/nest-kafka+build(package=bmVzdC1rYWZrYQ==) 17727c07-2a42-4866-95b9-714443540f63] SAVE ARTIFACT tsconfig.json ./packages/nest-kafka+build/tsconfig.json)\n"

@tonistiigi are you able to suggest what area might need to change to fix this? I have had a couple of attempts but not certain how to go about addressing this so any hints would be appreciated!

tonistiigi commented 2 years ago

First can you confirm where getEdge fails. Looks to be in https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/jobs.go#L292 but better to be sure.

Then childVtx is used in deleteIfUnreferenced. This is only used on nested build so unlikely you use that code but just to be sure it is not in effect you can add a log in https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/jobs.go#L402

For the first case it looks weird as even when the edge is replaced it should not invalidate the digest in actives array. It should still be able to make queries through the old digest. So I wonder if the discarding in here is correct but there is something else that wrongly requests it later. Maybe because it is an input to another merged egde?

The edge can only be accessed through its vertex digest. Vertex is released by reference counting jobs. When last job goes away then it is released. That code looks quite trivial (except for the childVtx case mentioned earlier). Now if an edge(E2) gets merged to another one(E1). The vertex(V2) digest from another job points to the same edge(E1). The edge itself is reference counted as well. So if the original job(V1) that created the edge goes away it still remains because it has its own reference count.

One case that comes to mind is what would happen if first job(J1) goes away, E1 is referenced to (V2) but E1 itself is not fully processed yet. Now it could ask for its inputs but these may be already released because they were loaded with the original job? But none of that is confirmed by your traces as you seem to be getting the error from the same edge that was part of the merge.

Another thought is that are these merges really valid at all and maybe there is some loop in here that confuses things. There is a loop detection in https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/scheduler.go#L173-L174 but I wonder if one of these deps was merged already then it might go missing. I think this could happen when vertex has input that is already merged to another edge and then the current edge also gets merged to same edge.

I know this is all very confusing.

We have a debug mode in scheduler https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/scheduler.go#L17 that should print enough data(except loads & deletions that are also useful in here). But it produces too much data to run it without a known reproducer.

For other extra logging what might help if you have logs for every vertex load and deletion, every job start/delete, and what are the inputs of each vertex. That should give an understanding when the merge happens what object is from what job and if merge should not have happened or if deletion should not have happened. Also while logging print the current reference count for vertex. On merge print the reference count of edges and on getEdge error print what vertex was requesting its inputs.

tonistiigi commented 2 years ago

If you are wondering what this "merge" logic is at all there is some description in https://github.com/moby/buildkit/blob/master/docs/solver.md#merging-edges

jamesalucas commented 2 years ago

@tonistiigi thanks very much for the assistance. Here's what I have found so far...

First can you confirm where getEdge fails. Looks to be in

https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/jobs.go#L292

but better to be sure.

Yes, that's correct, I have a 'No active vertex' log on that line which results in the following log:

time="2021-10-06T15:45:55Z" level=info msg=">>> getEdge(Edge) edge sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f [./packages/utilities+yarn-files c7b27947-c3ef-422e-bb44-4edfa09fec08] COPY package.json ./packages/utilities/ %!s(solver.Index=0) 846. No active vertex."

Then childVtx is used in deleteIfUnreferenced. This is only used on nested build so unlikely you use that code but just to be sure it is not in effect you can add a log in

https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/jobs.go#L402

I have added a log statement there and it doesn't appear in my logs, so I can confirm it's not going down this path.

For the first case it looks weird as even when the edge is replaced it should not invalidate the digest in actives array. It should still be able to make queries through the old digest. So I wonder if the discarding in here is correct but there is something else that wrongly requests it later. Maybe because it is an input to another merged egde?

The edge can only be accessed through its vertex digest. Vertex is released by reference counting jobs. When last job goes away then it is released. That code looks quite trivial (except for the childVtx case mentioned earlier). Now if an edge(E2) gets merged to another one(E1). The vertex(V2) digest from another job points to the same edge(E1). The edge itself is reference counted as well. So if the original job(V1) that created the edge goes away it still remains because it has its own reference count.

One case that comes to mind is what would happen if first job(J1) goes away, E1 is referenced to (V2) but E1 itself is not fully processed yet. Now it could ask for its inputs but these may be already released because they were loaded with the original job? But none of that is confirmed by your traces as you seem to be getting the error from the same edge that was part of the merge.

Another thought is that are these merges really valid at all and maybe there is some loop in here that confuses things. There is a loop detection in

https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/scheduler.go#L173-L174

but I wonder if one of these deps was merged already then it might go missing. I think this could happen when vertex has input that is already merged to another edge and then the current edge also gets merged to same edge. I know this is all very confusing.

We have a debug mode in scheduler

https://github.com/moby/buildkit/blob/b2ff444122ff769c13375a8b969cb299a42a9562/solver/scheduler.go#L17

that should print enough data(except loads & deletions that are also useful in here). But it produces too much data to run it without a known reproducer. For other extra logging what might help if you have logs for every vertex load and deletion, every job start/delete, and what are the inputs of each vertex. That should give an understanding when the merge happens what object is from what job and if merge should not have happened or if deletion should not have happened. Also while logging print the current reference count for vertex. On merge print the reference count of edges and on getEdge error print what vertex was requesting its inputs.

I've added as much of the suggested logging as possible and have had error case 2 happen and what the logs appear to suggest is that:

I don't fully understand what should happen but is it possible that when the mergeTo happens, the job that contained the source digest needs to be added to the states job map so that it doesn't get removed when the original job finishes, and is kept until subsequent jobs finish?

I did see if I could do that but I can't see how you would get the job ID that created the source merge from within mergeTo.

Here is what I think is the relevant output from the log:

time="2021-10-06T15:33:14Z" level=info msg="> New job webuexzmcbaah08pybewnfngi"
time="2021-10-06T15:33:24Z" level=info msg="> New job vopwtaqzpenzqtpxqjrtxi9jd"
time="2021-10-06T15:33:25Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:33:25Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:33:27Z" level=info msg=">>> New Input Request for edge sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc00239ef00 [{0 0xc002a3b280}] sha256:8ec39962a258bd8eca6532ad643f61fba567e52fc7276db413e64bf3f294010a})}%!(EXTRA solver.Edge={0 0xc002a3b300})"
time="2021-10-06T15:33:27Z" level=info msg=">>> New Input Request for edge sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc007935680 [] sha256:9f559abaec2fb7c6bd258d7ff0c1a253a46e2b62903c66df4b975562df57801b})}%!(EXTRA solver.Edge={0 0xc00246f200})"
time="2021-10-06T15:33:30Z" level=info msg=">>> New Input Request for edge sha256:a4ace11a6c482e05b55fee0e1f5720b42e7f2e2271a0956c039bdf2e17ee910e: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc007935580 [{0 0xc002a3b300} {0 0xc00246f200}] sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f})}%!(EXTRA solver.Edge={0 0xc00246f2c0})"
time="2021-10-06T15:33:30Z" level=info msg=">>> New Input Request for edge sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc007935680 [] sha256:9f559abaec2fb7c6bd258d7ff0c1a253a46e2b62903c66df4b975562df57801b})}%!(EXTRA solver.Edge={0 0xc00246f200})"
time="2021-10-06T15:33:40Z" level=info msg="merging edge sha256:66347d569856d244a78e1f4af0904cd90e0035b652faa9773c140002ed4cb124 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:41Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:33:41Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:33:42Z" level=info msg=">>> New Input Request for edge sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc00103ab80 [{0 0xc002495280}] sha256:8ec39962a258bd8eca6532ad643f61fba567e52fc7276db413e64bf3f294010a})}%!(EXTRA solver.Edge={0 0xc002495300})"
time="2021-10-06T15:33:42Z" level=info msg=">>> New Input Request for edge sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc00103bd80 [] sha256:e9d816d075f8154e71a8d013dd43b8e9e6d1c76530e985f29c7570f4c52cba16})}%!(EXTRA solver.Edge={0 0xc001b33740})"
time="2021-10-06T15:33:42Z" level=info msg=">>> New Input Request for edge sha256:cbed0fd48db527d280c9bfc32f430146933e36d3779f26091d12d0baf5038fdb: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc00103bc80 [{0 0xc002495300} {0 0xc001b33740}] sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8})}%!(EXTRA solver.Edge={0 0xc001b33940})"
time="2021-10-06T15:33:42Z" level=info msg=">>> New Input Request for edge sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc00103bd80 [] sha256:e9d816d075f8154e71a8d013dd43b8e9e6d1c76530e985f29c7570f4c52cba16})}%!(EXTRA solver.Edge={0 0xc001b33740})"
time="2021-10-06T15:33:45Z" level=info msg="merging edge sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:47Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:33:47Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:33:53Z" level=info msg="merging edge sha256:0747e9ad6db7ed2492a6f9e79b156f514d65c7bfba714fbfef9f058f890822ff to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:53Z" level=info msg="merging edge sha256:8b3948941c77760f05c99a1e1727e147650466ec397b8e00dec3b867c3878e9f to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:55Z" level=info msg="merging edge sha256:4201708eedd622f737bb9ffb6b2b5c7f873ae3c3077b8fcb2c4a3fb61d1bb84e to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:57Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:33:57Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:33:57Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:33:57Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:33:57Z" level=info msg="merging edge sha256:d565776911406b8bec0b4df65150489b926c46e0f9550ac78dff88e66e52a26b to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:57Z" level=info msg="merging edge sha256:a07001061cc522b867b4a910410e0bef0eb6b07e05a38ac46583020f6a1dc058 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:33:57Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:33:57Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:34:00Z" level=info msg="merging edge sha256:f3f0d00afacb80cf30ace8f5032b344d6cf7693fd25027bc608c5aea5a2322f4 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:34:11Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:34:11Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:34:11Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:34:11Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:34:11Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:34:11Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:34:11Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:34:11Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:34:11Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:34:11Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:34:24Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:40:48Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Initialising vertex for digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8, parent nil"
time="2021-10-06T15:40:48Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Vertex sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 input 0 = sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8"
time="2021-10-06T15:41:40Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Initialising vertex for digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f, parent nil"
time="2021-10-06T15:41:40Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Vertex sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f input 0 = sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:41:53Z" level=info msg="> Job vopwtaqzpenzqtpxqjrtxi9jd: Deleting digest sha256:573a9cc2af92347a88696d22edc358651f3dbfde24ebeeb4b5b3d53c9314bbb8 "
time="2021-10-06T15:42:48Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Discarding job."
time="2021-10-06T15:42:48Z" level=info msg="> Job webuexzmcbaah08pybewnfngi: Deleting digest sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f "
time="2021-10-06T15:43:05Z" level=info msg="merging edge sha256:1b4aeedbfe23cfa3dd5ce70323504d7a584db899e1308f80ad633047f36e2044 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:43:19Z" level=info msg="merging edge sha256:60f401399fc3960c9221efd3dc373907bf35332d864f05d8f0dc7b1e833e2859 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:43:40Z" level=info msg="merging edge sha256:2c6662bf679e3a0ff7b8674d91ce227e744da28a37335efb9725dd2b609b8eae to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:43:48Z" level=info msg="merging edge sha256:8d198d4091a3cded993e52af1bf83e4a437e0534b47256dfc3e1375c0931ec3a to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:45:44Z" level=info msg="merging edge sha256:cc6cf370f21bc7f5b3ca3a148367c8cb6c1cd574864fd190a330aaa321ebdd52 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:45:44Z" level=info msg="merging edge sha256:4c4645b4c51a83715a97ebcbbd149eaa563ae4b45abbfa1faae9096d979c2381 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:45:48Z" level=info msg="merging edge sha256:e85f2bc4abebd61bd225c003880f61205dfae8495a9fb85dd5faee58e4278fa7 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"
time="2021-10-06T15:45:55Z" level=info msg=">>> New Input Request for edge sha256:a4ace11a6c482e05b55fee0e1f5720b42e7f2e2271a0956c039bdf2e17ee910e: dependent edge {%!s(solver.Index=0) %!s(*solver.vertexWithCacheOptions=&{0xc007935580 [{0 0xc002a3b300} {0 0xc00246f200}] sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f})}%!(EXTRA solver.Edge={0 0xc00246f2c0})"
time="2021-10-06T15:45:55Z" level=info msg=">>> getEdge(Edge) edge sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f [./packages/utilities+yarn-files c7b27947-c3ef-422e-bb44-4edfa09fec08] COPY package.json ./packages/utilities/ %!s(solver.Index=0) 846. No active vertex."
time="2021-10-06T15:45:55Z" level=info msg="> failed to get edge sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f"
time="2021-10-06T15:46:31Z" level=info msg="merging edge sha256:4a3c6ea78c3c0ff61d3cdd57821dba4eb00cbf965d11d50a3d824695f8ad26b4 to sha256:462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f\n"

Perhaps I need to log all jobs in the state next.

tonistiigi commented 2 years ago

Which job is a4ace11a6c482e05b55fee0e1f5720b42e7f2e2271a0956c039bdf2e17ee910e part of? As it points to 462b113415d35517826fde1c0fde9961be715dd913d47dbd9b1a5ec5dfccaa8f it should be the first job. But then why is it still around?

tonistiigi commented 2 years ago

Initialising vertex for digest

: Vertex

Where are these being logged?

jamesalucas commented 2 years ago

Which job is a4ace11a6c482e05b55fee0e1f5720b42e7f2e2271a0956c039bdf2e17ee910e part of? As it points to a4ace11a6c482e05b55fee0e1f5720b42e7f2e2271a0956c039bdf2e17ee910e it should be the first job. But then why is it still around?

@tonistiigi a4ace11a6c482e05b55fee0e1f5720b42e7f2e2271a0956c039bdf2e17ee910e is from the first job, apologies, I forgot to attach the whole log, I've attached it here but it's quite large so I tried to include only bits I thought were relevant.

GH_buildkit.log

The : Vertex logs are coming from loadUnlocked which I added the following log statements to:

    func (jl *Solver) loadUnlocked(v, parent Vertex, j *Job, cache map[Vertex]Vertex) (Vertex, error) {
    if v, ok := cache[v]; ok {
        return v, nil
    }
    origVtx := v
    if parent != nil {
        bklog.G(context.TODO()).Infof("> Job %s: Initialising vertex for digest %s, parent %s", j.id, v.Digest(), parent.Digest())
    } else {
        bklog.G(context.TODO()).Infof("> Job %s: Initialising vertex for digest %s, parent nil", j.id, v.Digest())
    }

    inputs := make([]Edge, len(v.Inputs()))
    for i, e := range v.Inputs() {
        v, err := jl.loadUnlocked(e.Vertex, parent, j, cache)
        if err != nil {
            return nil, err
        }
        inputs[i] = Edge{Index: e.Index, Vertex: v}
        bklog.G(context.TODO()).Infof("> Job %s: Vertex %s input %d = %s", j.id, v.Digest(), i, e.Vertex.Digest())
    }
    // rest of function omitted

In our CI environment we have 10 runners on the same machine, all using buildkit running in Docker via Earthly. It appears that one call to Earthly will result in a job being created in buildkit and subsequent calls from other runners will create subsequent jobs in buildkit, I guess that is all to be expected; but edges are merged between earlier and later jobs. If the earlier job finishes before the later one its edges are removed then we get this error as the later job is now referencing an edge that was merged to by the first (and subsequent) jobs?

I have made a change which appears to fix it, but I don't doubt there is a better way... I have added some additional lines to setEdge so that when two edges are merged the jobs from one are copied onto the other. This then prevents them from being removed until both jobs have finished. I have doubts that jobs from the target need to be copied onto the source, but given error scenario 1 above seems to suggest it is necessary, I thought I would start out conservative and see if it fixes it and if so, work back to a better solution.

func (jl *Solver) setEdge(e Edge, newEdge *edge) {
    jl.mu.RLock()
    defer jl.mu.RUnlock()

    st, ok := jl.actives[e.Vertex.Digest()]
    if !ok {
        return
    }

        // Get the state for the edge merge target
    st1, ok := jl.actives[newEdge.edge.Vertex.Digest()]
    if ok {
                // Add all jobs from the source state to the target state
        for index, element := range st.jobs {
          // bklog.G(context.TODO()).Infof(">>> Adding job %s to target state for merged vertex src %s target %s.", index.id, e.Vertex.Digest(), newEdge.edge.Vertex.Digest())
            st1.jobs[index] = element
        }
                // Add all jobs from the target state to the source state
        for index, element := range st1.jobs {
          // bklog.G(context.TODO()).Infof(">>> Adding job %s to src state for merged vertex src %s target %s.", index.id, e.Vertex.Digest(), newEdge.edge.Vertex.Digest())
            st.jobs[index] = element
        }
    }

    st.setEdge(e.Index, newEdge)
}

This results in the edges being kept until any job using them is removed, but I suspect there is a better fix?!

jamesalucas commented 2 years ago

I've been running this since yesterday and although the number has reduced, I'm still seeing inconsistent graph state errors. :-(

jamesalucas commented 2 years ago

I have had numerous attempts at fixing this but so far every one has failed. I have checked the reference counts and they appear to be maintained correctly but it still appears that some edges are removed whilst other jobs remain dependent upon them. I am currently running a version with the mergeTo functionality disabled and this has proven to be stable at the cost of performance for some jobs that are duplicated.

tonistiigi commented 2 years ago

@jamesalucas Maybe you now have some insights how to write a reproducer for this. it is fine also if a code change needs to be made to make it reproduce more often.

rchernobelskiy commented 2 years ago

Ran into this today on v0.9.2 Some info about what we do:

it doesn't usually happen but today happened on two separate occasions, for about 5 of the images in each occasion each time it was these two logs together:

importing cache manifest from <image name redacted>
error: failed to solve: failed to get edge: inconsistent graph state

We have 3 retries for each image and about 4 of the images succeeded on one of the retries while one image failed all 3 retries

griffous commented 2 years ago

I can't help with a reproducer unfortunately, as our setup isn't a simple use-case, but I hope I can by adding some further examples which are triggering the condition, and how we get to here.

Similar to other reporters, we:

Here is an example stack trace, which was building 5 parallel targets at the time: buildkit-prod-596464b95f-4z2ls.log

A bit more detail on the caching strategy that we use, since this bug seems to revolve around caching usage...and perhaps we're doing registry caching wrong?

Our image library is around 8 stages deep, and 15-20 wide, culminating in around 60 final run stage/images. Buildkit always crashes if we simply ask it to build "all final stages" (1300+ steps), so we instead use multi-stage build pipelines to carefully and progressively iterate down through the entire "famliy tree" of stages, one "stage-layer" levels at time, so as not to overwhelm and crash buildkit. The build pipeline begins with buildkit builds at the common top-most base image, successively stepping downwards through the tree to building stages which depend on the prior stages, eventually reaching the bottom after 7-8 pipeline stages.

I'm not sure if this is how large-scale registry caching is intended to be used, but export-cache mode=max doesn't seem to be supported by our private registry, so this was the best I could come up with! If this pattern isn't what buildkit was designed to support - and therefore is leading to this bug - I would happily take suggestions for a better-supported design to build our library using buildkit.

coryb commented 1 year ago

I can reproduce this failure within a few seconds now. I think it is also related to #3714. I was running into this issue when starting to load-test various scenarios.

The issue seems to be related to a race when solving dockerfiles in parallel where requests share common vertices but differ via the no-cache frontend attribute. Here is my repro that uses only buildkit client code: https://gist.github.com/coryb/60e5470ee2f4b534ad2538b8b3dfb55e Specifically this logic to randomize adding the no-cache attribute is what triggers the issue.

I am reproducing this via buildkit in a docker container:

docker run -ti --rm --name buildkit --privileged --pull=always  moby/buildkit:master --debug

Then I run the repro case with:

BUILDKIT_HOST=docker-container://buildkit go run ./cmd/issue-2303

With this I am randomly seeing one of these errors from buildkitd:

ERRO[2023-06-15T20:20:36Z] /moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to get edge: inconsistent graph state
failed to get edge: inconsistent graph state
1 2ff0d2a buildkitd --debug
github.com/moby/buildkit/solver.(*pipeFactory).NewInputRequest.func1
    /src/solver/scheduler.go:351
github.com/moby/buildkit/solver/internal/pipe.NewWithFunction.func2
    /src/solver/internal/pipe/pipe.go:82
runtime.goexit
    /usr/local/go/src/runtime/asm_arm64.s:1172

or

panic: interface conversion: interface {} is nil, not *solver.edgeRequest

goroutine 51 [running]:
github.com/moby/buildkit/solver.(*edge).createInputRequests(0x4000379b80, 0x3, 0x1?, 0x0)
    /src/solver/edge.go:814 +0x8a4
github.com/moby/buildkit/solver.(*edge).unpark(0x4000379b80, {0x40013cee50, 0x1, 0x1}, {0x40013cee60?, 0x1, 0x1}, {0x400143b770, 0x3, 0x3}, ...)
    /src/solver/edge.go:363 +0x21c
github.com/moby/buildkit/solver.(*scheduler).dispatch(0x400064f0a0, 0x4000379b80)
    /src/solver/scheduler.go:136 +0x3a4
github.com/moby/buildkit/solver.(*scheduler).loop(0x400064f0a0)
    /src/solver/scheduler.go:104 +0x284
created by github.com/moby/buildkit/solver.newScheduler
    /src/solver/scheduler.go:35 +0x218

Also probably related, if we remove the COPY instruction from the testing Dockerfile then we get a different error from buildkitd:

ERRO[2023-06-15T20:32:16Z] /moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = invalid request {0 0x4001cd1680} for build
invalid request {0 0x4001cd1680} for build
1 2ff0d2a buildkitd --debug
github.com/moby/buildkit/solver.(*scheduler).build
    /src/solver/scheduler.go:220
github.com/moby/buildkit/solver.(*Job).Build
    /src/solver/jobs.go:516
github.com/moby/buildkit/solver/llbsolver.(*llbBridge).loadResult
    /src/solver/llbsolver/bridge.go:147
github.com/moby/buildkit/solver/llbsolver.(*resultProxy).loadResult
    /src/solver/llbsolver/bridge.go:227
github.com/moby/buildkit/solver/llbsolver.(*resultProxy).Result.func2
    /src/solver/llbsolver/bridge.go:255
github.com/moby/buildkit/util/flightcontrol.(*call).run
    /src/util/flightcontrol/flightcontrol.go:121
sync.(*Once).doSlow
    /usr/local/go/src/sync/once.go:74
sync.(*Once).Do
    /usr/local/go/src/sync/once.go:65
runtime.goexit
    /usr/local/go/src/runtime/asm_arm64.s:1172

I have no idea where this issue is specifically, and I am not familiar with the solver logic, so hopefully someone can use this to narrow down what is really going on.

coryb commented 1 year ago

Created a slightly simpler reproduction using client.Solve and llb.Def directly: https://gist.github.com/coryb/6f342926ed3bdc2a3abc2bd11c1eb16e

The issue seems to be related to llb.IgnoreCache being applied to common vertices that are running in parallel.

jacobwgillespie commented 1 year ago

Thank you @coryb, that reproduction was super helpful (we've been trying to track this down for a while) - I've sent a PR here that might fix: #3953

radusuciu commented 7 months ago

I'm hitting this when using bake in GHA and building containers that share lots of stages but differ in UID/GID and seems to fail around COPY --from=stage instructions. I don't have a repro that I can share right now but just wanted to note..

My workaround has been to split up the build into different steps which is not ideal because it sort of goes against the whole point of using bake.. but a working build beats a broken one. Also, I have not yet had this happen when running bake locally (without splitting anything up).

radusuciu commented 4 months ago

Still hitting this error, though I've got a few more details (still no repro :/). I've reduced the number of stages/images that I build and the average reliability of bake on GHA has increased.

I'd like to add that the targets I'm building are all using registry cache with mode=max and I've been able to get around the error by deleting the cache image corresponding to the image referenced in the logs immediately before hitting this error:

#42 CACHED
ERROR: failed to solve: failed to get edge: inconsistent graph state
Error: buildx bake failed with: ERROR: failed to solve: failed to get edge: inconsistent graph state