tonistiigi opened this issue 3 years ago
another:

```
2020-09-17T02:40:34.1535028Z --- FAIL: TestProgress (0.11s)
2020-09-17T02:40:34.1536861Z     progress_test.go:38:
2020-09-17T02:40:34.1538470Z         Error Trace: progress_test.go:38
2020-09-17T02:40:34.1539457Z         Error:       Should be true
2020-09-17T02:40:34.1540279Z         Test:        TestProgress
2020-09-17T02:40:34.1607878Z FAIL
2020-09-17T02:40:34.1609224Z FAIL github.com/moby/buildkit/util/progress 0.245s
```
The log URL for `TestProgress` has expired.
`TestCacheMountLockedRefs` is not failing on master, so it's not something endemically wrong. The failure is a timeout, so I suspect test parallelism might have caused a missed deadline there. Looking back through earlier runs, I saw a pass for that test in 2.54s, so it's not 100% time-sensitive; I suspect it's just that last unlock.
For `TestProgress`, I can't see how this would go wrong unless somehow the `ctx` got `Done` early. If I get time to poke at this, it might be worth changing the `assert.True` to `assert.Less` so we would actually see the `len` it's getting: "5" or "0" would point to two very different likely causes. That said, I don't know the `progress` package API at all, so I may be misreading that test.
Still seeing this one: https://github.com/moby/buildkit/actions/runs/5877167737/job/15936737711?pr=4148#step:5:2053
```
=== Failed
=== FAIL: solver/llbsolver/mounts TestCacheMountLockedRefs (28.95s)
    mount_test.go:308:
        Error Trace: D:/a/buildkit/buildkit/solver/llbsolver/mounts/mount_test.go:308
        Error:       mount did not unlock
        Test:        TestCacheMountLockedRefs
panic: Fail in goroutine after TestCacheMountLockedRefs has completed

goroutine 13 [running]:
testing.(*common).Fail(0xc000059a00)
	C:/hostedtoolcache/windows/go/1.20.7/x64/src/testing/testing.go:933 +0xe5
testing.(*common).Errorf(0xc000059a00, {0x1a49072?, 0xf5c6c7?}, {0xc0001a42f0?, 0x18d91c0?, 0x501?})
	C:/hostedtoolcache/windows/go/1.20.7/x64/src/testing/testing.go:1050 +0x65
github.com/stretchr/testify/assert.Fail({0x221e775deb8, 0xc000059a00}, {0xc0000e8c00, 0x3eb}, {0x0, 0x0, 0x0})
	D:/a/buildkit/buildkit/vendor/github.com/stretchr/testify/assert/assertions.go:333 +0x378
github.com/stretchr/testify/assert.NoError({0x221e775deb8, 0xc000059a00}, {0x1b9bf20, 0xc00008fd28}, {0x0, 0x0, 0x0})
	D:/a/buildkit/buildkit/vendor/github.com/stretchr/testify/assert/assertions.go:1495 +0x105
github.com/stretchr/testify/require.NoError({0x1ba0970, 0xc000059a00}, {0x1b9bf20, 0xc00008fd28}, {0x0, 0x0, 0x0})
	D:/a/buildkit/buildkit/vendor/github.com/stretchr/testify/require/require.go:1357 +0x96
github.com/moby/buildkit/solver/llbsolver/mounts.TestCacheMountLockedRefs.func2()
	D:/a/buildkit/buildkit/solver/llbsolver/mounts/mount_test.go:291 +0x98
created by github.com/moby/buildkit/solver/llbsolver/mounts.TestCacheMountLockedRefs
	D:/a/buildkit/buildkit/solver/llbsolver/mounts/mount_test.go:289 +0x75d
```
@gabriel-samfira if you're interested in taking a look :pray:
@crazy-max wrapping up something today and will have a look at this tomorrow morning. I assume this is a flake. Will run this test in a loop and reproduce.
Based on the panic, I think this is a timeout, hiding the real failure.
Note that earlier, the test failed:
```
    mount_test.go:308:
        Error Trace: D:/a/buildkit/buildkit/solver/llbsolver/mounts/mount_test.go:308
        Error:       mount did not unlock
        Test:        TestCacheMountLockedRefs
--- FAIL: TestCacheMountLockedRefs (28.95s)
```
i.e. it hit the two-second timeout while waiting for the goroutine at https://github.com/moby/buildkit/blob/2827be6197a73a8f6d2af24b9340ddfde0e75de7/solver/llbsolver/mounts/mount_test.go#L305-L309
The later panic comes from line 291 (inside the goroutine) at https://github.com/moby/buildkit/blob/2827be6197a73a8f6d2af24b9340ddfde0e75de7/solver/llbsolver/mounts/mount_test.go#L287-L294
```
panic: Fail in goroutine after TestCacheMountLockedRefs has completed

goroutine 13 [running]:
testing.(*common).Fail(0xc000059a00)
	C:/hostedtoolcache/windows/go/1.20.7/x64/src/testing/testing.go:933 +0xe5
testing.(*common).Errorf(0xc000059a00, {0x1a49072?, 0xf5c6c7?}, {0xc0001a42f0?, 0x18d91c0?, 0x501?})
	C:/hostedtoolcache/windows/go/1.20.7/x64/src/testing/testing.go:1050 +0x65
github.com/stretchr/testify/assert.Fail({0x221e775deb8, 0xc000059a00}, {0xc0000e8c00, 0x3eb}, {0x0, 0x0, 0x0})
	D:/a/buildkit/buildkit/vendor/github.com/stretchr/testify/assert/assertions.go:333 +0x378
github.com/stretchr/testify/assert.NoError({0x221e775deb8, 0xc000059a00}, {0x1b9bf20, 0xc00008fd28}, {0x0, 0x0, 0x0})
	D:/a/buildkit/buildkit/vendor/github.com/stretchr/testify/assert/assertions.go:1495 +0x105
github.com/stretchr/testify/require.NoError({0x1ba0970, 0xc000059a00}, {0x1b9bf20, 0xc00008fd28}, {0x0, 0x0, 0x0})
	D:/a/buildkit/buildkit/vendor/github.com/stretchr/testify/require/require.go:1357 +0x96
github.com/moby/buildkit/solver/llbsolver/mounts.TestCacheMountLockedRefs.func2()
	D:/a/buildkit/buildkit/solver/llbsolver/mounts/mount_test.go:291 +0x98
created by github.com/moby/buildkit/solver/llbsolver/mounts.TestCacheMountLockedRefs
	D:/a/buildkit/buildkit/solver/llbsolver/mounts/mount_test.go:289 +0x75d
FAIL	github.com/moby/buildkit/solver/llbsolver/mounts	44.633s
```
That's a real failure (`assert.NoError` would not have called `assert.Fail` if `err` was `nil`), but since it panicked, we never saw the content of that error.
Given that the panic happened more than 15 seconds after the test reported failure, a much higher timeout may be required on Windows in order to catch the actual failure. Or maybe Go should actually tell us what the error was in that panic... not that we can change that.

(We could locally output the error before calling `assert.NoError`, as that might make it easier to track down without increasing the timeout. I think an extra 28 seconds in a failed run isn't a lot on CI, but this might work better for tight-loop reproduction... although running just this test in a loop might destroy the goroutine before it finishes, hiding the failure? I'm not sure if Go enforces joining on goroutines at process termination.)
Also, as a side note, it looks like the test runner, when it sees two failures for one test like that, gives a summary (quoted in https://github.com/moby/buildkit/issues/1685#issuecomment-1680330891) which includes the second failure but reports the run-time of the first failure.
Saw this failure on my PR branch. https://github.com/tonistiigi/buildkit/runs/1119868274 @TBBle