Groxx closed this 1 month ago
Unfortunately I'm not sure which test is producing these; as far as I can tell they're not using the testlogger, so I've just replaced everything I can find:
```
panic: Log in goroutine after TestIntegrationSuite has completed: 2024-05-28T22:34:12.141Z ERROR Persistent store operation failure {"component": "matching-engine", "wf-domain-name": "integration-test-domain-a76efb8a-48f9-47fb-a74a-9e91f7713bd7", "wf-task-list-name": "integration-continue-as-new-workflow-timeout-test-tasklist", "wf-task-list-type": 0, "store-operation": "update-task-list", "error": "UpdateTaskList operation failed. Error: session has been closed", "logging-call-at": "task_reader.go:224"}

goroutine 74122 [running]:
testing.(*common).logDepth(0xc0014a2820, {0xc001b0c1c0, 0x1bb}, 0x3)
	/usr/local/go/src/testing/testing.go:1028 +0x6d4
testing.(*common).log(...)
	/usr/local/go/src/testing/testing.go:1010
testing.(*common).Logf(0xc0014a2820, {0x4a89a74, 0x2}, {0xc00732a290, 0x1, 0x1})
	/usr/local/go/src/testing/testing.go:1061 +0xa5
go.uber.org/zap/zaptest.testingWriter.Write({{0x7f1dc0d489c0?, 0xc0014a2820?}, 0x80?}, {0xc006340800, 0x1bc, 0x400})
	/go/pkg/mod/go.uber.org/zap@v1.13.0/zaptest/logger.go:130 +0x11e
go.uber.org/zap/zapcore.(*ioCore).Write(0xc008adac00, {0x2, {0xc18db4d9086c398b, 0x9eec277b0a, 0x826bf60}, {0x0, 0x0}, {0x4adb3e2, 0x22}, {0x0, ...}, ...}, ...)
	/go/pkg/mod/go.uber.org/zap@v1.13.0/zapcore/core.go:90 +0x193
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc005404210, {0xc008284400, 0x3, 0x4})
	/go/pkg/mod/go.uber.org/zap@v1.13.0/zapcore/entry.go:216 +0x210
go.uber.org/zap.(*Logger).Error(0xc008581080, {0x4adb3e2, 0x22}, {0xc008284400, 0x3, 0x4})
	/go/pkg/mod/go.uber.org/zap@v1.13.0/logger.go:203 +0x68
github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error(0xc0073c5620, {0x4adb3e2, 0x22}, {0xc0071f9480, 0x2, 0x2})
	/cadence/common/log/loggerimpl/logger.go:150 +0x19f
github.com/uber/cadence/service/matching/tasklist.(*taskReader).getTasksPump(0xc003ada000)
	/cadence/service/matching/tasklist/task_reader.go:224 +0xe17
github.com/uber/cadence/service/matching/tasklist.(*taskReader).Start.func2()
	/cadence/service/matching/tasklist/task_reader.go:141 +0x98
created by github.com/uber/cadence/service/matching/tasklist.(*taskReader).Start in goroutine 74130
	/cadence/service/matching/tasklist/task_reader.go:139 +0x38e

FAIL	github.com/uber/cadence/host	682.736s
FAIL
```
If this commit doesn't work, I'll have to see if I screwed up the implementation.
Gonna rerun a few times and check the output. I believe I should see "bad" logs; otherwise this probably just lacks evidence that it fixed anything.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 68.16%. Comparing base (5f45da5) to head (01b0a67). Report is 2 commits behind head on master.
.... I think I'm going to sit on this for a while. I tried to construct a test that would show whether I implemented this correctly, and it fails or succeeds seemingly at random, in super confusing ways.
Even several seconds later, while waiting in a different test entirely, it still doesn't always mark `t.done = true` for some reason, and doesn't always notice the log. Which seems insane given the implementation.
Not sure if my Go is screwing up, or blending caches, or if this log-detecting is just fundamentally unreliable for some reason.
Edit: well, it's fundamentally unreliable: https://github.com/golang/go/issues/67701. I'm not sure if this is something that'll change or not. It may fail too many tests...
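For anyone following along, the unreliability boils down to a check-then-act gap: any wrapper that checks a "test finished" flag before calling `t.Logf` can be preempted between the check and the call, which is essentially what golang/go#67701 describes. A toy sketch of the hazardous pattern (`route` and `testDone` are hypothetical names, not anything from this PR):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// testDone mimics testing.T's internal "test finished" state
// (hypothetical; the real field is unexported).
var testDone atomic.Bool

// route decides where a log line would go. This is the check-then-act
// pattern that is unavoidably racy: the test can complete between the
// Load and the actual t.Logf call, so the "t.Logf" branch can still
// panic in real code even though the flag looked fine.
func route(msg string) string {
	if testDone.Load() {
		return "stderr" // test is over: fall back
	}
	// <-- nothing prevents the test from finishing right here
	return "t.Logf" // would panic if the test just completed
}

func main() {
	fmt.Println(route("during test")) // t.Logf
	testDone.Store(true)
	fmt.Println(route("after test")) // stderr
}
```

The flag narrows the window but cannot close it, which matches the "randomly fails or succeeds" behavior above.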
"Fixes" test failures like this, by sweeping them under the rug:
When the test completes, it will simply log to stderr rather than the test logger:
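A minimal sketch of that fallback idea, a writer that routes to `t.Logf` while the test runs and to stderr afterwards (names like `fallbackWriter` and `fakeT` are hypothetical, not the PR's actual implementation, and the `Cleanup`-based completion check is subject to the race discussed above):

```go
package main

import (
	"fmt"
	"os"
	"sync/atomic"
)

// testingT is the subset of *testing.T this sketch needs.
type testingT interface {
	Logf(format string, args ...interface{})
	Cleanup(func())
}

// fallbackWriter sends log lines to t.Logf while the test is running,
// and to stderr once the test has completed, to avoid the
// "Log in goroutine after Test... has completed" panic.
type fallbackWriter struct {
	t    testingT
	done atomic.Bool
}

func newFallbackWriter(t testingT) *fallbackWriter {
	w := &fallbackWriter{t: t}
	// Cleanup runs when the test finishes; after that, writes go to stderr.
	t.Cleanup(func() { w.done.Store(true) })
	return w
}

func (w *fallbackWriter) Write(p []byte) (int, error) {
	if w.done.Load() {
		return os.Stderr.Write(p)
	}
	w.t.Logf("%s", p)
	return len(p), nil
}

// fakeT is a stand-in for *testing.T so this sketch runs outside `go test`.
type fakeT struct {
	logs     []string
	cleanups []func()
}

func (f *fakeT) Logf(format string, args ...interface{}) {
	f.logs = append(f.logs, fmt.Sprintf(format, args...))
}
func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }
func (f *fakeT) finish() {
	for _, fn := range f.cleanups {
		fn()
	}
}

func main() {
	t := &fakeT{}
	w := newFallbackWriter(t)
	w.Write([]byte("during test"))  // routed to t.Logf
	t.finish()                      // test completes
	w.Write([]byte("after test\n")) // routed to stderr, not t.Logf
	fmt.Println("t.Logf calls:", len(t.logs))
}
```

Such a writer can be plugged in wherever a `zaptest`-style sink would be, but the check-then-act race means it narrows the panic window rather than eliminating it.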
Ignoring the correctness part of the problem, this gives us the best of both worlds: the logs are still visible (with `-v`, or when some other test fails), but will not directly fail the test(s).

I am not overly fond of this, but some of our largest tests are quite flaky due to these logs, and that really isn't helping anything. Seems like a reasonable tradeoff, and we can move to an opt-in model eventually instead of applying it everywhere like this PR does.