openHPI / poseidon

Scalable task execution orchestrator for CodeOcean
MIT License

Fix flaky e2e test #623

Open MrSerth opened 5 months ago

MrSerth commented 5 months ago
```
websocket_test.go:363: 
        Error Trace:    /home/runner/work/poseidon/poseidon/tests/e2e/websocket_test.go:363
                                    /home/runner/work/poseidon/poseidon/tests/e2e/websocket_test.go:170
                                    /home/runner/go/pkg/mod/github.com/stretchr/testify@v1.9.0/suite/suite.go:115
        Error:          Not equal: 
                        expected: 1
                        actual  : 0
        Test:           TestE2ETestSuite/TestCommandMakeEnvironmentVariables/0
```
Unexpected EOF

```
=== RUN   TestE2ETestSuite/TestGetFileContent_Nomad/Permission_Denied
{"environment_id":"0","error":"unexpected EOF","level":"warning","msg":"Unexpected EOF for Execute","package":"nomad","runner_id":"0-8a48a4a7-6969-11ef-a609-000d3a600657","sentry-fingerprint":["nomad-unexpected-eof"],"time":"2024-09-02T20:25:50.041434Z"}
    runners_test.go:473: 
        Error Trace:    /home/runner/work/poseidon/poseidon/tests/e2e/runners_test.go:473
                        /home/runner/go/pkg/mod/github.com/stretchr/testify@v1.9.0/suite/suite.go:115
        Error:          Not equal: 
                        expected: 424
                        actual  : 200
        Test:           TestE2ETestSuite/TestGetFileContent_Nomad/Permission_Denied
    runners_test.go:476: 
        Error Trace:    /home/runner/work/poseidon/poseidon/tests/e2e/runners_test.go:476
                        /home/runner/go/pkg/mod/github.com/stretchr/testify@v1.9.0/suite/suite.go:115
        Error:          "" does not contain "file not found or insufficient permissions"
        Test:           TestE2ETestSuite/TestGetFileContent_Nomad/Permission_Denied
```

mpass99 commented 2 months ago

The second flaky test event can be traced back to the known upstream bug discussed in #589. It occurs not only in production but also in our e2e tests. Until the upstream issue is fixed, we can identify this cause by the following log line: `{"environment_id":"0","error":"unexpected EOF","level":"warning","msg":"Unexpected EOF for Execute","package":"nomad","runner_id":"0-8a48a4a7-6969-11ef-a609-000d3a600657","sentry-fingerprint":["nomad-unexpected-eof"],"time":"2024-09-02T20:25:50.041434Z"}`

We might even consider bumping nomad#12198, as the unofficial Docker SDK (fsouza/go-dockerclient) causes this bug and has not fixed it since it was disclosed two months ago.

MrSerth commented 2 months ago

Thanks for looking into the e2e test suite and pointing out the unexpected EOF bug we saw. Bumping nomad#12198 is a good idea, and I've just upvoted the issue myself. Still, this is probably only a long-term solution, since swapping out the entire library is likely quite complex for Nomad.

In the meantime, we could also try bumping go-dockerclient#1076, which would help both the e2e tests and production use. Until then, I assume we need to live with these sporadic occurrences...

MrSerth commented 1 month ago

Soon, the second flaky e2e test should no longer occur: Nomad switched from fsouza/go-dockerclient to the native Docker client, as I've detailed in https://github.com/openHPI/poseidon/issues/589#issuecomment-2408172650.