Closed ganeshmurthy closed 2 years ago
This happened multiple times in Dispatch. Here are three Jiras that have the error message in logs, closed with Cannot reproduce
And here are two where the issue was caused by crashed router, which was then fixed
Behold router log from https://github.com/skupperproject/skupper-router/runs/5499987524?check_suite_focus=true#step:26:1886
2022-03-09 17:53:23.940555 +0000 CONN_MGR (info) Configured Listener: 0.0.0.0:21493 proto=any, role=normal (/home/runner/work/skupper-router/skupper-router/skupper-router/src/connection_manager.c:554)
2022-03-09 17:53:23.941212 +0000 AGENT (debug) Add entity: ListenerEntity(authenticatePeer=False, cost=1, healthz=True, host=0.0.0.0, http=False, identity=listener/0.0.0.0:21493, idleTimeoutSeconds=120, initialHandshakeTimeoutSeconds=0, maxFrameSize=16384, maxSessions=32768, messageLoggingComponents=none, metrics=True, name=listener/0.0.0.0:21493, port=21493, requireEncryption=False, requireSsl=False, role=normal, saslMechanisms=ANONYMOUS, stripAnnotations=both, type=org.apache.qpid.dispatch.listener, websockets=True) (/home/runner/work/skupper-router/skupper-router/skupper-router/python/qpid_dispatch_internal/management/agent.py:602)
2022-03-09 17:53:23.941777 +0000 SERVER (notice) Operational, 4 Threads Running (process ID 2893) (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1467)
2022-03-09 17:53:23.942069 +0000 SERVER (notice) Process VmSize 124.50 TiB (6.78 GiB available memory) (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1478)
2022-03-09 17:53:23.960887 +0000 SERVER (notice) Listening on 0.0.0.0:21493 (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:823)
2022-03-09 17:53:23.961173 +0000 ROUTER_CORE (trace) Core action 'process_tick' (/home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/router_core_thread.c:235)
2022-03-09 17:53:24.319645 +0000 SERVER (notice) Shut Down (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1498)
The test code that does waiting for router is
which is weird. Why not just do wait=True)
? Why ignore timeout, or actually any exception from wait_ready
, including possibly SyntaxError?
Coming in, I wanted to find some evidence that the epoll proactor (https://github.com/rr-debugger/rr/issues/3086) could be to blame, but there is much simpler explanation: the test does not really wait for the point to open (machine is under high load due to -j12 tsan tests, router startup may take more than 5s) and then tries to connect to still closed port.
Closing since policy tests have been commented out in https://github.com/skupperproject/skupper-router/commit/7f49f340643c242c96313bc49e4dfffbd13d9337
https://github.com/skupperproject/skupper-router/runs/5484539566?check_suite_focus=true#step:26:1502