skupperproject / skupper-router

An application-layer router for Skupper networks
https://skupper.io
Apache License 2.0
14 stars 18 forks source link

system_tests_policy failure #145

Closed ganeshmurthy closed 2 years ago

ganeshmurthy commented 2 years ago
21:     def test_link_name_parse_tree_patterns(self):
21:         # update to replace source/target match patterns
21:         qdm_out = "<not written>"
21:         try:
21:             qdm_out = self.run_qdmanage('update --type=vhost --name=vhost/$default --stdin', input=self.default_patterns())
21:         except Exception as e:
21: >           self.assertTrue(False, msg=('Error running qdmanage %s' % str(e)))
21: E           AssertionError: False is not true : Error running qdmanage Process 2907 error: exit code 1, expected 0
21: E           qdmanage update --type=vhost --name=vhost/$default --stdin --bus ***0.0.0.0:21493 --indent=-1 --timeout 300.0
21: E           /__w/skupper-router/skupper-router/skupper-router/build/tests/system_test.dir/system_tests_policy/PolicyLinkNamePatternTest/test_link_name_parse_tree_patterns/qdmanage-72.cmd
21: E           >>>>
21: E           <<<<
21: E           ConnectionException: Connection ***0.0.0.0:21493 disconnected: Condition('proton.pythonio', 'Connection refused to all addresses')
21: 
21: /home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_tests_policy.py:935: AssertionError
21: - generated xml file: /__w/skupper-router/skupper-router/skupper-router/build/tests/junitxmls/system_tests_policy.xml -
21: =========================== short test summary info ============================
21: FAILED ::PolicyLinkNamePatternTest::test_link_name_parse_tree_patterns

https://github.com/skupperproject/skupper-router/runs/5484539566?check_suite_focus=true#step:26:1502

jiridanek commented 2 years ago

This happened multiple times in Dispatch. Here are three Jiras that have the error message in logs, closed with Cannot reproduce

  1. https://issues.apache.org/jira/browse/DISPATCH-1827
  2. https://issues.apache.org/jira/browse/DISPATCH-1936
  3. https://issues.apache.org/jira/browse/DISPATCH-2080

And here are two where the issue was caused by crashed router, which was then fixed

  1. https://issues.apache.org/jira/browse/DISPATCH-1973
  2. https://issues.apache.org/jira/browse/DISPATCH-2168
jiridanek commented 2 years ago

Behold router log from https://github.com/skupperproject/skupper-router/runs/5499987524?check_suite_focus=true#step:26:1886

2022-03-09 17:53:23.940555 +0000 CONN_MGR (info) Configured Listener: 0.0.0.0:21493 proto=any, role=normal (/home/runner/work/skupper-router/skupper-router/skupper-router/src/connection_manager.c:554)
2022-03-09 17:53:23.941212 +0000 AGENT (debug) Add entity: ListenerEntity(authenticatePeer=False, cost=1, healthz=True, host=0.0.0.0, http=False, identity=listener/0.0.0.0:21493, idleTimeoutSeconds=120, initialHandshakeTimeoutSeconds=0, maxFrameSize=16384, maxSessions=32768, messageLoggingComponents=none, metrics=True, name=listener/0.0.0.0:21493, port=21493, requireEncryption=False, requireSsl=False, role=normal, saslMechanisms=ANONYMOUS, stripAnnotations=both, type=org.apache.qpid.dispatch.listener, websockets=True) (/home/runner/work/skupper-router/skupper-router/skupper-router/python/qpid_dispatch_internal/management/agent.py:602)
2022-03-09 17:53:23.941777 +0000 SERVER (notice) Operational, 4 Threads Running (process ID 2893) (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1467)
2022-03-09 17:53:23.942069 +0000 SERVER (notice) Process VmSize 124.50 TiB (6.78 GiB available memory) (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1478)
2022-03-09 17:53:23.960887 +0000 SERVER (notice) Listening on 0.0.0.0:21493 (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:823)
2022-03-09 17:53:23.961173 +0000 ROUTER_CORE (trace) Core action 'process_tick' (/home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/router_core_thread.c:235)
2022-03-09 17:53:24.319645 +0000 SERVER (notice) Shut Down (/home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1498)

The test code that does waiting for router is

https://github.com/skupperproject/skupper-router/blob/54cd50fd59cd20f05dfb0987a72ce7f8333e07ed/tests/system_tests_policy.py#L771-L775

which is weird. Why not just do wait=True)? Why ignore timeout, or actually any exception from wait_ready, including possibly SyntaxError?

Coming in, I wanted to find some evidence that the epoll proactor (https://github.com/rr-debugger/rr/issues/3086) could be to blame, but there is much simpler explanation: the test does not really wait for the point to open (machine is under high load due to -j12 tsan tests, router startup may take more than 5s) and then tries to connect to still closed port.

ganeshmurthy commented 2 years ago

Closing since policy tests have been commented out in https://github.com/skupperproject/skupper-router/commit/7f49f340643c242c96313bc49e4dfffbd13d9337