skupperproject / skupper-router

An application-layer router for Skupper networks
https://skupper.io
Apache License 2.0
14 stars 18 forks source link

AttributeError: 'NoneType' object has no attribute 'properties' in OversizeMulticastTransferTest #126

Open jiridanek opened 2 years ago

jiridanek commented 2 years ago

https://github.com/jiridanek/skupper-router/runs/5426622599?check_suite_focus=true#step:9:5364

23: DEBUG    root:system_test.py:1390 2022-03-04 18:41:21.956829 OversizeMulticastTransferTest - multicast/e81: Container run exception: ['Traceback (most recent call last):
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_tests_policy_oversize_compound.py", line 631, in run
    Container(self).run()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/proton/_reactor.py", line 194, in run
    while self.process():
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/proton/_reactor.py", line 257, in process
    event.dispatch(handler)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/proton/_events.py", line 164, in dispatch
    self.dispatch(h, type)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/proton/_events.py", line 161, in dispatch
    _dispatch(handler, type.method, self)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/proton/_events.py", line 128, in _dispatch
    m(*args)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/proton/_handlers.py", line 753, in on_reactor_init
    self.on_start(event)
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_tests_policy_oversize_compound.py", line 423, in on_start
    self.test_class.wait_router_network_connected()
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_tests_policy_oversize_compound.py", line 740, in wait_router_network_connected
    cls.INT_A.wait_router_connected(\'INT.B\')
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_test.py", line 745, in wait_router_connected
    retry(lambda: self.is_router_connected(router_id), **retry_kwargs)
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_test.py", line 140, in retry
    ret = function()
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_test.py", line 745, in <lambda>
    retry(lambda: self.is_router_connected(router_id), **retry_kwargs)
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_test.py", line 727, in is_router_connected
    self.management.read(identity="router.node/%s" % router_id)
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/python/qpid_dispatch/management/client.py", line 331, in read
    return Entity(self, self.call(request).body)
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/python/qpid_dispatch/management/client.py", line 192, in call
    self.check_response(response, expect=expect)
  File "/home/runner/work/skupper-router/skupper-router/skupper-router/python/qpid_dispatch/management/client.py", line 159, in check_response
    code = response.properties.get(\'statusCode\')
AttributeError: 'NoneType' object has no attribute 'properties'
"]
jiridanek commented 2 years ago

Issue happens even when running test single-threaded, https://github.com/jiridanek/skupper-router/runs/5428006739?check_suite_focus=true#step:27:5701

jiridanek commented 2 years ago

I've tried investigating the test. There seem to be multiple possible issues that can cause the test to fail, and getting over one exposes another.

The logs quoted in this message are from a PR with extra logging added. E.g. to log full stracktrace of the exception, and such.

AttributeError: 'NoneType' object has no attribute 'properties'

https://github.com/jiridanek/skupper-router/runs/5428035645?check_suite_focus=true#step:9:7800

This gets raised from line 739 doing

https://github.com/skupperproject/skupper-router/blob/454942e3ee4ba28338d84782a6cee75ce3b9846d/tests/system_tests_policy_oversize_compound.py#L737-L742

I do not see how it is possible for SyncRequestResponse#call to return None (it should always either return received Message, or raise a TimeoutException, as far as I can tell), but nevertheless, that is what seems to be happening

https://github.com/skupperproject/skupper-router/blob/454942e3ee4ba28338d84782a6cee75ce3b9846d/python/qpid_dispatch/management/client.py#L186-L193

Getting a None might be possible only if the management client gets disconnected after it or the other peer (the router) closed connection, while management client waits for its reply, given this code in Proton

        if self.disconnected and not self._is_closed():
            raise ConnectionException(
                "Connection %s disconnected: %s" % (self.url, self.disconnected))

(from https://github.com/apache/qpid-proton/blob/bf923263c6b7ebf32b0d00796822cb7e282f12cd/python/proton/_utils.py#L532)

That cannot be possible, however, because the previous test has failed (https://github.com/jiridanek/skupper-router/runs/5428035645?check_suite_focus=true#step:9:7749) less than a second prior; there was not enough time for a timeout to occur.

There is a timeout-expired message in INT.A.log, lines 8718-8724, that looks like idle timeout from management client! https://gist.github.com/jiridanek/109918a935e0325ec01bd193ed488984#file-int-a-log

2022-03-04 20:57:27.086512 +0000 ROUTER_CORE (trace) Core action 'link_flow' (../src/router_core/router_core_thread.c:235)
2022-03-04 20:57:27.186954 +0000 PROTOCOL (trace) [C6]:FRAME: 0 -> @close(24) [error=@error(29) [condition=:"amqp:resource-limit-exceeded", description="local-idle-timeout expired"]] (../src/server.c:110)
2022-03-04 20:57:27.187026 +0000 SERVER (info) [C6] Connection from 127.0.0.1:32860 (to 0.0.0.0:20514) failed: amqp:resource-limit-exceeded local-idle-timeout expired (../src/server.c:1046)
2022-03-04 20:57:27.187091 +0000 PROTOCOL (trace) [C6]:FRAME:   -> EOS (../src/server.c:110)
2022-03-04 20:57:27.187128 +0000 CONTAINER (debug) Aborting link '76c34267-770e-4fe2-b922-01acbd802824-$management' due to parent connection end (../src/container.c:325)
2022-03-04 20:57:27.187189 +0000 CONTAINER (debug) Aborting link '76c34267-770e-4fe2-b922-01acbd802824-df0fbdae-648a-4b11-a357-f310eb8df2e2' due to parent connection end (../src/container.c:325)
2022-03-04 20:57:27.187290 +0000 POLICY (debug) [C6] Connection '127.0.0.1:32860' closed with resources n_sessions=1, n_senders=1, n_receivers=1, sessions_denied=0, senders_denied=0, receivers_denied=0, max_message_size_denied:0, nConnections= 3. (../src/policy.c:319)

multicast/e80: test_80 test error: Timeout Expired net_stable:True sent=10 rcvd=[0, 0, 0, 10] rejected=2 aborted=[0, 2, 0, 0] connection_error:0 send_settled:10

https://github.com/jiridanek/skupper-router/runs/5428035645?check_suite_focus=true#step:9:5382

The test code assumes one reject, or up-to-one reject, depending on the specific test

https://github.com/skupperproject/skupper-router/blob/454942e3ee4ba28338d84782a6cee75ce3b9846d/tests/system_tests_policy_oversize_compound.py#L551-L583

I tried adjusting this to tolerate two rejects, because that is what I am often seeing when running tests with -j100. Then, I got

23: DEBUG    root:system_test.py:1439 2022-03-05 11:46:29.951349 OversizeMulticastTransferTest - multicast/e80: FAIL: N closed events in log file did not increment by 1. oBefore: 1, oAfter: 1, iBefore:2, iAfter:4

(from https://github.com/jiridanek/skupper-router/runs/5432676676?check_suite_focus=true#step:9:4004)

jiridanek commented 2 years ago

23: DEBUG root:system_test.py:1439 2022-03-05 11:46:29.951349 OversizeMulticastTransferTest - multicast/e80: FAIL: N closed events in log file did not increment by 1. oBefore: 1, oAfter: 1, iBefore:2, iAfter:4

[before:]

2022-03-05 11:46:14.293825 +0000 PROTOCOL (trace) [C1]:FRAME: 0 <- @close(24) [error=@error(29) [condition=:"amqp:connection:forced", description="Message size exceeded"]] (../src/server.c:110)
2022-03-05 11:46:18.765349 +0000 PROTOCOL (trace) [C6]:FRAME: 0 <- @close(24) [error=@error(29) [condition=:"amqp:connection:forced", description="Message size exceeded"]] (../src/server.c:110)
2022-03-05 11:46:22.783473 +0000 PROTOCOL (trace) [C9]:FRAME: 0 -> @close(24) [error=@error(29) [condition=:"amqp:connection:forced", description="Message size exceeded"]] (../src/server.c:110)

[after:]

2022-03-05 11:46:26.544117 +0000 PROTOCOL (trace) [C8]:FRAME: 0 <- @close(24) [error=@error(29) [condition=:"amqp:connection:forced", description="Message size exceeded"]] (../src/server.c:110)
2022-03-05 11:46:28.884387 +0000 PROTOCOL (trace) [C12]:FRAME: 0 <- @close(24) [error=@error(29) [condition=:"amqp:connection:forced", description="Message size exceeded"]] (../src/server.c:110)

The test expected only one such line in the "After:", that is, log lines added between start of test and the final evaluation.

jiridanek commented 2 years ago

23: DEBUG root:system_test.py:1439 2022-03-05 13:24:38.692392 OversizeMessageTransferTest - e60: test_60 test error: Timeout Expired check_done: sent=10 rcvd=0 rejected=2 aborted=1 connection_error:0 send_settled:10

https://github.com/jiridanek/skupper-router/runs/5433073293?check_suite_focus=true#step:9:7705

Sometimes, the messages just don't go through.

jiridanek commented 2 years ago

Duplicated by https://github.com/skupperproject/skupper-router/issues/146, related to https://github.com/skupperproject/skupper-router/issues/168

jiridanek commented 2 years ago

See also https://github.com/skupperproject/skupper-router/issues/618.