Closed gabordozsa closed 2 weeks ago
Also, this PR needs to include a system test using vanflow - The test must start two routers, create a bunch of connections from a tcpListener on one router and then delete the tcpListener and validate that the associated connections are gone, same with tcpConnector (on the other router). The checking for the presence and deletion of a tcpConnection can be done using the vanflow API.
Also, this PR needs to include a system test using vanflow - The test must start two routers, create a bunch of connections from a tcpListener on one router and then delete the tcpListener and validate that the associated connections are gone, same with tcpConnector (on the other router). The checking for the presence and deletion of a tcpConnection can be done using the vanflow API.
I'm working on this system test now (so I converted the PR to draft for now). Could you point me to an existing system test which could be a good starting point for me for this new test?
Also, this PR needs to include a system test using vanflow - The test must start two routers, create a bunch of connections from a tcpListener on one router and then delete the tcpListener and validate that the associated connections are gone, same with tcpConnector (on the other router). The checking for the presence and deletion of a tcpConnection can be done using the vanflow API.
I'm working on this system test now (so I converted the PR to draft for now). Could you point me to an existing system test which could be a good starting point for me for this new test?
Please take a look at https://github.com/skupperproject/skupper-router/pull/1566 and see if you understand how the vanflow_snooper.py works and let me know if you have any questions
@gabordozsa Just wanted to mention that if you are working on a github issue, please make sure you assign the issue to yourself before you start working on it. That will let other developers know who is responsible for this issue, so they will not work on that issue. Also if the issue is already assigned to someone, and you want to work on it, please contact that person to see if they are not already working on it, if not, you can that it be assigned to you. Thanks.
@gabordozsa Just wanted to mention that if you are working on a github issue, please make sure you assign the issue to yourself before you start working on it. That will let other developers know who is responsible for this issue, so they will not work on that issue. Also if the issue is already assigned to someone, and you want to work on it, please contact that person to see if they are not already working on it, if not, you can that it be assigned to you. Thanks.
Sure, I will do! Thanks for the reminder and for assigning 1563 to me!
Attention: Patch coverage is 95.58824%
with 9 lines
in your changes missing coverage. Please review.
Project coverage is 62.5%. Comparing base (
1b1d4b9
) to head (35d4d8b
). Report is 34 commits behind head on main.
@ganeshmurthy I have added a system test. I think the PR is ready for more reviews.
This test must also include another scenario where a TLS enabled tcpListener
and TLS enabled tcpConnector
are deleted. You can attach an sslProfile
to a tcpListener
or a tcpConnector
to make them do TLS. Examples of this can be found in system_tests_tcp_adaptor_tls
To test the TLS case, you can use the EchoClientRunner
to start the echo client which can be found in system_tests_tcp_adaptor
. It has the test_ssl
flag which can be set to True
@gabordozsa I introduced a commit - https://github.com/skupperproject/skupper-router/commit/f1f61ee2f5cbdc41e16d45f755eec26f5125000c - where I renamed the QdManager
class to SkManager
so it is more in line with the actual name of the tool skmanage
Please rebase with the latest main branch and use the SkManager class in your PR
@gabordozsa This looks good to me, great work ! I want @kgiusti to look at this and once he approves it, we can push this PR to main branch. He is back on Monday !
@gabordozsa This looks good to me, great work ! I want @kgiusti to look at this and once he approves it, we can push this PR to main branch. He is back on Monday !
@ganeshmurthy Thanks! I pushed another commit to take the listener/connector lock when traversing the connections list. I think this is necessary because the list might be modified by another thread if a new connection is created or an existing one is closed. Does that make sense?
Does that make sense?
Yes. Excellent catch.
Re-asking a question here from https://github.com/skupperproject/skupper/pull/1558#issuecomment-2260242047:
Is there a reason for having this defined per connector/listener rather than either being simply the default behaviour or globally configured?
Re-asking a question here from skupperproject/skupper#1558 (comment):
Is there a reason for having this defined per connector/listener rather than either being simply the default behaviour or globally configured?
My understanding is that we should keep the current default behaviour for v1. As for using a global config flag, the reasoning was that it would not simplify the implementation.
Is there a reason for having this defined per connector/listener rather than either being simply the default behaviour or globally configured?
@grs I agree. I think we can move the closeConnectionsOnDelete flag to the global router
entity instead of having it in each tcpConnector/tcpListener
. Is that what you had in mind ?
@grs I agree. I think we can move the closeConnectionsOnDelete flag to the global
router
entity instead of having it in eachtcpConnector/tcpListener
. Is that what you had in mind ?
That seems simpler for users who want to alter the default behaviour.
@grs I agree. I think we can move the closeConnectionsOnDelete flag to the global
router
entity instead of having it in eachtcpConnector/tcpListener
. Is that what you had in mind ?That seems simpler for users who want to alter the default behaviour.
Agree ! I think we should modify this PR to move the flag to the router
entity.
@gabordozsa @kgiusti What do you think ?
@grs I agree. I think we can move the closeConnectionsOnDelete flag to the global
router
entity instead of having it in eachtcpConnector/tcpListener
. Is that what you had in mind ?That seems simpler for users who want to alter the default behaviour.
Agree ! I think we should modify this PR to move the flag to the
router
entity. @gabordozsa @kgiusti What do you think ?
I'm fine with that. The flag may look at bit out of place in the 'router' entity because it is only relevant to tcpListeners and tcpConnectors. But I agree that would make easier to change the default behaviour.
@grs I agree. I think we can move the closeConnectionsOnDelete flag to the global
router
entity instead of having it in eachtcpConnector/tcpListener
. Is that what you had in mind ?That seems simpler for users who want to alter the default behaviour.
Agree ! I think we should modify this PR to move the flag to the
router
entity. @gabordozsa @kgiusti What do you think ?I'm fine with that. The flag may look at bit out of place in the 'router' entity because it is only relevant to tcpListeners and tcpConnectors. But I agree that would make easier to change the default behaviour.
@gabordozsa There are already some fields in the router
entity namely defaultDistribution
, dataConnectionCount
etc that are not directly pertinent to the router
itself. We have used the router
entity in the past to add attributes that apply to the router globally.
@ganeshmurthy I moved the new config flag into the 'router' entity and adjusted the system tests.
I renamed the new config flag to dropTcpConnections
to be aligned with https://github.com/skupperproject/skupper/pull/1564
I see this CI error -
/home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_test.py:1288: RuntimeError
---------------------- generated xml file: /__w/skupper-router/skupper-router/skupper-router/build/tests/junitxmls/system_tests_grpc.xml -----------------------
=================================================================== short test summary info ====================================================================
ERROR ::GrpcServiceMethodsTestOverTcpTls::test_grpc_04_client_stream - RuntimeError: Errors during teardown:
Process 3933 (name=grpc-test-router) error: returned error code -11
skrouterd -c grpc-test-router.conf -I /home/runner/work/skupper-router/skupper-router/skupper-router/python
/__w/skupper-router/skupper-router/skupper-router/build/tests/system_test.dir/tests/system_tests_grpc/GrpcServiceMethodsTestOverTcpTls/setUpClass/grpc-test-router-5.cmd
>>>>
/home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:338:5: runtime error: member access within null pointer of type 'struct qdr_connection_t'
#0 0x66813f in qdr_core_close_connection /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:338
#1 0x5146f6 in qd_dispatch_delete_tcp_listener /home/runner/work/skupper-router/skupper-router/skupper-router/src/adaptors/tcp/tcp_adaptor.c:2397
#2 0x7f54d3082055 in ffi_call_unix64 (/lib64/libffi.so.8+0x9055) (BuildId: a190bf03e644181cadab122962ab83ae96271696)
#3 0x7f54d307e69f in ffi_call_int.lto_priv.0 (/lib64/libffi.so.8+0x569f) (BuildId: a190bf03e644181cadab122962ab83ae96271696)
#4 0x7f54d30814ed in ffi_call (/lib64/libffi.so.8+0x84ed) (BuildId: a190bf03e644181cadab122962ab83ae96271696)
#5 0x7f54d30992a7 in _ctypes_callproc (/usr/lib64/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so+0x102a7) (BuildId: e105d1b113f81c5a55ea7e4b1def4168d56667da)
#6 0x7f54d3098c80 in PyCFuncPtr_call (/usr/lib64/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so+0xfc80) (BuildId: e105d1b113f81c5a55ea7e4b1def4168d56667da)
#7 0x7f54d555a972 in _PyObject_MakeTpCall (/lib64/libpython3.11.so.1.0+0x131972) (BuildId: dc5b8f1564a7d29af984bfb632f884ae4c52988d)
#8 0x7f54d55630fe in _PyEval_EvalFrameDefault (/lib64/libpython3.11.so.1.0+0x13a0fe) (BuildId: dc5b8f1564a7d29af984bfb632f884ae4c52988d)
#9 0x7f54d555f44e in _PyEval_Vector (/lib64/libpython3.11.so.1.0+0x13644e) (BuildId: dc5b8f1564a7d29af984bfb632f884ae4c52988d)
#10 0x7f54d5597a8a in method_vectorcall (/lib64/libpython3.11.so.1.0+0x16ea8a) (BuildId: dc5b8f1564a7d29af984bfb632f884ae4c52988d)
#11 0x7f54d5557ba6 in _PyObject_CallFunctionVa (/lib64/libpython3.11.so.1.0+0x12eba6) (BuildId: dc5b8f1564a7d29af984bfb632f884ae4c52988d)
#12 0x7f54d5557adf in PyObject_CallFunction (/lib64/libpython3.11.so.1.0+0x12eadf) (BuildId: dc5b8f1564a7d29af984bfb632f884ae4c52988d)
#13 0x6165c5 in qd_io_rx_handler /home/runner/work/skupper-router/skupper-router/skupper-router/src/python_embedded.c:658
#14 0x6bfbd2 in qdr_forward_on_message /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/forwarder.c:375
#15 0x6eeac0 in qdr_general_handler /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/router_core.c:1031
#16 0x7572e2 in qd_timer_visit /home/runner/work/skupper-router/skupper-router/skupper-router/src/timer.c:317
#17 0x752288 in handle_proactor_other_event /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:142
#18 0x7524bf in proactor_thread /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:194
#19 0x6124b5 in _thread_init /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:207
#20 0x7f54d595af95 in asan_thread_start(void*) (/lib64/libasan.so.8+0x5df95) (BuildId: 79824421bd82bb3ef4addf048e1265e2a93cfc64)
#21 0x7f54d4494506 in start_thread (/lib64/libc.so.6+0x97506) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
#22 0x7f54d451840b in clone3 (/lib64/libc.so.6+0x11b40b) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
@gabordozsa there seem to be some CI failures in this PR like this one - https://github.com/skupperproject/skupper-router/actions/runs/10354198908/job/28658907680?pr=1570
I have kicked off the main branch CI just to make sure that it is clean, let's see what happens on main branch CI.
@gabordozsa The main CI seems fine
I see this in the PR CI - https://github.com/skupperproject/skupper-router/actions/runs/10354198908/job/28658907680?pr=1570#step:39:600
Leak of qd_tls_domain_t
object
https://github.com/skupperproject/skupper-router/actions/runs/10355435503/job/28663084819?pr=1570#step:34:5092
Leak of
qd_tls_domain_t
object https://github.com/skupperproject/skupper-router/actions/runs/10355435503/job/28663084819?pr=1570#step:34:5092
@ganeshmurthy In the same error log I can see (https://github.com/skupperproject/skupper-router/actions/runs/10355435503/job/28663084819?pr=1570#step:34:4777):
alloc.c: Items of type 'qd_tcp_connection_t' remain allocated at shutdown: 1
Leak: 2024-08-12 16:14:00.677874 +0000 type: qd_tcp_connection_t address: 0x5160001f9610
I'm wondering if the problem could be in ADAPTOR_final()
https://github.com/gabordozsa/skupper-router/blob/2c3629b9e17b5a9ce0bb7591d6d9e2dd9b715a52/src/adaptors/tcp/tcp_adaptor.c#L2648. ADAPTOR_final()
iterates through the connections list and trigger closing them. In the main branch every accepted connection is added to the connections list. However, we delay adding it to the list now (until the link is setup). So if a new connection is
accepted but ADAPTOR_final()
is called before the link is setup for it then this conn can leak. Does that make sense?
However, we delay adding it to the list now (until the link is setup). So if a new connection is accepted but
ADAPTOR_final()
is called before the link is setup for it then this conn can leak. Does that make sense?
Good catch @gabordozsa . It looks like the on_accept() was called and the connection was created there but before we could get a PN_RAW_CONNECTION_CONNECTED event from proton, the router seems to be shutting down which leaves the newly created connection (which is not yet in the list) to leak.
If you ask me, this should not really be a problem. The router is anyway shutting down, who cares about the leak ? But unfortunately, this is how we do things in the router.
Closing this as code got merged via https://github.com/skupperproject/skupper-router/pull/1601
Fixes #1563: Delete corresponding connections optionally on deletion of tcpListener and tcpConnector
New config parameters added to tcpListener and tcpConnector entities in order to make deletion of connections optional.