Open jiridanek opened 1 year ago
I tried with the libuv proactor again, building Proton with -DPROACTOR=libuv. Currently this requires removing the check in Proton's CMakeLists that does not allow building the TLS library with anything other than epoll ("Cannot currently build the raw connection TLS library without the epoll proactor and OpenSSL"). Commenting the check out is fine if the goal is just to get through the system_tests_distribution setUp.
This still seems to fix the router network startup, same as I reported in the rr issue:
34: Test command: /home/jdanek/repos/skupper-router/.venv/bin/python "/home/jdanek/repos/skupper-router/cmake-build-relwithdebinfo-rrasan/tests/run.py" "-m" "unittest" "-v" "system_tests_distribution"
34: Working Directory: /home/jdanek/repos/skupper-router/cmake-build-relwithdebinfo-rrasan/tests
34: Test timeout computed to be: 600
34: test_01_targeted_sender_AC (system_tests_distribution.DistributionTests.test_01_targeted_sender_AC) ... ok
34: test_02_targeted_sender_DC (system_tests_distribution.DistributionTests.test_02_targeted_sender_DC) ... ok
34: test_03_anonymous_sender_AC (system_tests_distribution.DistributionTests.test_03_anonymous_sender_AC) ...
[...]
You may notice these errors in the router logs. They are (AFAIK) simply caused by the system test checking whether the port is open, and are harmless:
2023-04-02 20:32:24.570475 +0200 SERVER (error) [C4] Connection from 127.0.0.1:52974 (to 0.0.0.0:28540) failed: amqp:connection:framing-error No protocol header found (connection aborted) (/home/jdanek/repos/skupper-router/src/server.c:1068)
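To illustrate why a port check would produce that log line, here is a minimal sketch of such a probe. The function name port_is_open is hypothetical and this is not the actual helper from the skupper-router test suite; it only assumes the check is a plain TCP connect that closes without sending anything, so the listener never sees an AMQP protocol header.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration, not the test suite's real code.
    """
    try:
        # Connect and close right away without sending any bytes. An AMQP
        # listener on the other end never receives a protocol header.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: treat all of these as "not open".
        return False


if __name__ == "__main__":
    # Demo against a throwaway local listener, not a real router.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    print(port_is_open("127.0.0.1", listener.getsockname()[1]))
    listener.close()
```

From the router's point of view, a probe like this is a connection that opens and aborts with no data, which would match the "No protocol header found (connection aborted)" message.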
Use the current git tip of skupper-router and qpid-proton.
Configure skrouterd with this option (adjust the path for your checkout, and maybe consider adding --chaos to the rr record command line):
"-DQDROUTERD_RUNNER=/home/jdanek/repos/skupper-router/scripts/sigforwarder.py /usr/bin/rr record --print-trace-dir=1 --continue-through-signal=15"
as described in https://github.com/skupperproject/skupper-router/blob/main/docs/notes/debugging.adoc#rr-workflow
Looking into the logs, I see that router C attempted to create the connections to router B, but router B did not answer.
I asked in the rr issue tracker whether they expect any such problems with epoll-based programs. I was told that they are not aware of any, but that the program needs to be able to handle EINTR correctly.
Back when I filed that rr issue, I also tried the libuv proactor in Proton, and then I did not have this problem.
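For reference, "handling EINTR correctly" usually means retrying a blocking call that was interrupted by a signal instead of treating the interruption as a failure. The sketch below shows only that general pattern. It is not Proton's epoll proactor code, and retry_on_eintr and FlakyWait are hypothetical names used for illustration. Note that Python itself has retried most system calls automatically since 3.5 (PEP 475), so C code around a call such as epoll_wait is where this loop would normally be written by hand.

```python
import errno


def retry_on_eintr(call, *args):
    """Call call(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return call(*args)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real error, not an interrupted call
            # Interrupted by a signal before completing: just try again.


class FlakyWait:
    """Hypothetical stand-in for a blocking wait that is interrupted twice."""

    def __init__(self):
        self.calls = 0

    def __call__(self, timeout):
        self.calls += 1
        if self.calls <= 2:
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return ["event"]


if __name__ == "__main__":
    wait = FlakyWait()
    # The first two attempts raise EINTR; the third one returns normally.
    print(retry_on_eintr(wait, 1.0), "after", wait.calls, "calls")
```

A program that skips this retry can appear to work normally and only misbehave under a tool that delivers more signals than usual, which is why the rr maintainers' remark seems worth checking.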
@ganeshmurthy, @kgiusti, @astitcher, @cliffjansen do you think this might be worth trying to reproduce, to see whether you hit the same issue I do? On my laptop, the system_tests_distribution test gets stuck like this pretty much every time. I also tried running the whole test suite under rr, but the other tests that failed (about 8 of them) did so only intermittently.