Open hyperxor opened 3 years ago
Hello, @iroddis, can you describe the expected scenario of the client_multiple_requests_disconnects_handled
unit test, maybe I can fix it.
I cannot reproduce the failure on my dev box (From right after PR #840 is merged). I built with g++ and clang++, both in "Release" mode and "Debug" mode, and ran all unit tests (all pass), plus my own app runs fine.
All 4 builds are very noisy with compiler warnings (I'd love for those to be cleaned up some day), but all 4 "worked".
hyperxor@, what is your build platform?
$ uname -a
Linux ostara 5.4.72-gentoo-x86_64 #1 SMP PREEMPT Mon Nov 9 22:45:29 PST 2020 x86_64 Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz GenuineIntel GNU/Linux
$ g++ --version
g++ (Gentoo 10.2.0-r3 p4) 10.2.0
$ clang++ --version
clang version 11.0.0
@dennisjenkins75 Ok, thank you, I will see.
$ uname -a
Linux hyperxor-ws 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ g++ --version
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
@dennisjenkins75, try running via autopkgtest. There's notes on how to do this in the readme. It will build and run all unit and CI tests within a container so there are hopefully no side effects spilling in from your system affecting the results.
@dennisjenkins75 @kiplingw I have found tbe same problem with tests here - arm64 build of pistache 0.0.002-202011242302~ubuntu20.04.1 in ubuntu focal RELEASE (link):
[ RUN ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
INITING
Server address: localhost:43605
[server] Sent: 127.0.0.1
[server] Sent: 127.0.0.1
[server] Sent: 127.0.0.1
Response code is OK
Response code is OK
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
/<<PKGBUILDDIR>>/tests/http_server_test.cc:475: Failure
Expected equality of these values:
counter
Which is: 0
CLIENT_REQUEST_SIZE
Which is: 3
[ FAILED ] http_server_test.client_multiple_requests_disconnects_handled (1101 ms)
[----------] 8 tests from http_server_test (33791 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (33852 ms total)
[ PASSED ] 7 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] http_server_test.client_multiple_requests_disconnects_handled
1 FAILED TEST
Thanks @hyperxor. If you join the Launchpad team you'll get notified automatically when builds fail.
@dennisjenkins75 I will close this issue in you don't mind. Fix is working.
Sadly it's still re-appearing in the unstable PPA builder as of today.
...
Trying to run server...
[server] Ininting...
Server address: localhost:38715
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
[server] Disconnect from peer ID 19 connecting from 127.0.0.1
/<<PKGBUILDDIR>>/tests/http_server_test.cc:503: Failure
Expected equality of these values:
result
Which is: false
true
[ FAILED ] http_server_test.client_multiple_requests_disconnects_handled (2145 ms)
[----------] 8 tests from http_server_test (22064 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (22114 ms total)
[ PASSED ] 7 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] http_server_test.client_multiple_requests_disconnects_handled
1 FAILED TEST
Start 13: http_client_test
13/26 MemCheck #13: http_client_test ................. Passed 11.94 sec
Start 14: listener_test
...
Failed again today for focal on amd64.
[ RUN ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
[server] Ininting...
Server address: localhost:34997
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
[server] Disconnect from peer ID 19 connecting from 127.0.0.1
[server] Disconnect from peer ID 20 connecting from 127.0.0.1
/<<PKGBUILDDIR>>/tests/http_server_test.cc:503: Failure
Expected equality of these values:
result
Which is: false
true
[ FAILED ] http_server_test.client_multiple_requests_disconnects_handled (2289 ms)
[----------] 8 tests from http_server_test (23993 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (24039 ms total)
[ PASSED ] 7 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] http_server_test.client_multiple_requests_disconnects_handled
1 FAILED TEST
More again as of today here.
[ RUN ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
Server address: localhost:34409
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=21 (address=127.0.0.1:13479, hostname=localhost, fd=87)
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=22 (address=127.0.0.1:13991, hostname=localhost, fd=88)
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=23 (address=127.0.0.1:14503, hostname=localhost, fd=89)
[client] [7f29f77fe700] Response code is OK
[client] [7f29f77fe700] Response code is OK
[client] [7f29f77fe700] Response code is OK
[server] [7f2a057d2700] Disconnect from peer Peer with ID=21 (address=127.0.0.1:13479, hostname=localhost, fd=87)
[client] [7f2a067d8740] resolves: 3, rejects: 0, timeout: 1 seconds, wait: 6 seconds
[server] [7f2a057d2700] Disconnect from peer Peer with ID=22 (address=127.0.0.1:13991, hostname=localhost, fd=88)
[server] [7f2a057d2700] Disconnect from peer Peer with ID=23 (address=127.0.0.1:14503, hostname=localhost, fd=89)
[ OK ] http_server_test.client_multiple_requests_disconnects_handled (1 ms)
[----------] 10 tests from http_server_test (20021 ms total)
[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (20021 ms total)
[ PASSED ] 9 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] http_server_test.multiple_client_with_different_requests_to_multithreaded_server
1 FAILED TEST
--- stderr ---
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
-------
Again just now while testing @jacobbogdanov's #859, but this time locally:
[ RUN ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
Server address: localhost:38193
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=21 (address=127.0.0.1:11422, hostname=localhost, fd=86)
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=22 (address=127.0.0.1:11934, hostname=localhost, fd=87)
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=23 (address=127.0.0.1:12446, hostname=localhost, fd=88)
[client] [7efc28ed9640] Response code is OK
[client] [7efc28ed9640] Response code is OK
[client] [7efc28ed9640] Response code is OK
[client] [7efc2c6e4740] resolves: 3, rejects: 0, timeout: 1 seconds, wait: 6 seconds
[server] [7efc0bfff640] Disconnect from peer Peer with ID=21 (address=127.0.0.1:11422, hostname=localhost, fd=86)
[server] [7efc0bfff640] Disconnect from peer Peer with ID=22 (address=127.0.0.1:11934, hostname=localhost, fd=87)
[server] [7efc0bfff640] Disconnect from peer Peer with ID=23 (address=127.0.0.1:12446, hostname=localhost, fd=88)
[ OK ] http_server_test.client_multiple_requests_disconnects_handled (1 ms)
[----------] 10 tests from http_server_test (32021 ms total)
[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (32021 ms total)
[ PASSED ] 10 tests.
--- stderr ---
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
-------
Has anyone attempted to reproduce the problem under valgrind, specifically "helgrind" [1]? Or compile the code with TSAN [2] ?
[1] https://valgrind.org/docs/manual/hg-manual.html [2] https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual
I'll try to add support for the thread sanitizer (and also others) in a PR in the following hours
I've ran the tests with the thread sanitizer (with both GCC and Clang) and it reports data races in 19 tests, specifically async_test
, router_test
, cookie_test_3
, http_server_test
, http_client_test
, request_size_test
, streaming_test
, and threadname_test
. The logs are almost 8000 lines long, but if somebody wants to take a look I've uploaded them here (Clang) and here (GCC)
I don't know anything about either of the two tools Dennis recommended. But I would suggest that we eliminate false positives from the log to make things easier.
@kiplingw Hello, can you send build fails (if any) on http_server_test.client_multiple_requests_disconnects_handled
test to this isssue? I think latest changes will help to find root cause of problem.
Will do.
@kiplingw I think this issue can be closed if there are no bug reproductions.
Yep. While the Travis Clang pipeline is broken for no reason (they probably messed up the apt sources), GCC builds seem fine. Good job :)
Yes indeed, well done. Hopefully it's been squashed for good and this doesn't resurrect itself.
Has anyone attempted to reproduce the problem under valgrind, specifically "helgrind" [1]? Or compile the code with TSAN [2] ?
[1] https://valgrind.org/docs/manual/hg-manual.html [2] https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual
I faced this issue quite often. Thread sanitizer report:
WARNING: ThreadSanitizer: data race (pid=3415914)
Write of size 8 at 0x7ba800000468 by thread T3 (mutexes: write M0):
#0 close <null>
#1 Pistache::Http::Experimental::Connection::close() src/client/client.cc:624:9
#2 Pistache::Http::Experimental::ConnectionPool::shutdown() src/client/client.cc:881:27
#3 Pistache::Http::Experimental::Client::shutdown() src/client/client.cc:993:14
#4 clientLogicFunc(unsigned long, std::string const&, int, int) tests/http_server_test.cc:241:12
Previous read of size 8 at 0x7ba800000468 by thread T7:
#0 recv <null>
#1 Pistache::Http::Experimental::Transport::handleIncoming(std::shared_ptr<Pistache::Http::Experimental::Connection>) src/client/client.cc:499:35
#2 Pistache::Http::Experimental::Transport::handleReadableEntry(Pistache::Aio::FdSet::Entry const&) src/client/client.cc:420:17
#3 Pistache::Http::Experimental::Transport::onReady(Pistache::Aio::FdSet const&) src/client/client.cc:255:17
#4 Pistache::Aio::SyncImpl::handleFds(std::vector<Pistache::Polling::Event>) const src/common/reactor.cc:205:34
#5 Pistache::Aio::SyncImpl::runOnce() src/common/reactor.cc:165:21
#6 Pistache::Aio::SyncImpl::run() src/common/reactor.cc:177:17
#7 Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()::operator()() const src/common/reactor.cc:515:27
Hello,
I have run unit tests on the latest souce and got the following error:
It seems it happened after this #838 pull request.