pistacheio / pistache

A high-performance REST toolkit written in C++
https://pistacheio.github.io/pistache/
Apache License 2.0
3.18k stars 698 forks source link

Unit test client_multiple_requests_disconnects_handled is failed #842

Open hyperxor opened 3 years ago

hyperxor commented 3 years ago

Hello,

I have run unit tests on the latest souce and got the following error:

12: [ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
12: Trying to run server...
12: INITING
12: Server address: localhost:38919
12: [server] Sent: 127.0.0.1
12: [server] Sent: 127.0.0.1
12: [server] Sent: 127.0.0.1
12: Response code is OK
12: Response code is OK
12: Response code is OK
12: resolves: 3, rejects: 0, timeout: 1, wait: 6
12: /home/hyperxor/code/pistache/tests/http_server_test.cc:475: Failure
12: Value of: CLIENT_REQUEST_SIZE
12:   Actual: 3
12: Expected: counter
12: Which is: 0
12: [  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (1003 ms)

It seems it happened after this #838 pull request.

hyperxor commented 3 years ago

Hello, @iroddis, can you describe the expected scenario of the client_multiple_requests_disconnects_handled unit test, maybe I can fix it.

dennisjenkins75 commented 3 years ago

I cannot reproduce the failure on my dev box (From right after PR #840 is merged). I built with g++ and clang++, both in "Release" mode and "Debug" mode, and ran all unit tests (all pass), plus my own app runs fine.

All 4 builds are very noisy with compiler warnings (I'd love for those to be cleaned up some day), but all 4 "worked".

hyperxor@, what is your build platform?

$ uname -a
Linux ostara 5.4.72-gentoo-x86_64 #1 SMP PREEMPT Mon Nov 9 22:45:29 PST 2020 x86_64 Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz GenuineIntel GNU/Linux

$ g++ --version
g++ (Gentoo 10.2.0-r3 p4) 10.2.0

$ clang++ --version
clang version 11.0.0
hyperxor commented 3 years ago

@dennisjenkins75 Ok, thank you, I will see.

$ uname -a
Linux hyperxor-ws 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ g++ --version
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
kiplingw commented 3 years ago

@dennisjenkins75, try running via autopkgtest. There's notes on how to do this in the readme. It will build and run all unit and CI tests within a container so there are hopefully no side effects spilling in from your system affecting the results.

hyperxor commented 3 years ago

@dennisjenkins75 @kiplingw I have found tbe same problem with tests here - arm64 build of pistache 0.0.002-202011242302~ubuntu20.04.1 in ubuntu focal RELEASE (link):

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
INITING
Server address: localhost:43605
[server] Sent: 127.0.0.1
[server] Sent: 127.0.0.1
[server] Sent: 127.0.0.1
Response code is OK
Response code is OK
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
/<<PKGBUILDDIR>>/tests/http_server_test.cc:475: Failure
Expected equality of these values:
  counter
    Which is: 0
  CLIENT_REQUEST_SIZE
    Which is: 3
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (1101 ms)
[----------] 8 tests from http_server_test (33791 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (33852 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled

 1 FAILED TEST
kiplingw commented 3 years ago

Thanks @hyperxor. If you join the Launchpad team you'll get notified automatically when builds fail.

hyperxor commented 3 years ago

@dennisjenkins75 I will close this issue in you don't mind. Fix is working.

kiplingw commented 3 years ago

Sadly it's still re-appearing in the unstable PPA builder as of today.

...
Trying to run server...
[server] Ininting...
Server address: localhost:38715
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
[server] Disconnect from peer ID 19 connecting from 127.0.0.1
/<<PKGBUILDDIR>>/tests/http_server_test.cc:503: Failure
Expected equality of these values:
  result
    Which is: false
  true
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (2145 ms)
[----------] 8 tests from http_server_test (22064 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (22114 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled

 1 FAILED TEST

      Start 13: http_client_test
13/26 MemCheck #13: http_client_test .................   Passed   11.94 sec
      Start 14: listener_test
...
kiplingw commented 3 years ago

Failed again today for focal on amd64.

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
[server] Ininting...
Server address: localhost:34997
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
[server] Disconnect from peer ID 19 connecting from 127.0.0.1
[server] Disconnect from peer ID 20 connecting from 127.0.0.1
/<<PKGBUILDDIR>>/tests/http_server_test.cc:503: Failure
Expected equality of these values:
  result
    Which is: false
  true
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (2289 ms)
[----------] 8 tests from http_server_test (23993 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (24039 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled

 1 FAILED TEST
kiplingw commented 3 years ago

More again as of today here.

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
Server address: localhost:34409
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=21 (address=127.0.0.1:13479, hostname=localhost, fd=87)
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=22 (address=127.0.0.1:13991, hostname=localhost, fd=88)
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=23 (address=127.0.0.1:14503, hostname=localhost, fd=89)
[client] [7f29f77fe700] Response code is OK
[client] [7f29f77fe700] Response code is OK
[client] [7f29f77fe700] Response code is OK
[server] [7f2a057d2700] Disconnect from peer Peer with ID=21 (address=127.0.0.1:13479, hostname=localhost, fd=87)
[client] [7f2a067d8740] resolves: 3, rejects: 0, timeout: 1 seconds, wait: 6 seconds
[server] [7f2a057d2700] Disconnect from peer Peer with ID=22 (address=127.0.0.1:13991, hostname=localhost, fd=88)
[server] [7f2a057d2700] Disconnect from peer Peer with ID=23 (address=127.0.0.1:14503, hostname=localhost, fd=89)
[       OK ] http_server_test.client_multiple_requests_disconnects_handled (1 ms)
[----------] 10 tests from http_server_test (20021 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (20021 ms total)
[  PASSED  ] 9 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.multiple_client_with_different_requests_to_multithreaded_server

 1 FAILED TEST
--- stderr ---
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
-------
kiplingw commented 3 years ago

Again just now while testing @jacobbogdanov's #859, but this time locally:

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
Server address: localhost:38193
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=21 (address=127.0.0.1:11422, hostname=localhost, fd=86)
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=22 (address=127.0.0.1:11934, hostname=localhost, fd=87)
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=23 (address=127.0.0.1:12446, hostname=localhost, fd=88)
[client] [7efc28ed9640] Response code is OK
[client] [7efc28ed9640] Response code is OK
[client] [7efc28ed9640] Response code is OK
[client] [7efc2c6e4740] resolves: 3, rejects: 0, timeout: 1 seconds, wait: 6 seconds
[server] [7efc0bfff640] Disconnect from peer Peer with ID=21 (address=127.0.0.1:11422, hostname=localhost, fd=86)
[server] [7efc0bfff640] Disconnect from peer Peer with ID=22 (address=127.0.0.1:11934, hostname=localhost, fd=87)
[server] [7efc0bfff640] Disconnect from peer Peer with ID=23 (address=127.0.0.1:12446, hostname=localhost, fd=88)
[       OK ] http_server_test.client_multiple_requests_disconnects_handled (1 ms)
[----------] 10 tests from http_server_test (32021 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (32021 ms total)
[  PASSED  ] 10 tests.
--- stderr ---
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
-------
dennisjenkins75 commented 3 years ago

Has anyone attempted to reproduce the problem under valgrind, specifically "helgrind" [1]? Or compile the code with TSAN [2] ?

[1] https://valgrind.org/docs/manual/hg-manual.html [2] https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual

Tachi107 commented 3 years ago

I'll try to add support for the thread sanitizer (and also others) in a PR in the following hours

Tachi107 commented 3 years ago

I've ran the tests with the thread sanitizer (with both GCC and Clang) and it reports data races in 19 tests, specifically async_test, router_test, cookie_test_3, http_server_test, http_client_test, request_size_test, streaming_test, and threadname_test. The logs are almost 8000 lines long, but if somebody wants to take a look I've uploaded them here (Clang) and here (GCC)

kiplingw commented 3 years ago

I don't know anything about either of the two tools Dennis recommended. But I would suggest that we eliminate false positives from the log to make things easier.

hyperxor commented 3 years ago

@kiplingw Hello, can you send build fails (if any) on http_server_test.client_multiple_requests_disconnects_handled test to this isssue? I think latest changes will help to find root cause of problem.

Tachi107 commented 3 years ago

You can find a lot of failures on Travis. Some of the most recent are these builds: 767081020, 767090930, 767090938

kiplingw commented 3 years ago

Will do.

hyperxor commented 3 years ago

@kiplingw I think this issue can be closed if there are no bug reproductions.

Tachi107 commented 3 years ago

Yep. While the Travis Clang pipeline is broken for no reason (they probably messed up the apt sources), GCC builds seem fine. Good job :)

kiplingw commented 3 years ago

Yes indeed, well done. Hopefully it's been squashed for good and this doesn't resurrect itself.

kiplingw commented 3 years ago

I swear this bug is cursed. Happening again on armhf.

tyler92 commented 11 months ago

Has anyone attempted to reproduce the problem under valgrind, specifically "helgrind" [1]? Or compile the code with TSAN [2] ?

[1] https://valgrind.org/docs/manual/hg-manual.html [2] https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual

I faced this issue quite often. Thread sanitizer report:

WARNING: ThreadSanitizer: data race (pid=3415914)
  Write of size 8 at 0x7ba800000468 by thread T3 (mutexes: write M0):
    #0 close <null> 
    #1 Pistache::Http::Experimental::Connection::close() src/client/client.cc:624:9 
    #2 Pistache::Http::Experimental::ConnectionPool::shutdown() src/client/client.cc:881:27 
    #3 Pistache::Http::Experimental::Client::shutdown() src/client/client.cc:993:14 
    #4 clientLogicFunc(unsigned long, std::string const&, int, int) tests/http_server_test.cc:241:12 

  Previous read of size 8 at 0x7ba800000468 by thread T7:
    #0 recv <null> 
    #1 Pistache::Http::Experimental::Transport::handleIncoming(std::shared_ptr<Pistache::Http::Experimental::Connection>) src/client/client.cc:499:35 
    #2 Pistache::Http::Experimental::Transport::handleReadableEntry(Pistache::Aio::FdSet::Entry const&) src/client/client.cc:420:17 
    #3 Pistache::Http::Experimental::Transport::onReady(Pistache::Aio::FdSet const&) src/client/client.cc:255:17 
    #4 Pistache::Aio::SyncImpl::handleFds(std::vector<Pistache::Polling::Event>) const src/common/reactor.cc:205:34 
    #5 Pistache::Aio::SyncImpl::runOnce() src/common/reactor.cc:165:21 
    #6 Pistache::Aio::SyncImpl::run() src/common/reactor.cc:177:17 
    #7 Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()::operator()() const src/common/reactor.cc:515:27