maekitalo / tntnet

GNU Lesser General Public License v2.1
74 stars 35 forks source link

tntnet.worker requesttime exceeded #63

Closed jm130794 closed 5 years ago

jm130794 commented 5 years ago

Hello,

We regularly encounter a problem with unsuccessful Ajax requests in an application using tntnet. For example on our component Login.ecpp. This component does not do any complex actions.

tntnet.worker returns this message to us:

FATAL tntnet.worker - requesttime 600s seconds in thread 139743169820416 exceeded - exit process

tntnet is completely blocked for a while. If we wait, everything works again.

We used the latest tntnet.

How do we debug this problem?

jm130794 commented 5 years ago

We tried to use gdb but when the error occurs, the server is dead. Here are the gdb results:

[New Thread 0x7fffe4ff9700 (LWP 23424)] [Thread 0x7fffe4ff9700 (LWP 23424) exited] 2018-11-19 15:41:53.06540 [12392.140737172240128] INFO tntnet.worker - request GET /tntnet/departments/ ready, returncode 200 OK - ContentSize: 1252 2018-11-19 15:41:53.06547 [12392.140737172240128] DEBUG tntnet.worker - reply sent 2018-11-19 15:41:53.06548 [12392.140737172240128] DEBUG tntnet.worker - log request to access log with return code 200 2018-11-19 15:41:53.06551 [12392.140737172240128] DEBUG tntnet.worker - keep alive 2018-11-19 15:41:53.06561 [12392.140737172240128] DEBUG tntnet.worker - eof 2018-11-19 15:41:53.07128 [12392.140737163847424] INFO tntnet.worker - request GET /tntnet/departments/ from client 194.57.88.134 user-Agent "curl/7.47.0" user "" [New Thread 0x7fffe4ff9700 (LWP 23426)] [Thread 0x7fffe4ff9700 (LWP 23426) exited] 2018-11-19 15:41:53.09586 [12392.140737163847424] INFO tntnet.worker - request GET /tntnet/departments/ ready, returncode 200 OK - ContentSize: 1252 2018-11-19 15:41:53.09593 [12392.140737163847424] DEBUG tntnet.worker - reply sent 2018-11-19 15:41:53.09594 [12392.140737163847424] DEBUG tntnet.worker - log request to access log with return code 200 2018-11-19 15:41:53.09597 [12392.140737163847424] DEBUG tntnet.worker - keep alive 2018-11-19 15:41:53.09599 [12392.140737163847424] DEBUG tntnet.worker - eof 2018-11-19 15:41:55.00013 [12392.140735902885632] FATAL tntnet.worker - requesttime 600s seconds in thread 140737085699840 exceeded - exit process 2018-11-19 15:41:55.00015 [12392.140735902885632] INFO tntnet.worker - current state: 5 processing request [Thread 0x7fff89ffb700 (LWP 3106) exited] [Thread 0x7fffa0ff9700 (LWP 996) exited] [Thread 0x7fffc2ffd700 (LWP 781) exited] [Thread 0x7fff8a7fc700 (LWP 22973) exited] [Thread 0x7fff8affd700 (LWP 17822) exited] [Thread 0x7fffa17fa700 (LWP 12425) exited] [Thread 0x7fffa27fc700 (LWP 12423) exited] [Thread 0x7fffa2ffd700 (LWP 12422) exited] [Thread 0x7fffa37fe700 (LWP 12421) exited] [Thread 0x7fffa3fff700 (LWP 12420) exited] [Thread 0x7fffc0ff9700 (LWP 12419) exited] [Thread 0x7fffc17fa700 (LWP 12418) exited] [Thread 0x7fffc1ffb700 (LWP 12417) exited] [Thread 0x7fffc27fc700 (LWP 12416) exited] [Thread 0x7fffc37fe700 (LWP 12414) exited] [Thread 0x7fffa1ffb700 (LWP 12424) exited] [Thread 0x7fffc3fff700 (LWP 12413) exited] [Thread 0x7fffe57fa700 (LWP 12411) exited] [Thread 0x7fffe67fc700 (LWP 12409) exited] [Thread 0x7fffe6ffd700 (LWP 12408) exited] [Thread 0x7fffe77fe700 (LWP 12407) exited] [Thread 0x7fffe7fff700 (LWP 12405) exited] [Thread 0x7fffed287700 (LWP 12401) exited] [Thread 0x7ffff7fb57c0 (LWP 12392) exited] [Inferior 1 (process 12392) exited with code 0157] (gdb) info thread No threads.

We used curl with this command :

for i in seq 1 10000; do echo $i; echo ""; curl -X GET http://194.57.88.134:16000/tntnet/departments/; done

Thanks

Jean-Marc Choulet

misterk72 commented 5 years ago

Hello Jean-Marc, is it possible to share more info about what your Login.ecpp does? It looks like the tntnet workers get stuck into this module which leads to the crash of tntnet after 600s. Cheers, Christophe

jm130794 commented 5 years ago

Hello,

<%pre>
#include <boost/algorithm/string.hpp>
#include <cxxtools/base64codec.h>
#include <fmt/format.h>
#include <json.hpp>

#include "Configuration.h"
#include "entetes_tntdb.h"
#include "ServicesUniweb2.h"
#include "Session.h"
#include "Debug.h"
</%pre>
\
<%session scope="shared">
std::string User;
std::string Password;
std::string Role;
</%session>
\
<%args>
std::string user;
std::string password;
</%args>
\
<%cpp>
nlohmann::json jsonResult = R"({ "success": false, "msg": "Error" })"_json; 

try
{
    Configuration& conf = Configuration::it();
    std::string version = request.getHeader("App-Version:");

    auto conn = tntdb::connect(Configuration::it().dbUrl());

    std::string q = "SELECT ...";

    auto stmt   = conn.prepare(q);
    auto result = stmt.setString("version", version).select();

    if ((result.size() == 1) && 
        (user == "administrator") && (password == Configuration::it().adminPasswd()))
    {
        User = user;
        Password = password;
        Role = "administrator";

        jsonResult["user"] = User;
        jsonResult["role"] = Role;
        jsonResult["mail"] = "toto@foo.fr";
        jsonResult["token"] = cxxtools::Base64Codec::encode(User + ":" + Password);

        jsonResult["success"] = true;
        jsonResult["msg"] = "OK";
    }
}
catch (const std::exception& e)
{
    jsonResult["msg"] = e.what();
    log_error(e.what());
}

reply.out() << jsonResult.dump();
</%cpp>
jm130794 commented 5 years ago

Here are some information obtained with gdb. The tntnet server is no longer responding. :

[Thread 0x7f3048ac2700 (LWP 2280) exited]
[New Thread 0x7f3048ac2700 (LWP 2281)]
[Thread 0x7f3048ac2700 (LWP 2281) exited]
[New Thread 0x7f3048ac2700 (LWP 2282)]
[Thread 0x7f3048ac2700 (LWP 2282) exited]
[New Thread 0x7f3048ac2700 (LWP 2283)]
[Thread 0x7f3048ac2700 (LWP 2283) exited]
^C
Thread 1 "publiweb-webapp" received signal SIGINT, Interrupt.
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f3057a7a7c0 (LWP 2045) "publiweb-webapp" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7f304cd3f700 (LWP 2046) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  3    Thread 0x7f304c53e700 (LWP 2047) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  4    Thread 0x7f304bd3d700 (LWP 2048) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  5    Thread 0x7f304b53c700 (LWP 2049) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  6    Thread 0x7f304ad3b700 (LWP 2050) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  7    Thread 0x7f304a53a700 (LWP 2051) "publiweb-webapp" 0x00007f3053ebfa13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
  8    Thread 0x7f3049d39700 (LWP 2052) "publiweb-webapp" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
(gdb) thread 8
[Switching to thread 8 (Thread 0x7f3049d39700 (LWP 2052))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
225     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) backtrace
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f305714d43f in cxxtools::ConditionImpl::wait(cxxtools::Mutex&, cxxtools::Timespan const&) () from /usr/local/lib/libcxxtools.so.10
#2  0x00007f3056e86cd2 in tnt::TntnetImpl::timerTask() () from /usr/local/lib/libtntnet.so.13
#3  0x00007f30571b3484 in ?? () from /usr/local/lib/libcxxtools.so.10
#4  0x00007f3053ba26ba in start_thread (arg=0x7f3049d39700) at pthread_create.c:333
#5  0x00007f3053ebf41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7f3057a7a7c0 (LWP 2045) "publiweb-webapp" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7f304cd3f700 (LWP 2046) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  3    Thread 0x7f304c53e700 (LWP 2047) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  4    Thread 0x7f304bd3d700 (LWP 2048) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  5    Thread 0x7f304b53c700 (LWP 2049) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  6    Thread 0x7f304ad3b700 (LWP 2050) "publiweb-webapp" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  7    Thread 0x7f304a53a700 (LWP 2051) "publiweb-webapp" 0x00007f3053ebfa13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
* 8    Thread 0x7f3049d39700 (LWP 2052) "publiweb-webapp" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f304cd3f700 (LWP 2046))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135     ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) backtrace
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f3053ba4dbd in __GI___pthread_mutex_lock (mutex=0x7f30441be518) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f3054a44ca0 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#3  0x00007f3055f2dea3 in X509_STORE_add_cert () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#4  0x00007f3055f2fc39 in X509_load_cert_crl_file () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#5  0x00007f3055f2fd1a in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#6  0x00007f3055f261cf in X509_STORE_load_locations () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#7  0x00007f3052eb9dc5 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#8  0x00007f3052ebc040 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#9  0x00007f3052e789dd in ?? () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#10 0x00007f3052e89bd1 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#11 0x00007f3052e9e35e in ?? () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#12 0x00007f3052e9ef5d in curl_multi_perform () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#13 0x00007f3052e95a0b in curl_easy_perform () from /usr/lib/x86_64-linux-gnu/libcurl.so.4
#14 0x00007f305642267f in RestClient::Connection::performCurlRequest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /usr/local/lib/librestclient-cpp.so.1
#15 0x00007f3056422b3d in RestClient::Connection::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/local/lib/librestclient-cpp.so.1
#16 0x000000000075e2b0 in ?? ()
#17 0x000000000075e650 in ?? ()
#18 0x000000000062619c in ?? ()
#19 0x00007f3056e8e734 in tnt::Worker::dispatch(tnt::HttpRequest&, tnt::HttpReply&) () from /usr/local/lib/libtntnet.so.13
#20 0x00007f3056e8fa4c in tnt::Worker::processRequest(tnt::HttpRequest&, std::iostream&, unsigned int) () from /usr/local/lib/libtntnet.so.13
#21 0x00007f3056e906d5 in tnt::Worker::run() () from /usr/local/lib/libtntnet.so.13
#22 0x00007f3056e9123a in cxxtools::DetachedThread::exec() () from /usr/local/lib/libtntnet.so.13
#23 0x00007f30571b3484 in ?? () from /usr/local/lib/libcxxtools.so.10
#24 0x00007f3053ba26ba in start_thread (arg=0x7f304cd3f700) at pthread_create.c:333
#25 0x00007f3053ebf41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
maekitalo commented 5 years ago

There may be a problem with openssl. There is a function SSL_library_init, which should be called exactly once in a application. Tntnet does it. And you use curl. I guess curl does it too. I see, that the thread 2 is locked in ssl. I don't know, if it is possible to use libcurl in a tntnet application. Note that there is a http client in cxxtools too. In the github version it is able to use ssl and shares the SSL_library_init with tntnet since tntnet uses the ssl implementation of cxxtools.

jm130794 commented 5 years ago

Thank you for the answer.

We need to be able to use curl and openssl for now. Which versions of cxxtools and tntnet should we use?

cxxtools-? tntnet-? tntdb-?

We will test cxxtools::http::Client later.

Jean-Marc

jm130794 commented 5 years ago

We are testing the http client and we have a question:

How to send data in the body of a Post request?

Jean-Marc

jm130794 commented 5 years ago

Is it possible to disable SSL for cxxtools, tntnet and tntdb? We tried on cxxtools with the option --with-ssl=no but we get a compilation error. We would like to disable SSL to have time to test the HTTP client.

Jean-Marc

maekitalo commented 5 years ago

This is how to send data of a post request

    // client to localhost:8000
    cxxtools::http::Client client("", 8000);

    // create request object
    cxxtools::http::Request request("/foo");
    request.method("POST");
    // std::ostream to request body:
    request.body() << "this is the request body";

    // send request and read reply including body
    client.execute(request);
    client.readBody();

    // and fetch the body
    std::string replyBody = client.body();

To disable ssl for cxxtools pass --without-ssl to configure of cxxtools. This disables ssl for tntnet too. But wait until I have fixed the compile error;-). You are indeed right, that it is currently broken.

Tommi

maekitalo commented 5 years ago

... and now fixed. --without-ssl compiles now.

jm130794 commented 5 years ago

We have solved our problem:

Jean-Marc