Closed kzangeli closed 7 years ago
We should take into account not only MHD but also libcurl
Maybe the usage of the -maxConnection CLI option should be changed after implementing this issue. The following thread provides extra information regarding the MHD_OPTION_CONNECTION_LIMIT parameter, on which -maxConnection is based (currently; let's see in the future with epoll()): http://lists.gnu.org/archive/html/libmicrohttpd/2016-11/msg00014.html
About libcurl: there seems to be a way to ask libcurl to use poll instead of select. However, we would need to rewrite all the curl code, as right now we use 'easy_curl', which only supports select(). We would need to use 'multi_curl'.
MHD_USE_POLL Use poll() instead of select(). This allows sockets with descriptors >= FD_SETSIZE. This option currently only works in conjunction with MHD_USE_THREAD_PER_CONNECTION or MHD_USE_INTERNAL_SELECT (at this point). If you specify MHD_USE_POLL and the local platform does not support it, MHD_start_daemon will return NULL.
MHD_USE_EPOLL_LINUX_ONLY Use epoll() instead of poll() or select(). This allows sockets with descriptors >= FD_SETSIZE. This option is only available on Linux systems and does not work in conjunction with MHD_USE_THREAD_PER_CONNECTION (at this point). If you specify MHD_USE_EPOLL_LINUX_ONLY and the local platform does not support it, MHD_start_daemon will return NULL. Using epoll() instead of select() or poll() can in some situations result in significantly higher performance as the system call has fundamentally lower complexity (O(1) for epoll() vs. O(n) for select()/poll() where n is the number of open connections).
Since MHD_USE_THREAD_PER_CONNECTION is mandatory with poll, the CLI option -reqPoolSize ("Size of thread pool for incoming connections. Default value is 0, meaning no thread pool") will not make sense, I guess.
But it will make sense with epoll, because epoll does not work in conjunction with MHD_USE_THREAD_PER_CONNECTION.
Relevant thread at libmicrohttpd mailing list: http://lists.gnu.org/archive/html/libmicrohttpd/2016-11/msg00020.html
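To make the FD_SETSIZE point from the MHD docs concrete, here is a minimal sketch in Python (not broker code, and not MHD's API). It duplicates a socket onto a descriptor number above 1024 and checks whether select() and poll() can watch it. The descriptor number 1500 and all function names are made up for illustration. It assumes a Unix-like system where FD_SETSIZE is 1024 and where the open-file limit can be raised to at least 1501; otherwise the demo returns None.

```python
import os
import resource
import select
import socket

HIGH_FD = 1500  # hypothetical descriptor number above the usual FD_SETSIZE (1024)


def readable_with_poll(fd, timeout_ms=500):
    """Return True if poll() reports fd as readable within the timeout."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return any(event & select.POLLIN for _, event in poller.poll(timeout_ms))


def select_accepts(fd):
    """Return True if select() is able to watch fd at all."""
    try:
        select.select([fd], [], [], 0)
        return True
    except ValueError:
        # CPython refuses descriptors that do not fit in an fd_set
        return False


def ensure_nofile_limit(needed):
    """Try to make sure the soft open-file limit allows `needed` descriptors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    inf = resource.RLIM_INFINITY
    if soft == inf or soft >= needed:
        return True
    if hard != inf and hard < needed:
        return False
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
        return True
    except (ValueError, OSError):
        return False


def demo():
    """Return (select_ok, poll_ok) for a high descriptor, or None if not runnable."""
    if not ensure_nofile_limit(HIGH_FD + 1):
        return None
    try:
        os.fstat(HIGH_FD)
        return None  # descriptor already in use, do not clobber it
    except OSError:
        pass
    left, right = socket.socketpair()
    os.dup2(left.fileno(), HIGH_FD)  # same socket, now also reachable as fd 1500
    try:
        right.send(b"x")  # makes the left end readable
        return select_accepts(HIGH_FD), readable_with_poll(HIGH_FD)
    finally:
        os.close(HIGH_FD)
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

When the demo can run, the expectation is that select() rejects the descriptor while poll() handles it, which is the limitation this issue is about.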
Recent research on the topic has shown that libcurl (the way we use it in CB) doesn't use select internally: we have reproduced a case with 4000 outgoing notification connections (and 4000 is greater than 1024, the limit with select).
Thus, the only limit seems to be in MHD.
Implementation done in PR https://github.com/telefonicaid/fiware-orion/pull/2751. However, the documentation in perf_tuning.md should be improved based on what we have learnt and implemented, so this issue will remain open a little longer.
Documentation completed in PR https://github.com/telefonicaid/fiware-orion/pull/2755
Assigning to @iariasleon for QA validation.
Doc still pending on this: http://lists.gnu.org/archive/html/libmicrohttpd/2016-12/msg00022.html
Bug detected:
If we try to start the CB as a service (service contextBroker start) and the config used is -notificationMode threadpool:60000:1022 (1022 or higher), the CB does not start, showing this line in the log:
time=2016-12-12T10:13:08.340Z | lvl=FATAL | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=rest.cpp[1547]:restStart | msg=Fatal Error (error starting REST interface)
If the CB is started by command (/usr/bin/contextBroker ....) it starts successfully, no matter whether -fg is used or not.
My bet: the thread limit is not being honoured when CB runs as a service. If I'm right, then a procedure to set thread limit for processes running as services would solve this problem.
Comment related to https://github.com/telefonicaid/fiware-orion/issues/2724#issuecomment-266398438
The contextBroker should be started by command (/usr/bin/contextBroker ...) instead of as a service (service contextBroker start), because the VMs limit the number of threads to 1024 for all users except root. See https://bugzilla.redhat.com/show_bug.cgi?id=919793 .
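A rough way to reason about this before starting the broker is to compare the worker count in -notificationMode against the per-user process/thread limit (RLIMIT_NPROC). The sketch below is in Python rather than broker code. It reads the third field of threadpool:q:n as the number of worker threads, which is my reading of the Orion docs. The `other_threads` figure is a hypothetical placeholder: 3 is picked only because it is consistent with 1022 failing under a 1024 limit, and the real overhead was not measured here.

```python
import resource


def parse_threadpool_mode(mode):
    """Split 'threadpool:<queue>:<workers>' into (queue_size, workers)."""
    kind, queue_size, workers = mode.split(":")
    if kind != "threadpool":
        raise ValueError("not a threadpool notification mode: %r" % mode)
    return int(queue_size), int(workers)


def fits_limit(workers, other_threads, limit):
    """True if the workers plus everything else counted against the user fit."""
    return workers + other_threads <= limit


def current_thread_limit():
    """Soft RLIMIT_NPROC for the current user, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    _queue, workers = parse_threadpool_mode("threadpool:60000:1022")
    limit = current_thread_limit()
    if limit is None:
        print("no per-user thread limit in effect")
    else:
        # other_threads=3 is an assumption, see the note above
        print("fits:", fits_limit(workers, other_threads=3, limit=limit))
```

Running this as the same user the service runs as should show whether the limit from the Red Hat bug report applies to that account.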
LGTM
Test: https://github.com/telefonicaid/fiware-orion/tree/master/test/loadTest/connections_stress_tests
Test configuration:
service: stablished_connections
servicePath: /test
CB endpoint: http://qa-orion-fe-01:1026
notification URL: http://qa-orion-fe-02:8090/notify
mongo host: qa-bigdata-sth-02
test duration: 60 minutes (3600 seconds)
version requests delay: 1 seconds
max subcription: 5000
noEstablished flag: False
noQueueSize flag: False
***************************************************************************************
* verify if the listener has a delay in the response (10 minutes recommended) *
* verify if these parameters are used in CB config: *
* -httpTimeout 600000 -notificationMode threadpool:60000:5000 *
***************************************************************************************
The database orion-stablished_connections has been erased
creating 5000 subscriptions...
5000 subscriptions have been created
Test init: 2016-12-12T16:22:15.851000Z
Reports each second:
      counter    version     queue    established
                 request      size    connections
----------------------------------------------------------
-------- 1 -------- OK -------- 0 -------- 1014 ----------
-------- 2 -------- OK -------- 0 -------- 1014 ----------
-------- 3 -------- OK -------- 0 -------- 1014 ----------
-------- 4 -------- OK -------- 0 -------- 1014 ----------
...
-------- 326 -------- OK -------- 0 -------- 1014 --------
-------- 327 -------- OK -------- 0 -------- 1014 --------
-------- 328 -------- OK -------- 0 -------- 12 ----------
-------- 329 -------- OK -------- 0 -------- 10 ----------
-------- 330 -------- OK -------- 0 -------- 10 ----------
-------- 331 -------- OK -------- 0 -------- 10 ----------
-------- 332 -------- OK -------- 0 -------- 10 ----------
-------- 333 -------- OK -------- 0 -------- 10 ----------
...
-------- 2087 -------- OK -------- 0 -------- 10 ---------
-------- 2088 -------- OK -------- 0 -------- 10 ---------
-------- 2089 -------- OK -------- 0 -------- 10 ---------
-------- 2090 -------- OK -------- 0 -------- 10 ---------
ALL (2090) "/version" requests responded correctly...Bye.
Test end: 2016-12-12T17:22:16.226000Z
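For anyone who wants to post-process a report like the one above, here is a small Python sketch that parses the per-second rows and finds the first request at which the established-connections column falls below a threshold. The function names are mine and not part of the test scripts. The row format is inferred only from the lines shown in this comment.

```python
import re

# counter, /version result, queue size, established connections
ROW = re.compile(r"^-+\s+(\d+)\s+-+\s+(\w+)\s+-+\s+(\d+)\s+-+\s+(\d+)\s+-+$")


def parse_row(line):
    """Return (counter, version, queue_size, established) or None for other lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    counter, version, queue_size, established = match.groups()
    return int(counter), version, int(queue_size), int(established)


def first_drop(lines, threshold):
    """Counter of the first row whose established connections are below threshold."""
    for line in lines:
        row = parse_row(line)
        if row is not None and row[3] < threshold:
            return row[0]
    return None


if __name__ == "__main__":
    sample = [
        "-------- 327 -------- OK -------- 0 -------- 1014 --------",
        "-------- 328 -------- OK -------- 0 -------- 12 ----------",
        "...",
    ]
    print(first_drop(sample, threshold=1014))
```

Applied to the rows pasted above, this points at request 328, where the established connections go from 1014 to 12.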
We need to modify the broker (restInit in rest.cpp) so that MHD uses poll and not select. Select only supports 1024 simultaneous connections and some configurations might need more than that. Also, there is a possibility we gain some performance using poll/epoll instead of select.
A few unfruitful attempts have been made, but select is still used (check done with strace contextBroker -fg) and not poll ...
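The strace check can be made less manual by counting which wait syscalls show up in a saved trace. The Python sketch below assumes the trace was captured to a file (strace writes to stderr, so something like `strace -f contextBroker -fg 2> trace.txt`, where the file name is hypothetical) and that lines follow the usual `name(args) = result` layout, optionally prefixed by `[pid N]`. The sample lines in the usage part are invented for illustration and are not real broker output.

```python
import re
from collections import Counter

# Optional "[pid N]" prefix (present with strace -f), then the syscall name
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")
WAIT_CALLS = {"select", "pselect6", "poll", "ppoll", "epoll_wait", "epoll_pwait"}


def count_wait_calls(strace_lines):
    """Count select/poll/epoll style syscalls in an iterable of strace lines."""
    counts = Counter()
    for line in strace_lines:
        match = SYSCALL.match(line)
        if match and match.group(1) in WAIT_CALLS:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines, only to show the expected shape of the input
    sample = [
        "select(4, [3], NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)",
        "[pid  4321] poll([{fd=5, events=POLLIN}], 1, 1000) = 0 (Timeout)",
        'read(3, "", 4096) = 0',
    ]
    print(dict(count_wait_calls(sample)))
```

If the change to restInit works as intended, the select count for the MHD threads should go to zero while poll (or epoll_wait) appears instead.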