ratchetphp / Ratchet

Asynchronous WebSocket server
http://socketo.me
MIT License

Limited to 1024 concurrent connections, and looking for suggestions. #300

Open rmmoul opened 9 years ago

rmmoul commented 9 years ago

I started a project using ratchet, and wanted to test the number of connections that could be handled at one time on our server (Digital Ocean Ubuntu 14.04, 2 cores, 4GB ram running php 5.6.7 and apache2 2.4.7).

I followed some of the suggestions on the deploy page (http://socketo.me/docs/deploy) to increase the number of connections that could be handled, and appeared to raise the open-file limit (ulimit) to 10,000.

I started running tests today using thor (https://github.com/observing/thor):

thor --amount 10000 ws://example.com:2600 -C 1000 -W 2 -M 100 

I got a php error when the number of connections exceeded 1024:

PHP Warning:  stream_select(): You MUST recompile PHP with a larger value of FD_SETSIZE.
It is set to 1024, but you have descriptors numbered at least as high as 1123.
 --enable-fd-setsize=2048 is recommended, but you may want to set it
to equal the maximum number of open files supported by your system,
in order to avoid seeing this error again at a later date. in
/var/www/example.com/server/vendor/react/event-loop/StreamSelectLoop.php on line 255

I was actually using php 5.5.9 at the time, so I followed some old instructions from http://ubuntuforums.org/archive/index.php/t-2130554.html and increased the FD_SETSIZE value to 10000 in the following two files and then downloaded and compiled php 5.6.7.

/usr/include/linux/posix_types.h
/usr/include/x86_64-linux-gnu/bits/typesizes.h

That coupled with using this command to run the server through supervisor:

bash -c "ulimit -n 10000 && php /var/www/hyvly.com/server/server.php"

Seems to have allowed the number of connections to go beyond 1024, but now it causes a buffer overflow within php, showing this error in the log file before restarting the process:

*** buffer overflow detected ***: php terminated

I'm curious how other users are getting beyond 1024 concurrent connections, whether some of you have never hit this limit at all (could you share your environment details), or made certain changes to get beyond it (could you share what changes you've made)?

jupitern commented 5 years ago

@i3bitcoin how many connections did you achieve?

i3bitcoin commented 5 years ago

@jupitern

More than 3k connections right now. It's the only solution that worked for me.

I believe it's limited only by rlimit.

josephmiller2000 commented 5 years ago

@i3bitcoin is there any way I can contact you privately about setting up HHVM with Ratchet chat? I'm stuck with the same 1024 connection limit.

inri13666 commented 5 years ago

@josephmiller2000, possibly this post may help you: https://github.com/ratchetphp/Ratchet/issues/328#issuecomment-484266082

The main quick solution is to use any other event loop library instead of the default React\EventLoop\StreamSelectLoop.

josephmiller2000 commented 5 years ago

@inri13666 Well, I'm using the "event.so" extension and tested with this method:

https://github.com/ratchetphp/Ratchet/issues/300#issuecomment-318351931

Ev is not detected by PHP, so right now I'm using "event.so" instead of StreamSelectLoop.

I increased all server-side limits and php-fpm limits, but still can't achieve more than 1024 at my peak time.

Users end up in the CLOSE_WAIT socket state when they are connected to the chat.

So I plan to move to HHVM instead of plain PHP.
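Sockets sitting in CLOSE_WAIT generally mean the peer has closed the connection but the local process has not yet closed its own end. One rough way to watch for this is to count them in `ss` output. This is only an illustrative sketch: `count_close_wait` is a made-up name, and it assumes the `ss` tool from iproute2, which prints the state as `CLOSE-WAIT` in the first column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts lines on stdin whose first column is CLOSE-WAIT,
# which is how `ss -tan` reports that TCP state.
count_close_wait() {
    awk '$1 == "CLOSE-WAIT" { n++ } END { print n + 0 }'
}

# Usage (assumes ss from iproute2 is installed):
#   ss -tan | count_close_wait
```

A count that keeps growing while clients disconnect would suggest the server process is not closing its side of finished connections.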

WyriHaximus commented 5 years ago

Easiest solution to this is to install ext-uv and make sure you're running the latest react/event-loop which has support for it. (And use the Factory::create() method to get your event loop of course.)
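Before relying on Factory::create() to pick a better loop, it can help to confirm which of the event-loop extensions named in this thread are actually loaded in the CLI. The sketch below filters a module list such as the one printed by `php -m`. The function name `loop_extensions` is made up for illustration, and it only reports loaded modules; it does not say which loop the installed react/event-loop version will choose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a module list (one name per line, as printed by
# `php -m`) on stdin and prints any event-loop extensions it finds.
loop_extensions() {
    # Anchored match so that "ev" does not also match "event".
    # `|| true` keeps the exit status at 0 when nothing is found.
    grep -i -E '^(uv|ev|event|libevent)$' || true
}

# Usage (assumes the php CLI is on PATH):
#   php -m | loop_extensions
```

Empty output means none of these extensions is loaded for the CLI.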

josephmiller2000 commented 5 years ago

@WyriHaximus thanks for the comment. I can successfully install "event", but cannot install "ext-uv".

I end up getting this error:

Snap_Shot_00769

inri13666 commented 5 years ago

OK, could you please share the result of:

php -r "require_once 'vendor/autoload.php'; var_dump(\React\EventLoop\Factory::create());"

for my configuration it's

D:\_dev\sites\private\event-loop>php -r "require_once 'vendor/autoload.php'; var_dump(\React\EventLoop\Factory::create());"
Command line code:1:
class React\EventLoop\ExtEventLoop#3 (14) {
  private $eventBase =>
  class EventBase#4 (0) {
  }
  private $futureTickQueue =>
  class React\EventLoop\Tick\FutureTickQueue#5 (1) {
    private $queue =>
    class SplQueue#6 (2) {
      private $flags =>
      int(4)
      private $dllist =>
      array(0) {
        ...
      }
    }
  }
...

josephmiller2000 commented 5 years ago

Here you go @inri13666

root@vps652855:# php -r "require_once 'vendor/autoload.php'; var_dump(\React\EventLoop\Factory::create());"
object(React\EventLoop\ExtEventLoop)#3 (11) {
  ["eventBase":"React\EventLoop\ExtEventLoop":private]=>
  object(EventBase)#2 (0) {
  }
  ["nextTickQueue":"React\EventLoop\ExtEventLoop":private]=>
  object(React\EventLoop\Tick\NextTickQueue)#4 (2) {
    ["eventLoop":"React\EventLoop\Tick\NextTickQueue":private]=>
    *RECURSION*
    ["queue":"React\EventLoop\Tick\NextTickQueue":private]=>
    object(SplQueue)#5 (2) {
      ["flags":"SplDoublyLinkedList":private]=>
      int(4)
      ["dllist":"SplDoublyLinkedList":private]=>
      array(0) {
      }
    }
  }
  ["futureTickQueue":"React\EventLoop\ExtEventLoop":private]=>
  object(React\EventLoop\Tick\FutureTickQueue)#6 (2) {
    ["eventLoop":"React\EventLoop\Tick\FutureTickQueue":private]=>
    *RECURSION*
    ["queue":"React\EventLoop\Tick\FutureTickQueue":private]=>
    object(SplQueue)#7 (2) {
      ["flags":"SplDoublyLinkedList":private]=>
      int(4)
      ["dllist":"SplDoublyLinkedList":private]=>
      array(0) {
      }
    }
  }
  ["timerCallback":"React\EventLoop\ExtEventLoop":private]=>
  object(Closure)#9 (2) {
    ["this"]=>
    *RECURSION*
    ["parameter"]=>
    array(3) {
      ["$_"]=>
      string(10) "<required>"
      ["$__"]=>
      string(10) "<required>"
      ["$timer"]=>
      string(10) "<required>"
    }
  }
  ["timerEvents":"React\EventLoop\ExtEventLoop":private]=>
  object(SplObjectStorage)#8 (1) {
    ["storage":"SplObjectStorage":private]=>
    array(0) {
    }
  }
  ["streamCallback":"React\EventLoop\ExtEventLoop":private]=>
  object(Closure)#10 (2) {
    ["this"]=>
    *RECURSION*
    ["parameter"]=>
    array(2) {
      ["$stream"]=>
      string(10) "<required>"
      ["$flags"]=>
      string(10) "<required>"
    }
  }
  ["streamEvents":"React\EventLoop\ExtEventLoop":private]=>
  array(0) {
  }
  ["streamFlags":"React\EventLoop\ExtEventLoop":private]=>
  array(0) {
  }
  ["readListeners":"React\EventLoop\ExtEventLoop":private]=>
  array(0) {
  }
  ["writeListeners":"React\EventLoop\ExtEventLoop":private]=>
  array(0) {
  }
  ["running":"React\EventLoop\ExtEventLoop":private]=>
  NULL
}

inri13666 commented 5 years ago

@josephmiller2000, I'm using socket server behind NGinX

nginx.conf
worker_processes auto;
worker_rlimit_nofile 40000;  # Important
events {
    worker_connections  40000;  # Important
    multi_accept        on;  # Important
    use                 epoll;  # Important
}

default.conf
server {
    server_name _;

    listen 8000 default_server;
    listen [::]:8000 default_server;

    root        /home/site/wwwroot/web;
    error_log   /home/LogFiles/nginx-error.log;
    access_log  /home/LogFiles/nginx-access.log;

    location ~ ^/ws(/|$)$ {
        proxy_pass          http://127.0.0.1:8080;
        proxy_http_version  1.1;
        proxy_set_header    Upgrade $http_upgrade;
        proxy_set_header    Connection "Upgrade";
        proxy_buffer_size       128k;
        proxy_buffers           4 256k;
        proxy_busy_buffers_size 256k;
    }
}

WyriHaximus commented 5 years ago

> @WyriHaximus thanks for the comment. I can successfully install "event", but cannot install "ext-uv".
>
> I end up getting this error:
>
> Snap_Shot_00769

Did you check config.log? To be honest, I never had issues compiling ext-uv except for the occasional missing libuvdev (or whatever the name is on your distro).

josephmiller2000 commented 5 years ago

Anyway, I figured out how to install ext-uv and got everything up and working.

This is the maximum number of connections I can get, whichever event extension I use. I have increased the server limits and done everything on my side. The script is even using ZeroMQ now.

Snap_Shot_00077

jupitern commented 5 years ago

With CentOS, 2 GB of RAM, uv installed, and a Node socket client sending connections from another machine at my company, we are reaching 20k connections. We just don't get more because all the RAM is in use.

node client => https://github.com/jupitern/node-socket-client

shmeeps commented 3 years ago

Just got hit with this and was able to eventually work around it. Wanted to share what all I went through in case it helps someone else down the line, because it took me two frustrating days with angry clients to resolve completely. For reference, we're running Ratchet with an Apache 2.4 reverse proxy on PHP 7.0, all running on Ubuntu 16.04. The Ratchet script is kept running by a supervisor task, ensuring that it restarts if it ever crashes. The Ratchet script is pretty straightforward; it interacts with an API on connection or when receiving certain messages, and contains a timer to hit the API for some data to send to specific clients (maintained by a user -> client map). Ratchet was maxing out at around 500 connections when we started.

First thing we noticed was Apache redlining both cores of the server. Ideally we'd move to a better server software like nginx, but our app currently prevents that. We also have to use a reverse proxy for SSL. We tried to use the underlying React library to run a WSS server directly without needing Apache/nginx, but weren't able to get it working correctly.

Bumping the server up to 4 cores gave enough resources to run Apache comfortably. From there we noticed that we'd still get 500 errors periodically, and some investigation into Apache revealed that it was tuned poorly and would cap out at a few hundred concurrent connections. Since the websockets count as a connection, these would quickly eat up available threads and prevent Apache from serving other traffic (other PHP scripts and static content). We were already using mpm_event, and updated our config to the following:

<IfModule mpm_event_module>
    StartServers 10
    MinSpareThreads 25
    MaxSpareThreads 750
    ThreadLimit 1000
    ThreadsPerChild 750
    # MaxRequestWorkers aka MaxClients => ServerLimit * ThreadsPerChild
    MaxRequestWorkers 15000
    MaxConnectionsPerChild 0
    ServerLimit 20
    ThreadStackSize 524288
</IfModule>

Stress testing the server after this showed we could comfortably maintain thousands of requests a minute without any issue, which is well over what we needed to serve.
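The comment in the config above gives the relationship MaxRequestWorkers => ServerLimit * ThreadsPerChild. As a quick sanity check of the numbers used there, the arithmetic can be written out as follows. The function name `max_request_workers` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the ceiling on MaxRequestWorkers described in the config
# comment, computed as ServerLimit * ThreadsPerChild.
max_request_workers() {
    echo $(( $1 * $2 ))
}

# Values from the config above: ServerLimit 20, ThreadsPerChild 750.
max_request_workers 20 750   # prints 15000
```

The result matches the MaxRequestWorkers 15000 line, and ThreadsPerChild 750 stays under ThreadLimit 1000.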

From there, we noticed that while Apache was running fine, the Ratchet script was now redlining with only a few hundred connections. Various searching led to the well documented StreamSelectLoop issue. We ruled out LibEvent due to using PHP 7.0, and weren't able to get LibUv to install without errors, so settled on LibEv with the following:

sudo pecl install ev
# Write the ini file through sudo tee; a plain "echo > file" redirect runs without root privileges
echo 'extension=ev.so' | sudo tee /etc/php/7.0/mods-available/ev.ini
sudo phpenmod ev
sudo service php7.0-fpm restart # Not needed as the Ratchet script is CLI, but better to see if this causes FPM issues now than later

Running a second instance of the Ratchet script that would initialize and then execute die(get_class($server->loop)); verified that the server was no longer running with a StreamSelectLoop and instead using an ExtEvLoop. We restarted the Ratchet script and let clients begin to auto-reconnect (our client side script will attempt to reconnect in increasing time per attempt), figuring we could watch the script as they reconnected for any performance issues. Everything ran fine, with the Ratchet script taking no more than 25% of a core until about 15 minutes later when it began to redline again. At this point, attempting to open a new connection would hang for a few minutes before failing.

We attempted to connect directly to the Ratchet script from the server itself (i.e., bypassing Apache) to see if we could connect:

curl \
    -o - \
    --http1.1 \
    --include \
    --no-buffer \
    --header "Connection: Upgrade" \
    --header "Upgrade: websocket" \
    --header "Host: localhost:8080" \
    --header "Origin: http://localhost:8080" \
    --header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
    --header "Sec-WebSocket-Version: 13" \
    http://localhost:8080/

This would also hang and then fail. When we restarted the Ratchet script, we could use the above to connect immediately, but once it started redlining we could not. This indicated that Apache was fine, and the limit was on the Ratchet script.

We updated the script to output the number of connected clients on tick and restarted, which would get to 1017 and no higher. This was conspicuously close to 1024, so we assumed it was some form of system limit. We checked the overall system limits using ulimit -a and saw no issues. However, we checked the actual process limits with the following

ps aux | grep RatchetScript.php # Record PID from this command
cat /proc/<PID>/limits

and saw that max open files was limited to 1024 soft / 4096 hard. Updating this with

prlimit --pid <PID> --nofile=500000:5000000

and checking the log verified that once these limits were raised, we were able to handle an additional several thousand connections, after which we could still connect via a browser to our app or via the cURL request above with no issue.
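The per-process check described above can be scripted so it is easy to repeat after each restart. This is a minimal sketch that assumes the Linux /proc/<PID>/limits layout, where the "Max open files" row lists the soft limit and then the hard limit. The function name `open_file_limits` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<soft> <hard>" from the "Max open files" row of
# a file in the /proc/<PID>/limits format.
open_file_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Usage on Linux, for the current shell:
#   open_file_limits /proc/$$/limits
```

Pointing it at the limits file of the running Ratchet process shows the limits the process actually has, which can differ from what `ulimit -a` reports in an interactive shell.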

We figured this was a user-limit issue (the script does not run as the webuser) and updated /etc/security/limits.conf with the user running the script and restarted, but saw that the limits were reset. We also attempted to run sudo su - ratchetuser -c 'ulimit -a' to see if the limits needed to be updated for the user, but those also appeared fine. After some further digging, we came across an article saying the 1024 / 4096 limit is enforced by supervisor, after which we updated /etc/supervisor/supervisord.conf with the following:

[supervisord]
....
minfds=500000

Restarting verified that the limits were maintained on the Ratchet script. The Ratchet script is now handling ~2,500 connections and using about 10% of one core, with small spikes here and there (mainly on client connection, as we have to decrypt connection data).

I imagine that the redlining occurs when Ratchet basically deadlocks waiting on a file handle that can't be created, but I haven't been able to verify this yet. It would explain the sharp drop in CPU usage once those connections can be properly created and maintained.

abbaasi69 commented 2 years ago

I had an experience which may help somebody. My server stopped responding after one hour, when the number of concurrent socket connections reached about 700. After trying all of the possible solutions, I realized that I had a ProxyPass in Apache which redirects port 443 (SSL) to 8080 (my socket port). Finally, I increased the ServerLimit in my Apache prefork configuration from 700 to 1700 and the problem was solved temporarily. This shows that if you use Apache's ProxyPass (or another web server), Apache becomes busy because it sits between the client and the WebSocket server.

mr-older commented 1 year ago

I had that problem with ReactPHP; the root cause is deeper. It lies in how PHP services socket events. I rewrote the code in C++ using epoll instead of select. stackoverflow