jrief / django-websocket-redis

Websockets for Django applications using Redis as message queue
http://django-websocket-redis.awesto.com/
MIT License
896 stars 222 forks

Performance issues while doing a stress test #110

Open Edke opened 9 years ago

Edke commented 9 years ago

Hello Jacob.

I'm trying to adopt your implementation of websockets. Before going further with the integration in our current project, we are running a stress test to see its performance and the resources needed for, e.g., 1000 concurrent connections.

I created a task that simulates concurrent clients: a master process forks clients that connect to the server. Using Django's development server (runserver), the stress test worked quite well; it seems the dev server spawns a thread for every new connection.
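For reference, a minimal sketch of the kind of forking stress client described above (assumptions: the websocket-client package on the client side, the server from the nginx config below listening on port 8012, and the ws4redis URL layout ws://host/ws/<facility>?subscribe-broadcast):

# stress_client.py -- hypothetical sketch of a forking stress client
import os
import sys
import time
import uuid

import websocket  # pip install websocket-client

NUM_CLIENTS = 1000
SERVER = 'ws://127.0.0.1:8012'

def run_client():
    # each child subscribes to its own facility (a UUID), as in our setup
    url = '{}/ws/{}?subscribe-broadcast'.format(SERVER, uuid.uuid4().hex)
    try:
        ws = websocket.create_connection(url)
    except Exception as exc:
        print('Error occurred: {}'.format(exc))
        sys.exit(1)
    while True:
        ws.recv()  # block until the server publishes something

if __name__ == '__main__':
    for _ in range(NUM_CLIENTS):
        if os.fork() == 0:  # child process becomes one client
            run_client()
        time.sleep(0.01)    # small ramp-up delay between forks
    for _ in range(NUM_CLIENTS):
        os.wait()           # parent reaps the children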

Moving to a production environment, I tried the same with nginx and uwsgi. I tried different scenarios, but nginx with uwsgi didn't get me to even 100 stable concurrent connections. My current configuration follows below.

We need every single client to connect to its own unique channel, so that it only receives events intended for it; therefore every forked child connects to a unique channel (a UUID string).

We are not able to get even 50 concurrent clients with 2 workers; most of them get "Error occurred: Handshake status 502" when trying to connect.

Raising the number of ws uWSGI workers to 20 still can't handle 150 clients.

nginx.conf:

worker_processes 4;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    server {
        listen 8012 default_server;
        charset utf-8;
        client_max_body_size 20M;
        sendfile on;
        keepalive_timeout 0;
        large_client_header_buffers 8 32k;

        location /ws/ {
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                proxy_pass http://unix:/run/ws.socket;
                proxy_buffers 8 32k;
                proxy_buffer_size 64k;
        }

        location / {
            include uwsgi_params; 
            uwsgi_pass unix:/run/django.socket;
        }
    }
}

uwsgi.ini for django:

[uwsgi]
plugins = python2
chown-socket = http:http
uid = kraken
gid = users
cheaper = 1
processes = 2

socket = /run/django.socket
buffer-size = 32768
master = True
base-dir = /home/kraken/apps/proj
chdir = %(base-dir)
wsgi-file = core/wsgi_django.py
virtualenv =  %(base-dir)/.env

uwsgi.ini for ws:

[uwsgi]
plugins = python2
chown-socket = http:http
uid = kraken
gid = users
workers = 2

http-socket = /run/ws.socket
gevent = 1000
http-websockets = True

master = True
base-dir = /home/kraken/apps/proj
chdir = %(base-dir)
wsgi-file = core/wsgi_ws.py
virtualenv =  %(base-dir)/.env

Any guidance on how to handle 1000 concurrent clients, or even more?

htayanloo commented 9 years ago

A multi-threaded or multi-process based server cannot scale appropriately for WebSockets because it is designed to open a connection, handle a request as quickly as possible and then close the connection. An asynchronous server such as Tornado or Green Unicorn monkey patched with gevent is necessary for any practical WebSockets server-side implementation.
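For what it's worth, the ws uWSGI instance above already runs under gevent (gevent = 1000), so it is not a plain multi-process setup; the WSGI entry point just has to be gevent-friendly. A rough sketch of what core/wsgi_ws.py could look like, assuming the uWSGIWebsocketServer runner that ships with django-websocket-redis (the settings module path is hypothetical):

# core/wsgi_ws.py -- rough sketch of a gevent-friendly websocket entry point
import os

import gevent.socket
import redis.connection

# let redis-py use gevent's cooperative sockets instead of blocking ones
redis.connection.socket = gevent.socket

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')  # hypothetical module path

from ws4redis.uwsgi_runserver import uWSGIWebsocketServer
application = uWSGIWebsocketServer()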

jrief commented 9 years ago

@htayanloo uwsgi is a perfectly capable application runner and can handle websockets. Please see Roberto's notes about this. @Edke How about doing the stress test with uwsgi but without nginx? Then at least we can see which of those services is the culprit.
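As a hedged example, such a standalone test could run the websocket instance on a plain HTTP port and point the stress clients straight at it (the flags mirror the uwsgi.ini for ws above; the port is arbitrary):

uwsgi --plugins python2 --master --workers 2 \
      --virtualenv /home/kraken/apps/proj/.env \
      --chdir /home/kraken/apps/proj --wsgi-file core/wsgi_ws.py \
      --http :9090 --http-websockets --gevent 1000

Clients would then connect to ws://host:9090/ws/... directly, taking nginx out of the picture.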

ethanfrey commented 8 years ago

We ran some similar stress tests before; locally on OS X we had issues at around 600 connections per uwsgi process, but I think that was due to file handle limits.

In production, we have 9 uwsgi processes running behind nginx on a 4-CPU machine, with over 2000 active connections at any moment and minimal CPU usage. What I did notice is that when we stress tested with >100 new connections/second, there were issues. So for our stress tests we opened ~2000 connections in batches of 100 per second over 20 seconds, while sending 1 message/second over all open connections. This worked and is a reasonable simulation.
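A rough sketch of that kind of ramped test, assuming gevent plus websocket-client on the client side and the same hypothetical URL layout as above (the server-side message publishing is left out):

# ramp_test.py -- hypothetical sketch of a batched connection ramp-up
import uuid
from gevent import monkey
monkey.patch_all()        # make websocket-client sockets cooperative
import gevent
import websocket          # pip install websocket-client

SERVER = 'ws://127.0.0.1:8012'
TOTAL, BATCH = 2000, 100  # 2000 connections, opened 100 per second

def client():
    url = '{}/ws/{}?subscribe-broadcast'.format(SERVER, uuid.uuid4().hex)
    ws = websocket.create_connection(url)
    while True:
        ws.recv()         # hold the connection open and listen

greenlets = []
for _ in range(TOTAL // BATCH):   # 20 batches, one batch per second
    greenlets += [gevent.spawn(client) for _ in range(BATCH)]
    gevent.sleep(1)
gevent.joinall(greenlets)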

ethanfrey commented 8 years ago

I made three pull requests during development and deployment: #93, #94 and #95. #93 was the only performance-related one, cleaning up memory leaks and file handle leaks.

AgDude commented 7 years ago

Thanks all for the info on stress testing. @ethanfrey and @htayanloo, I suspect your issues with lots of quick connections are due to the uwsgi listen queue, which defaults to 100.
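For reference, a sketch of how that queue could be raised (the value below is just an example; uWSGI will complain at startup if the listen value exceeds the kernel's net.core.somaxconn, so that usually has to be raised as well):

# in the uwsgi.ini for ws
listen = 1024

# and on the host
sysctl -w net.core.somaxconn=1024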