jrief / django-websocket-redis

Websockets for Django applications using Redis as message queue
http://django-websocket-redis.awesto.com/
MIT License
896 stars 222 forks source link

Connects and Immediately Closes #56

Closed grvhi closed 9 years ago

grvhi commented 9 years ago

I've used DWR a few times in production and setup has always been a breeze.... But this time, for some reason I'm stuck.

The short version is that in the browser console, I see the correct ws URL being connected to, followed by Connected! and immediately followed by Connection Closed. The JS script then retries, connects and is immediately disconnected. This loop continues ad infinitum.

I've tried everything. I've had help from the #uwsgi irc channel and everyone said my setup is correct. I've used - and am still using - the exact same setup in a few production sites. The only difference is that this site is hosted on AWS EC2, but not behind ELB.

Running locally under run server, all is well. I'm getting absolutely nothing in any django logs (I see the WS4Redis wsgi_server logs to django.request) and the uwsgi logs show the connection as being successful.

I'm sure I've been through everything. But I'm probably missing something obvious. If anyone can help put me straight, I'd be really grateful!!

jrief commented 9 years ago

Really hard to say. I have a running installation (http://djangular.aws.awesto.com/threeway_databinding/) on EC2 with DWR, which as I write these lines, works perfectly. Are you absolutely sure, that you don't have a proxy (even transparent) in your setup. Some proxies might not be aware of the additional protocol headers.

Do you run the ws against nginx or directly against uwsgi? If nginx, does it support websockets?

grvhi commented 9 years ago

Thanks @jrief - I'm running it against nginx 1.4.6 (checked that a number of times!). Maybe I'll try connecting directly to uwsgi. But is that suitable for a production environment?

What I still don't understand is why nothing's coming up in the logs. The only step I can see which doesn't generate a log entry of some kind is at L115 then L117 in wsgi_server.py. But then, if my setup generated some kind of exception, I shouldn't be getting that far and something should be coming up in the logs.

I've setup my logging in Django settings and tested that all DEBUG and above messages get written to the logs successfully, so I'm not exactly sure how to proceed at this point. Feel like I'm shooting in the dark!

jrief commented 9 years ago

Its really difficult to give hints for a system I didn't configure myself. The nice part about DWR is, that it can run in debug mode, ie. ./manage.py runserver. First try to run it in this environment, optionally attach a debugger, add breakpoints to the main dispatcher, watch the server variables, compare them with a local and working environment.

grvhi commented 9 years ago

I understand it's difficult for you to help me out, so thank you for taking the time. However, running on the production server using runserver, everything works flawlessly.

Given DWR works under runserver, does that point to NGINX (or some other proxy) getting in the way?

My NGINX conf file looks like this:

upstream uwsgi {
  ip_hash;
  server 127.0.0.1:40000;
}

server {
  listen                  80;
  server_name             XXXXXX;
  root /path/to/django_project;

  location / {
    include         /etc/nginx/uwsgi_params;
    uwsgi_pass      uwsgi;
  }

.......

  location /ws/ {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_pass http://127.0.0.1:40001;
  }
}

My uwsgi setup file (run under supervisor) looks like this:

[uwsgi]
http-socket = 127.0.0.1:40001
chdir = /path/to/django_project
pythonpath = ../venv/lib/python2.7/site-packages
uid = ubuntu
master = true
env = DJANGO_SETTINGS_MODULE=core.settings
module = core.websocket_wsgi
workers = 2
http-websockets = true
gevent = 1000
virtualenv = /path/to/venv/
grvhi commented 9 years ago

@jrief - I've just discovered that the problem exists only if I use subscribe-group or subscribe-user.

Using only subscribe-broadcast, then everything works exactly as expected..... Any ideas?

jrief commented 9 years ago

I'am quite sure, that if you get your logger working, we'll immediately see what the problem is. Every possible exception is logged, just have a look at wsgi_server.py starting at line 100. Assumingly it only a configuration issue.

marcelkornblum commented 9 years ago

I'm having a similar (the same?) issue, although I've never used django-websocket-redis before, and I'm a bit stumped having spent the last couple of days trying things out.

My production box is Ubuntu 14.04, running nginx 1.4.6 (installed with aptitude) uWSGI 2.0.7 (installed with pip) on top of Python 2.7.6 (apt) in a virtualenv, as well as Redis 2.8.4 (apt), and Supervisor to control uWSGI. All of that is on an EC2 instance.

Using runserver works fine, both locally (in a vagrant instance that mirrors production's setup) and on the EC2 instance itself. When I try using the nginx <--> uwsgi <--> django / dwr <--> redis setup though, the HTTP part works fine but the WS part doesn't.

I was previously getting the same "opening/closing" behaviour described by @Erve1879 but after reconfiguration I now see an extremely long-running request that eventually times out on the client. The only logs I get from uWSGI show lines like: [pid: 30159|app: 0|req: 1/1] 0.0.0.0 () {50 vars in 1167 bytes} [Thu Oct 23 10:08:30 2014] GET /ws/foobar?subscribe-user => generated 46 bytes in 1146 msecs (HTTP/1.1 500) 0 headers in 0 bytes (3 switches on core 999) ...but I can't find a way to see what the 500 is caused by. The previous error (connect/disconnect in the console) also showed similar 500 errors.

My nginx conf looks like:

upstream my_app_server {
  ip_hash;
  server unix:/tmp/uwsgi.sock fail_timeout=0;
}
server {
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;
    server_name _;
    charset utf-8;
    client_max_body_size 4G;
    access_log /path/to/nginx-access.log;
    error_log /path/to/nginx-error.log;
    location /static/ {
        alias   /path/to/my_app/static/;
    }
    location /media/ {
        alias   /path/to/my_app/media/;
    }
    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;

        include /etc/nginx/uwsgi_params;
        uwsgi_pass my_app_server;
    }
    location /ws/ {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://unix:/tmp/uwsgi-ws.sock;
    }
}

My uwsgi.ini file looks like:

[base]
chdir = /path/to/my_app
module = my_app.wsgi:application
home = /path/to/venv
env = DJANGO_SETTINGS_MODULE=my_app.settings
master = true
uid = www-data
gid = www-data
vacuum = true
buffer-size = 65535
disable-logging = false
processes = 3
harakiri = 20
max-requests = 5000
pythonpath = /path/to/venv/lib/python2.7/site-packages
virtualenv = /path/to/venv

[httpserver]
ini = :base
pidfile = /tmp/my_app-uwsgi-nginx-http.pid
socket = /tmp/uwsgi.sock
chmod-socket = 666
processes = 2
harakiri = 20
max-requests = 5000

[socketserver]
ini = :base
module = my_app.wsgi_websocket:application
http-websockets = true
gevent = 1000
workers = 2
http-socket = /tmp/uwsgi-ws.sock

I then have two wsgi.py files as per the instructions in the docs; they're literal copy-paste jobs from that page and my supervisor script points each uwsgi instance at the respective py file, like so:

[program:uwsgi-ws]
command=/path/to/venv/bin/uwsgi --ini /path/to/uwsgi-ws.ini # all this file does is implement the :socketserver part of the main ini file
directory=/path/to/my_app
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/path/to/uwsgi-ws_supervisor.log
stderr_logfile=/path/to/uwsgi-ws_supervisor-error.log
environment=LANG=en_US.UTF-8,LC_ALL=en_US.UTF-8

I'm new to uWSGI (I just implemented it in order to run this), and I think it must be some sort of config snafu, but I'm out of ideas and starting to think of writing a little python script to replace uWSGIWebsocketServer -- so I could at least figure out how to debug this.

But hopefully you might be able to help!

Thanks for getting this far...

jrief commented 9 years ago

@marcelkornblum I just compared your configuration with mine, and they look almost the same.

Maybe there is a problem with uWSGI itself. I use version uWSGI==2.0.3 inside a virtualenv.

jrief commented 9 years ago

this is my virtual env for the apps which matter:

django-redis-sessions==0.4.0
django-websocket-redis.git (from the reporistory)
ecdsa==0.11
gevent==1.0
greenlet==0.4.2
redis==2.9.1
six==1.6.1
uWSGI==2.0.3
wsaccel==0.6.2
wsgiref==0.1.2

and it drives this site: http://djangular.aws.awesto.com/threeway_databinding/

marcelkornblum commented 9 years ago

I just ran pip install -I uWSGI==2.0.3 and -- briefly -- it seemed to do the trick. I saw in a session that I was getting handshakes, but I tried refreshing before doing anything else and now the client is showing a 504 timeout and the uWSGI server a 500 again (it briefly logged a couple of 101 lines).

Here is a snippet of the log changing from good to bad

No handlers could be found for logger "django.request"
[pid: 14090|app: 0|req: 4/4] 0.0.0.0 () {188 vars in 3693 bytes} [Thu Oct 23 15:15:46 2014] GET /ws/foobar?subscribe-user => generated 25 bytes in 902 msecs (HTTP/1.1 101) 4 headers in 205 bytes (3 switches on core 999)
[pid: 14090|app: 0|req: 5/5] 0.0.0.0 () {50 vars in 1167 bytes} [Thu Oct 23 15:16:35 2014] GET /ws/foobar?subscribe-user => generated 46 bytes in 0 msecs (HTTP/1.1 500) 0 headers in 0 bytes (3 switches on core 999)

Any idea what the No handlers could be found for logger "django.request" could mean? The couple of successfult 101 responses that got logged each seemed to be preceded by that, but the 500 ones aren't...

(thanks for your quick answer by the way, my account on GH was misconfigured and I didn't get notified)

marcelkornblum commented 9 years ago

Actually on second thought perhaps that log line isn't hugely revealing. Here's a chunk from the log after restarting supervisor. uWSGI goes straight into 500 mode, with a copy of that line just before....

uWSGI Log

marcelkornblum commented 9 years ago

Looking back a little further in the logs to where the 101 responses were entered is really confusing: Older logs. It seems a 101 was returned only when the server was still just restarting, and some processes were complaining about things not being ready yet.

I had the client page open the whole time, so the logs are a bit messy with requests coming in and all.

Anyway, here's hoping you can shed any light on it. Thanks again.

grvhi commented 9 years ago

@marcelkornblum - thanks for the extra details. Just to clarify, which facilities are you using? If you only use subscribe-broadcast, does everything work?

I've tried downgrading to uWsgi==2.0.3 (none of the other packages just yet), but I am still only able to get the system to work with subscribe-broadcast. subscribe-group and subscribe-user both cause the behaviour detailed above.

marcelkornblum commented 9 years ago

I am actually after subscribe-user, but I tried out both that and subscribe--broadcast to see if I could at least get that working: no such luck :/ On 23 Oct 2014 15:58, "George Irwin" notifications@github.com wrote:

@marcelkornblum https://github.com/marcelkornblum - thanks for the extra details. Just to clarify, which facilities are you using? If you only use subscribe-broadcast, does everything work?

I've tried downgrading to uWsgi==2.0.3 (none of the other packages just yet), but I am still only able to get the system to work with subscribe-broadcast. subscribe-group and subscribe-user both cause the behaviour detailed above.

— Reply to this email directly or view it on GitHub https://github.com/jrief/django-websocket-redis/issues/56#issuecomment-60251766 .

grvhi commented 9 years ago

Interesting... The plot thickens! I'm not sure if I'll get some time to try out downgrading to the same package versions as jrief anytime soon - for now subscribe-broadcast is doing what I need.

marcelkornblum commented 9 years ago

In case it helps, several of my packages are newer than @jrief 's:

Django==1.7
Fabric==1.10.0
Markdown==2.5
PyJWT==0.2.1
Werkzeug==0.9.6
argparse==1.2.1
boto==2.32.1
coverage==3.7.1
croniter==0.3.5
django-contrib-comments==1.5
django-cors-headers==0.13
django-debug-toolbar==1.2.1
django-extensions==1.4.0
django-filter==0.7
django-guardian==1.2.4
django-model-utils==2.2
django-nose==1.2
django-notifications-hq==0.6.2
django-redis-sessions==0.4.0
django-storages==1.1.8
django-websocket-redis==0.4.2
djangorestframework==2.4.3
djangorestframework-jwt==1.0.1
drf-extensions==0.2.6
ecdsa==0.11
factory-boy==2.4.1
gevent==1.0.1
gnureadline==6.3.3
greenlet==0.4.5
gunicorn==19.1.1
ipython==2.2.0
nose==1.3.4
nose-exclude==0.2.0
paramiko==1.15.1
psycopg2==2.5.4
pycrypto==2.6.1
python-dateutil==2.2
pytz==2014.7
redis==2.10.3
simplejson==3.6.4
six==1.8.0
sqlparse==0.1.12
uWSGI==2.0.7
wsaccel==0.6.2
wsgiref==0.1.2

I'm now having trouble getting my vagrant machine to go as far as the EC2 one using the production setup, just in order to debug easily; I get a 404 from the /ws/ endpoint and nginx doesn't seem to even pass it to the uwsgi socket.

I'll carry on playing around with the EC2 box and update if I figure anything out....

jrief commented 9 years ago

Yes, I need some more time for investigation here. In the meantime please proceed with subscribe-broadcast. This is the only setting, where the session-id from the client is not evaluated. For subscribe-user, subscribe-session and subscribe-group, WDR uses the session-id sent by the client to find out the user. My assumption is, that this for some reason doesn't work with uwsgi.

marcelkornblum commented 9 years ago

Unfortunately, I can't even get subscribe-broadcast to work :(

I was wondering if my settings could have been affecting it, so I stopped the websocket wsgi file from using the settings, to no avail.

I had initially been using redis on a socket but switched to http at the same time. Django could put keys into redis OK without any specific config info in the settings file, so I don't think that's the reason after all.

Is there a way of enabling extra debug messages or something, to try and figure this out?

I've got some other components to work on but before long I'll have to make a call about trying node it something to do that part of the job :(

Thanks again @jrief

weelng commented 9 years ago

@marcelkornblum I'm encountering the same problem as you. Are you able to sort it out? It seems like the logger's handle isn't working when I'm deploying it in production mode.

marcelkornblum commented 9 years ago

I couldn't get it to work; I ended up making a simple NodeJS app to handle the socket connection :/

Marcel


Portfolio: http://ma.rcel.biz/workivedone Blog: http://marcelkornblum.com

On 19 November 2014 15:28, Wee Liat notifications@github.com wrote:

@marcelkornblum https://github.com/marcelkornblum I'm encountering the same problem as you. Are you able to sort it out? It seems like the logger's handle isn't working when I'm deploying it in production mode.

— Reply to this email directly or view it on GitHub https://github.com/jrief/django-websocket-redis/issues/56#issuecomment-63656966 .

weelng commented 9 years ago

Oh ok. How do you manage to get subscribe-sessions going though?

marcelkornblum commented 9 years ago

I didn't need it for my use case; I just made a channel per user and subscribed each user to their own channel. A bit brute force :/

Marcel


Portfolio: http://ma.rcel.biz/workivedone Blog: http://marcelkornblum.com

On 19 November 2014 16:42, Wee Liat notifications@github.com wrote:

Oh ok. How do you manage to get subscribe-sessions going though?

— Reply to this email directly or view it on GitHub https://github.com/jrief/django-websocket-redis/issues/56#issuecomment-63670632 .

weelng commented 9 years ago

Alright. Looks like I have to wait for the fix. I was trying to log errors down in production but apparently it didn't trigger the handler.

weelng commented 9 years ago

Hi @jrief I've tried troubleshooting in production mode all day and night to get some clues on what happened, configured LOGGER to handle exceptions by writing to both files & email (I know they are blocking calls but I need to get some clues) but they failed badly. I can't get anything out of it. Can you point me to a direction where I can help to troubleshoot this issue?

jrief commented 9 years ago

@weelng what do you mean with “but they failed badly”? Two thing which came to my mind: if subscribe-broadcast works but subscribe-user and/or subscribe-group does not, are you sure that the cookie containing the session-id is passed through the websocket? If neither works, may it be that there any (transparent?) HTTP proxy. Such a proxy could be badly implemented and be inserted between your browser and the server without your knowledge.

weelng commented 9 years ago

@jrief I could not deploy at all. I have followed exactly what was in the docs, settings that's discussed here, the Django instance is running but not the ws. It keeps throwing a 500 error. I can't even get subscribe-broadcast working.

weelng commented 9 years ago

Hi @marcelkornblum can I check if you're using the safety consideration by declaring get_allowed_channels?

@jrief I realized that if I have the following:

def get_allowed_channels(request, channels):
    if request.user:
        if not request.user.is_authenticated():
            raise PermissionDenied('Unauthenticated access!')

    return set(channels).intersection(['subscribe-session'])

I'll face with error 500 despite making sure my user has already logged in and try connecting to the web socket. There's no such problem when it's in DEBUG=True mode.

jrief commented 9 years ago

@weelng This seems, as if you don't store your session data in Redis. Did you follow these instructions? http://django-websocket-redis.readthedocs.org/en/latest/installation.html#replace-memcached-with-redis

weelng commented 9 years ago

@jrief I'm so sorry I missed that out. Thank you. I got it working fine now.

jrief commented 9 years ago

I added a reference implementation to avoid these kinds of errors in the future: http://websocket.aws.awesto.com/

Divargar-mahnaz commented 1 year ago

I had similar problem .

My websocket connect and disconnect immediately ,

Although Redis was installed,

The problem was with the Redis version.

Your Redis version must be greater than five