jijo-paulose / node-xmpp-bosh

Automatically exported from code.google.com/p/node-xmpp-bosh

Problem with Nginx and node-xmpp-bosh #41

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Install Nginx, Node and Node-Xmpp-bosh
2. Configure nginx with the proxy module:
    location /http-bind {
        proxy_buffering off;
        tcp_nodelay on;
        proxy_pass http://127.0.0.1:5280/http-bind;
    }

3. Launch node-xmpp-bosh

What is the expected output? What do you see instead?

It crashes regularly.
On Nginx we get a 504 error every minute, and when it is not a 504 it is a 502 and node-xmpp-bosh has crashed.
Node-xmpp-bosh crashes in less than 10 minutes with 1 or 2 users connected.

There is no error in the node-xmpp-bosh logs after the crash, even with debug mode activated.

When we don't have errors, our application works perfectly.
When we have a 504 error, the application still seems to work.

What version of the product are you using? On what operating system?
Debian Squeeze 64bits
Node-xmpp-bosh v0.5.6
Node 0.6.9
Nginx 1.0.11 with passenger ( http://www.modrails.com/ )

Please provide any additional information below.

Our chat is developed with Rails/Ajax.
It connects to HTTP-Bind.

With Apache2 we have no problem on the same server (same version of Node and node-xmpp-bosh).

Original issue reported on code.google.com by Caez0...@gmail.com on 3 Feb 2012 at 10:21

GoogleCodeExporter commented 9 years ago
@Caez0683, Do you have a crash report? There should be one printed to stderr 
(standard error) if NXB crashed.

Also, have you configured the URL in the NXB config file to not require a 
trailing slash? .../http-bind/

Original comment by dhruvb...@gmail.com on 3 Feb 2012 at 12:30

GoogleCodeExporter commented 9 years ago
For the /, no, because the URL is called with a trailing /:
- [01/Feb/2012:15:22:30 +0100] "POST /http-bind/ HTTP/1.1" 200 856 "http://www.XXXXXXXX.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"

I don't have any error, in stderr or in the screen session.

If I launch the script with
/usr/local/lib/bosh/run-server.js "$@" >> /var/log/bosh/bosh.log 2>> /var/log/bosh/bosh.err &

then my error log contains only one error:

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
Error: Error: listen EADDRINUSE
    at emit_error (/usr/local/lib/node_modules/node-xmpp-bosh/src/websocket_draft10.js:219:10)
    at BoshEventPipe.emit (/usr/local/lib/node_modules/node-xmpp-bosh/node_modules/eventpipe/eventpipe.js:63:25)
    at Server.http_error_handler (/usr/local/lib/node_modules/node-xmpp-bosh/src/bosh.js:208:18)
    at Server.emit (events.js:67:17)
    at Array.0 (net.js:743:12)
    at EventEmitter._tickCallback (node.js:192:40)

When the script is launched in a screen session, I get no error when node-xmpp stops, so I don't understand where the problem is.

With the same configuration and Apache, everything works.

Original comment by Caez0...@gmail.com on 3 Feb 2012 at 1:38

GoogleCodeExporter commented 9 years ago
The error you have shown indicates that you already have an instance of the 
bosh server running on the same port and you are trying to launch another 
instance.
Are you sure NXB isn't already running when you try to launch it?

I know that there are successful installations of NXB using nginx, so this 
should be a configuration issue.

Original comment by dhruvb...@gmail.com on 3 Feb 2012 at 3:04

GoogleCodeExporter commented 9 years ago
Yes, I'm sure. I posted an example of a crash.
I have a cron job which checks my processes every minute. If node is not in the list, the script launches the process.
This error occurred because XMPP had been launched in a screen session and we launched it again via the bosh-server command (/usr/local/lib/bosh/run-server.js "$@" >> /var/log/bosh/bosh.log 2>> /var/log/bosh/bosh.err &).
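
One way to avoid the double launch behind the EADDRINUSE error is to have the watchdog record the PID it started and check that PID before launching again. The sketch below is a hypothetical wrapper, not the cron script used here: `start_unless_running` and the PID file are assumed names, and it only helps if every launch (including manual ones in screen) goes through the same wrapper.

```shell
#!/bin/sh
# Hypothetical watchdog helper: start a command only if the PID recorded
# in PIDFILE does not belong to a live process.
# Usage: start_unless_running PIDFILE COMMAND [ARGS...]
start_unless_running() {
    pidfile=$1
    shift
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "already running (pid $(cat "$pidfile"))"
        return 0
    fi
    # Launch in the background and remember its PID for the next check.
    "$@" &
    echo $! > "$pidfile"
    echo "started (pid $!)"
}
```

A cron entry could then call, for example, `start_unless_running /var/run/bosh.pid /usr/local/lib/bosh/run-server.js >> /var/log/bosh/bosh.log 2>> /var/log/bosh/bosh.err` (the PID file path is an assumption).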

When we first installed NXB we had some errors; everything was written to the logs and we corrected the bugs.

I don't understand why NXB crashes when Nginx does an http-bind.
If I change the proxy timeout values of Nginx, NXB crashes in 2 - 3 minutes.
If I keep the default value, it takes 8 - 10 minutes.

When I installed nginx, we only put

location /http-bind {
    proxy_pass http://127.0.0.1:5280/http-bind;
}

in our configuration.

On some forums, people advised adding the proxy_buffering and tcp_nodelay options for XMPP, so I added them.

My node log when a crash occurs:

t" to="www.xxxxxxxxxxxx.com" id="516-13763309" xmlns="jabber:client" 
qid="4000170851"/>
DEBUG Wed Feb 01 2012 17:33:24 GMT+0100 (CET) XMPP PROXY CONNECTOR::stanza:673d9c3d-774b-4fc1-b049-acd89d867c11::<iq type="result" to="www.xxxxxxxxx.com" id="516-13763309" xmlns="jabber:client" qid="4000170851"/>
DEBUG Wed Feb 01 2012 17:33:24 GMT+0100 (CET) XMPP PROXY::sending:<iq type="result" to="www.xxxxxxxxxx.com" id="516-13763309" xmlns="jabber:client" qid="4000170851"/>
DEBUG Wed Feb 01 2012 17:33:24 GMT+0100 (CET) SESSION::950474e4-7c20-41e5-84cd-2a95a7f3c2fe::send_pending_responses::state.pending.length: 0
DEBUG Wed Feb 01 2012 17:33:24 GMT+0100 (CET) SESSION::950474e4-7c20-41e5-84cd-2a95a7f3c2fe::_stitch_new_response::len::1::next_stream::0
DEBUG Wed Feb 01 2012 17:34:24 GMT+0100 (CET) SESSION::950474e4-7c20-41e5-84cd-2a95a7f3c2fe::send_no_requeue, ro valid: true
DEBUG Wed Feb 01 2012 17:34:24 GMT+0100 (CET) SESSION::950474e4-7c20-41e5-84cd-2a95a7f3c2fe::send_no_requeue, ro rid: 3886021956, this.rid: 3886021956
DEBUG Wed Feb 01 2012 17:34:24 GMT+0100 (CET) SESSION::950474e4-7c20-41e5-84cd-2a95a7f3c2fe::send_no_requeue:writing response: <body xmlns="http://jabber.org/protocol/httpbind"/>
+--------------------------------------------------------------------------------------------------------------------------+
| Starting BOSH server 'v0.5.6' on 'http://0.0.0.0:5280/^\/http-bind(\/+)?$/' at 'Wed Feb 01 2012 17:35:01 GMT+0100 (CET)' |
+--------------------------------------------------------------------------------------------------------------------------+
DEBUG Wed Feb 01 2012 17:35:01 GMT+0100 (CET) Starting the BOSH server

On Nginx:

X.X.X.X - - [01/Feb/2012:17:33:23 +0100] "POST /http-bind/ HTTP/1.1" 200 51 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:33:24 +0100] "POST /http-bind/ HTTP/1.1" 200 329 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:34:24 +0100] "POST /http-bind/ HTTP/1.1" 200 51 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:34:25 +0100] "POST /http-bind/ HTTP/1.1" 502 173 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:34:25 +0100] "POST /http-bind/ HTTP/1.1" 502 173 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:34:33 +0100] "POST /http-bind/ HTTP/1.1" 502 173 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:35:00 +0100] "POST /http-bind/ HTTP/1.1" 502 173 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:36:04 +0100] "POST /http-bind/ HTTP/1.1" 200 124 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:17:36:04 +0100] "POST /http-bind/ HTTP/1.1" 200 124 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"

With 504:

X.X.X.X - - [01/Feb/2012:18:05:25 +0100] "POST /http-bind/ HTTP/1.1" 504 183 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:18:05:25 +0100] "POST /http-bind/ HTTP/1.1" 200 51 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:18:06:24 +0100] "POST /http-bind/ HTTP/1.1" 200 329 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"
X.X.X.X - - [01/Feb/2012:18:07:25 +0100] "POST /http-bind/ HTTP/1.1" 504 183 "http://www.xxxxxxxxxxxx.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0"

Are there any options for configuring Nginx for this?
The problem is with Nginx, since I have no problem with Apache.
But how can it cause Node to shut down, and why do I have no log to start tracking it from?

As a second step, I would like to use nginx for load balancing requests.

Anyway, thank you for your help.

Original comment by Caez0...@gmail.com on 3 Feb 2012 at 5:08

GoogleCodeExporter commented 9 years ago
Try the config options mentioned here:
http://codingteam.net/project/jappix/doc/BoshServer

Also, could you create the log and error files afresh (using > instead of >>) so that we know that the logs and errors correspond to the same run? Or even better, just redirect both to the same file for now, so that the correlation is maintained:
$> COMMAND &> bosh.log &
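
The `&>` form above is a bash shorthand. A minimal portable sketch of the same idea, assuming a plain POSIX `sh`, is shown below; `run_logged` is a hypothetical helper name, not part of NXB.

```shell
#!/bin/sh
# Run a command with stdout and stderr sent to one freshly truncated file,
# so that log lines and errors from the same run stay interleaved.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    logfile=$1
    shift
    # '>' truncates LOGFILE on every run; '2>&1' points stderr at the same file.
    "$@" > "$logfile" 2>&1
}
```

For the server in this thread that would be something like `run_logged /var/log/bosh/bosh.log /usr/local/lib/bosh/run-server.js &`.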

Original comment by dhruvb...@gmail.com on 3 Feb 2012 at 5:22

GoogleCodeExporter commented 9 years ago
I have changed my conf.

NXB is still OK after 20 minutes,
but the 504 error on nginx persists:

2012/02/03 20:33:23 [error] 5563#0: *6 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 80.11.36.118, server: www.xxxxx.com, request: "POST /http-bind/ HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind/", host: "www.xxxxx.com", referrer: "http://www.xxxxx.com/"
2012/02/03 20:34:23 [error] 5563#0: *6 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 80.11.36.118, server: www.xxxxx.com, request: "POST /http-bind/ HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind/", host: "www.xxxxx.com", referrer: "http://www.xxxxx.com/"
2012/02/03 20:36:23 [error] 5564#0: *160 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 80.11.36.118, server: www.xxxxx.com, request: "POST /http-bind/ HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind/", host: "www.xxxxx.com", referrer: "http://www.xxxxx.com/"

I tried changing proxy_read_timeout (http://forum.nginx.org/read.php?2,4290) to 600s.
But after 4 minutes, NXB crashed and I got 502 errors.

In bosh.log:

DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) BOSH::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::RID: 1509848427, state.RID: 1509848426
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::is_valid_packet::node.attrs.rid:1509848427, state.rid:1509848426
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::setting a timeout of '190' sec
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::add_request_for_processing::session RID: 1509848426
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::adding a response object. Holding 0 response objects
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::process_requests::session RID: 1509848426
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::updated RID to: 1509848427
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::_process_one_request::session RID: 1509848427, stream: true
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::send_pending_responses::state.pending.length: 0
DEBUG Fri Feb 03 2012 20:54:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::_stitch_new_response::len::1::next_stream::0
DEBUG Fri Feb 03 2012 20:55:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::send_no_requeue, ro valid: true
DEBUG Fri Feb 03 2012 20:55:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::send_no_requeue, ro rid: 1509848427, this.rid: 1509848427
DEBUG Fri Feb 03 2012 20:55:12 GMT+0100 (CET) SESSION::dfcfc316-9f84-4fb4-b5c5-684aa67464a4::send_no_requeue:writing response: <body xmlns="http://jabber.org/protocol/httpbind"/>

I have nothing in bosh.err.

Thanks

Original comment by s...@novetys.com on 3 Feb 2012 at 8:14

GoogleCodeExporter commented 9 years ago
Hi,

Do you have any other ideas for my issue?

Regards, 

Original comment by Caez0...@gmail.com on 6 Feb 2012 at 4:35

GoogleCodeExporter commented 9 years ago
Hello! I've tried working with the information I have, but it's hard to do any 
debugging without the crash log! It would be great if you could somehow get a 
crash log since it would help a lot.

Original comment by dhruvb...@gmail.com on 16 Feb 2012 at 11:10

GoogleCodeExporter commented 9 years ago
Maybe this is helpful:
http://anders.conbere.org/blog/2011/05/03/get_xmpp_-_bosh_working_with_ejabberd_firefox_and_strophe/

Original comment by dhruvb...@gmail.com on 16 Feb 2012 at 11:15

GoogleCodeExporter commented 9 years ago
My problem is that I don't have a crash log.

I have updated Node to 0.6 and Nginx to 1.0.12.

I launched node-xmpp in a screen session, and I have the same problem:

[2012-02-27 12:14:23.228] [DEBUG] [response.js] - SENT: <body xmlns="http://jabber.org/protocol/httpbind"/>
[2012-02-27 12:14:23.463] [DEBUG] [http-server.js] - RECD: <body rid="2773113391" xmlns="http://jabber.org/protocol/httpbind" sid="4d066eee-63df-4114-95bf-643840445fa4"/>
[2012-02-27 12:15:08.498] [DEBUG] [response.js] - SENT: <body xmlns="http://jabber.org/protocol/httpbind"/>
[2012-02-27 12:15:09.354] [DEBUG] [http-server.js] - RECD: <body rid="2773113392" xmlns="http://jabber.org/protocol/httpbind" sid="4d066eee-63df-4114-95bf-643840445fa4"/>
[2012-02-27 12:15:54.390] [DEBUG] [response.js] - SENT: <body xmlns="http://jabber.org/protocol/httpbind"/>
master web bosh #

Node-xmpp has stopped and I don't have any error :/
With Apache, no problem.

Can you suggest an alternative?
Currently I use Openfire, and I can access multiple web servers with load balancing.

My problem is with persistence: when one of my web servers crashes, the client is disconnected.
So I would like to use nginx-sticky-module.

Do you have an alternative to nginx-sticky-module?

Regards,

Original comment by Caez0...@gmail.com on 27 Feb 2012 at 11:28

GoogleCodeExporter commented 9 years ago
Does it crash if you run it w/o nginx OR apache as the reverse proxy?

You can try this node.js based http-proxy as an alternative for load balancing: 
http://thechangelog.com/post/872114581/node-http-proxy-reverse-proxy-for-node-js

Original comment by dhruvb...@gmail.com on 27 Feb 2012 at 12:34

GoogleCodeExporter commented 9 years ago
No, it crashes only with Nginx.

With Apache my node-xmpp is limited to 500 users.
When I have more users on one instance, node crashes.

I have 3k users connected to my chat. For stability, we have installed one node instance per web server.
Http-proxy doesn't have sticky sessions, so with load balancing my users would be disconnected every time.

If it were possible to run one instance of node-xmpp with 3k (and more) users, my problem would be solved... for a time; I would just add a failover in case node crashes.

Original comment by Caez0...@gmail.com on 27 Feb 2012 at 5:03

GoogleCodeExporter commented 9 years ago
@Satyam, what version of nginx are you using NXB with? Maybe it's a version 
thing...

I have tested NXB with 5k users but that was in a synthetic setting. I am told 
that in a real-world setting, NXB can handle about 2k users tops. This is due 
to various reasons such as NXB being a single process, v8 memory restrictions, 
etc... The way to scale (this applies to any web-service) is to scale-out.

IMHO, it should be fairly simple to add cookie based stickiness to 
node-http-proxy. You could try opening an issue. There seems to be something 
relevant here: https://github.com/nodejitsu/node-http-proxy/issues/5

Either way, the crash with nginx needs to be investigated.

Original comment by dhruvb...@gmail.com on 27 Feb 2012 at 5:13

GoogleCodeExporter commented 9 years ago
What request timeout have you set? You need to increase it to at least the wait time, which is usually 3600 (60 * 60) secs. That will get rid of your timeouts (504).

Just add this rule: 

proxy_read_timeout 3600;
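
Putting this rule together with the directives already quoted in this thread, the location block could look like the sketch below. It is written as a shell heredoc that dumps the block to a scratch file for review; the output path is hypothetical and the block has not been tested against the setup described here. For context, nginx's default `proxy_read_timeout` is 60 seconds, and the NXB logs above show an idle request being held for about 60 seconds before the empty response is written.

```shell
#!/bin/sh
# Write a candidate location block to a scratch file.
# Assumption: the path below is only a placeholder for review purposes.
conf=${1:-/tmp/http-bind.location.conf}
cat > "$conf" <<'EOF'
location /http-bind {
    proxy_pass http://127.0.0.1:5280/http-bind;
    proxy_buffering off;
    tcp_nodelay on;
    # Must be longer than the time the BOSH server holds an idle request.
    proxy_read_timeout 3600;
}
EOF
echo "wrote $conf"
```

After copying the block into the real server configuration, `nginx -t` checks the syntax before a reload.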

Original comment by satyamsh...@gmail.com on 28 Feb 2012 at 7:16

GoogleCodeExporter commented 9 years ago
I have tried Nginx 1.0.11 and 1.0.12 with Passenger for Rails.

For the proxy timeout I have tried several values.
If I set 60s, I get many 504 errors and NXB crashes after 30 - 40 minutes.
If I change this value, NXB crashes after 2 - 3 minutes.

The problem is clearly related to Nginx, but I do not understand why I get no trace of the crash.

Is there a log level above debug that would give a few more traces?

What version of nginx did you use for your tests?

For Nginx :

./configure --prefix='/usr/local/nginx' --with-http_ssl_module 
--with-cc-opt='-Wno-error' 
--add-module='/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.11/ext/nginx' 
--conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log 
--http-client-body-temp-path=/var/lib/nginx/body 
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi 
--http-log-path=/var/log/nginx/access.log 
--http-proxy-temp-path=/var/lib/nginx/proxy 
--http-scgi-temp-path=/var/lib/nginx/scgi 
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi --lock-path=/var/lock/nginx.lock 
--pid-path=/var/run/nginx.pid 
--add-module=/usr/local/src/masterzen-nginx-upload-progress-module-436ec80

Original comment by Caez0...@gmail.com on 28 Feb 2012 at 9:18

GoogleCodeExporter commented 9 years ago
We are running on nginx version: nginx/0.8.54.

Let's not worry about the NXB crash right now and get your nginx to work with it.
NXB shouldn't crash no matter what. Can you copy and paste the exact nginx.conf you
have in place for NXB?

Original comment by satyamsh...@gmail.com on 28 Feb 2012 at 9:59

GoogleCodeExporter commented 9 years ago
As for log level - in v0.6 we have a trace log level that adds slightly to the
debug log level. Turn that on as well - it will tell you which path the
client is trying to access NXB at.

Original comment by satyamsh...@gmail.com on 28 Feb 2012 at 10:01

GoogleCodeExporter commented 9 years ago
Hi,

I have tested the trace level but I don't get any trace.
When 2 of us are connected to the chat, NXB stays up longer.

With Apache, if one member is connected we don't have any problem.

I'll try to recompile an old version of Nginx to see if the problem occurs again.

Original comment by s...@novetys.com on 3 Mar 2012 at 11:16

GoogleCodeExporter commented 9 years ago
Hi,

With the latest versions of Nginx, Node and node-xmpp, I no longer have the problem.

Thanks for your help.

Original comment by Caez0...@gmail.com on 29 Mar 2012 at 3:05

GoogleCodeExporter commented 9 years ago
Great!

Original comment by dhruvb...@gmail.com on 29 Mar 2012 at 4:48

GoogleCodeExporter commented 9 years ago
Are there any implications to making the timeout some exceedingly large value?  

Original comment by coreyau...@gmail.com on 3 Jan 2014 at 11:45

GoogleCodeExporter commented 9 years ago
Yes - you are susceptible to attacks where the attacker holds an open 
connection and forces you to run out of file descriptors.
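
To get a feel for how much headroom there is, the descriptor limit and current usage can be inspected from a shell. This is a minimal sketch, assuming Linux (it reads /proc); `fd_limit` and `fd_count` are hypothetical helper names.

```shell
#!/bin/sh
# Soft limit on open file descriptors for processes started from this shell.
fd_limit() {
    ulimit -n
}

# Number of descriptors a process currently holds (Linux only: reads /proc).
# Usage: fd_count PID
fd_count() {
    ls "/proc/$1/fd" | wc -l
}
```

Comparing `fd_count` for the proxy or BOSH server process against `fd_limit` while long-held connections accumulate shows how close the process is to running out.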

Original comment by dhruvb...@gmail.com on 4 Jan 2014 at 2:42