xmppo / node-xmpp-bosh

An XMPP BOSH & WebSocket server (connection manager) written on node.js using Javascript
https://github.com/xmppo/node-xmpp-bosh

Huge memory leak when using node-xmpp-bosh extensively #8

Closed: gooooer closed this issue 12 years ago

gooooer commented 12 years ago

Hi.

I was using the node-xmpp-bosh server with the Strophe 1.0.2 library on the client side. Generally it was working OK, but there was one issue: after some time the connection to the node-xmpp-bosh server is lost and the client doesn't know about it. As a result, messages didn't reach the target, and the client's contact appeared offline (from the target's point of view) too.

I found a thread where a similar problem was discussed and solved: http://groups.google.com/group/strophe/browse_thread/thread/411e17f93859d915 . I used this Strophe plugin https://github.com/zanchin/strophejs/blob/master/plugins/plugin.cm.js and message delivery eventually became reliable. But this led to another problem: the server machine runs out of memory after ~5 hours of work, and the only way to fix it is to restart the bosh server. I also noticed that after integrating that Strophe plugin the numbers at "X/Y active sessions" grow faster. Typically I have 10-12 active sessions (simultaneously connected clients), and the Y number grows very fast, e.g. +10 used sessions per 5 minutes. So it looks like the aggressive connection-manager plugin that I took from https://github.com/zanchin/strophejs/blob/master/plugins/plugin.cm.js is breaking the node-xmpp-bosh server.
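For reference, what the plugin does, roughly, is add a client-side liveness check (keepalives/acknowledgements) on top of Strophe. Below is a minimal sketch of the same idea using only Strophe's public API; the PING_INTERVAL_MS and checkLiveness names, the interval values, and the credentials are illustrative, not taken from plugin.cm.js:

var BOSH_URL = 'http://kvm.vv-master.com:5280/http-bind'; // example endpoint
var PING_INTERVAL_MS = 60000;                             // illustrative value
var conn = new Strophe.Connection(BOSH_URL);
var lastSeen = Date.now();

// Refresh the timestamp whenever any stanza arrives from the server.
conn.addHandler(function () {
    lastSeen = Date.now();
    return true; // keep this handler installed
}, null, null, null, null, null);

conn.connect('user@example.com', 'secret', function (status) {
    if (status === Strophe.Status.CONNECTED) {
        conn.send($pres());
        setInterval(checkLiveness, PING_INTERVAL_MS);
    }
});

function checkLiveness() {
    // Send an XEP-0199 ping; any reply refreshes lastSeen via the handler above.
    conn.send($iq({type: 'get', id: 'ping-' + Date.now()})
        .c('ping', {xmlns: 'urn:xmpp:ping'}));
    // If nothing has been heard for two intervals, assume the connection is dead.
    if (Date.now() - lastSeen > 2 * PING_INTERVAL_MS) {
        conn.disconnect();
    }
}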

I'm not a JavaScript expert; my level of knowledge is rather low. I'm just trying to use an existing solution, and I would be very happy if you could help me with this.

Please let me know if I can help you with solving this issue in any way.

Thanks, Andrey.

satyamshekhar commented 12 years ago

Hey,

> I was using the node-xmpp-bosh server with the Strophe 1.0.2 library on the client side. Generally it was working OK, but there was one issue: after some time the connection to the node-xmpp-bosh server is lost and the client doesn't know about it. As a result, messages didn't reach the target, and the client's contact appeared offline (from the target's point of view) too.

Do you have some kind of runner script that restarts the bosh-server in case it crashes/exits?

> I also noticed that after integrating that Strophe plugin the numbers at "X/Y active sessions" grow faster.

What is the maximum number of active streams/sessions that you saw?

Which version of node-xmpp-bosh are you using, and what is the configuration of your server?

Thanks for reporting.

Satyam

gooooer commented 12 years ago

Hola,

thank you for the quick reply!

I'm using the forever tool, which keeps the bosh server up. The startup command is "/usr/bin/forever start /usr/bin/bosh-server". I'm also running a killer script which kills node-xmpp-bosh every 3 hours:

# Every 3 hours, kill the last node process listed by pgrep; forever restarts it.
while true; do
    sleep 10800
    BOSH_PID=$(pgrep node | tail -n 1)
    echo "killing BOSH_PID: $BOSH_PID"
    kill -9 "$BOSH_PID"
done

It helps to work around the problem.

The maximum X/Y number I saw was something like 20/2049 after 3 hours of work. You can see the current usage at http://kvm.vv-master.com:5280/http-bind . It's quite inactive now; typically it's much busier.

I'm using node-xmpp-bosh version 0.5.6. You can see my node-xmpp-bosh config below:

exports.config = {
    port: 5280,
    host: '91.214.249.200',
    path: /^\/http-bind(\/+)?$/,
    logging: 'INFO',

    // The maximum number of bytes that the BOSH server will
    // "hold" from the client
    max_data_held: 500000,

    // Terminate the session if the XMPP buffer for a stream
    // exceeds max_xmpp_buffer_bytes bytes
    max_xmpp_buffer_size: 1000000,

    // Don't entertain more than 'max_bosh_connections' simultaneous
    // connections on any BOSH session. This is related to the 'hold'
    // attribute
    max_bosh_connections: 4,

    // The maximum number of packets on either side of the current 'rid'
    // that we are willing to accept.
    window_size: 15,

    // How much time (in seconds) should we hold a response object
    // before sending an empty response on it?
    default_inactivity: 70,

    max_inactivity: 160,

    // The value (in seconds) of keepalive to set on the HTTP response
    // socket
    http_socket_keepalive: 60,

    // The maximum number of active streams allowed per BOSH session
    max_streams_per_session: 100,

    http_headers: { },

    //
    // A list of domains for which TLS should NOT be used
    // if the XMPP server supports STARTTLS but does NOT
    // require it.
    //
    // See this link for details:
    // http://code.google.com/p/node-xmpp-bosh/issues/detail?id=11
    //
    no_tls_domains: [ /* 'chat.facebook.com' */ ],

    // Set to 'true' if you want:
    //
    // 1. The session creation response to contain the <stream:features/> tag.
    // 2. NO multiple streams support (only supports a single stream
    //    per session in this mode).
    //
    // Useful to work around a pidgin (libpurple) bug.
    //
    pidgin_compatible: true
};

Any advice? Thanks, Andrey.

satyamshekhar commented 12 years ago

Hey,

That is pretty interesting. Usually, a process that hogs too much memory dies on its own; you shouldn't have had to kill it manually. It should have died by itself with a "FATAL ERROR: JS Allocation failed - process out of memory" error.

20 streams is quite low to be using that much memory. What is the hardware configuration of your server? Also, do you know how much memory node-xmpp-bosh was using?
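One way to see this from inside the process (a sketch, assuming you can add a few lines to whatever script starts bosh-server; the interval value is arbitrary) is to log process.memoryUsage() periodically:

// Sketch: log the process's memory usage once a minute.
// process.memoryUsage() is available in node 0.4.x and returns sizes in bytes.
setInterval(function () {
    var mem = process.memoryUsage();
    console.log('rss: ' + Math.round(mem.rss / 1048576) + ' MB, ' +
                'heapUsed: ' + Math.round(mem.heapUsed / 1048576) + ' MB');
}, 60 * 1000);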

I'm not sure if it will solve your issue, but you can try updating to v0.6.0 and see if it helps.

Thanks.

dhruvbird commented 12 years ago

What is the node.js version you are using? The output of:

$ node --version

and:

$ node --vars

gooooer commented 12 years ago

Hi,

# node --version
v0.4.7

# node --vars
NODE_PREFIX: /usr
NODE_CFLAGS: -rdynamic -D_GNU_SOURCE -DHAVE_CONFIG_H=1 -pthread -g -O3 -DHAVE_OPENSSL=1 -DEV_FORK_ENABLE=0 -DEV_EMBED_ENABLE=0 -DEV_MULTIPLICITY=0 -DX_STACKSIZE=65536 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DEV_MULTIPLICITY=0 -DHAVE_FDATASYNC=1 -DPLATFORM="linux" -D__POSIX__=1 -Wno-unused-parameter -D_FORTIFY_SOURCE=2 -DNDEBUG -I/usr/include/node

As for server configuration:

# cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
stepping    : 5
cpu MHz     : 2672.732
cache size  : 8192 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 4
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall lm constant_tsc pni ssse3 cx16 sse4_1 sse4_2 lahf_lm
bogomips    : 5345.46
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
stepping    : 5
cpu MHz     : 2672.732
cache size  : 8192 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 4
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall lm constant_tsc pni ssse3 cx16 sse4_1 sse4_2 lahf_lm
bogomips    : 5347.49
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
stepping    : 5
cpu MHz     : 2672.732
cache size  : 8192 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 4
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall lm constant_tsc pni ssse3 cx16 sse4_1 sse4_2 lahf_lm
bogomips    : 5344.24
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
stepping    : 5
cpu MHz     : 2672.732
cache size  : 8192 KB
fpu     : yes
fpu_exception   : yes
cpuid level : 4
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall lm constant_tsc pni ssse3 cx16 sse4_1 sse4_2 lahf_lm
bogomips    : 5357.17
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

# cat /proc/meminfo 
MemTotal:      1013724 kB
MemFree:        249680 kB
Buffers:        149080 kB
Cached:         289036 kB
SwapCached:      94344 kB
Active:         529092 kB
Inactive:       163620 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      1013724 kB
LowFree:        249680 kB
SwapTotal:     1052248 kB
SwapFree:       911672 kB
Dirty:              52 kB
Writeback:           0 kB
AnonPages:      247528 kB
Mapped:          19656 kB
Slab:            44508 kB
PageTables:      10068 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   1559108 kB
Committed_AS:   627148 kB
VmallocTotal: 34359738367 kB
VmallocUsed:       756 kB
VmallocChunk: 34359737607 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

dhruvbird commented 12 years ago

We think there is a problem with 0.4.7's GC - others using this version have also reported the issue. When we run it on 0.4.12 though, it seems to be fine. Any chance you could upgrade node.js to 0.4.12 and see if that works for you?

gooooer commented 12 years ago

Sure, I can upgrade node.js. Thank you very much for the advice! I'll report the test results within a day.

gooooer commented 12 years ago

Hi guys,

sorry for the delay; we tried a few other approaches to get Jabber messaging working. We've finally updated node.js, and the problem is gone!

The issue can be closed.

Thank you very much for the help! Andrey.