wandenberg / nginx-push-stream-module

A pure stream http push technology for your Nginx setup. Comet made easy and really scalable.

Memory leaks in production using ... #9

Closed JohnBat26 closed 12 years ago

JohnBat26 commented 12 years ago

Hi all. We are using nginx-push-stream-module.

We are getting the following errors:

2011/12/11 18:52:23 [crit] 24244#0: ngx_slab_alloc() failed: no memory
2011/12/11 18:52:23 [error] 24244#0: *73289555 push stream module: unable to allocate worker subscriber queue marker in shared memory, client: 213.85.187.19, server: 172.17.0.4, request: "GET /xxx/v1/sub/106q2chk9s1ld1fd3m1hx9vuib"
2011/12/11 18:52:23 [crit] 24244#0: ngx_slab_alloc() failed: no memory
2011/12/11 18:52:23 [error] 24244#0: *73289563 push stream module: unable to allocate worker subscriber queue marker in shared memory, client: 213.85.187.19, server: 172.17.0.4, request: "GET /xxx/v1/sub/34lgiys1st5ip0t0qkw81gcr?rnd=0.02243080991320312"

Because of the memory leaks we see with the nginx push module in production, I ran a load test against the comet endpoint. The test is written in Java. It creates a pool of 500 threads, each of which connects to a comet channel, a unique one each time. The test ran overnight.
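For readers who want to reproduce a similar test, here is a minimal sketch of the load pattern described above. The original test was written in Java and was not posted, so this is a Python approximation under assumptions: `subscribe_once`, `worker`, and `run_load_test` are hypothetical names, the subscriber URL layout (`<base>/<channel id>`) is inferred from the request lines in the log, and the 70-second client timeout is an arbitrary choice.

```python
import http.client
import uuid
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def subscribe_once(url, timeout=70.0):
    """Open one subscriber request and read until the server closes it.

    Returns True if the connection ended cleanly, False on any network error.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            while resp.read(4096):
                pass
        return True
    except (OSError, http.client.HTTPException):
        return False


def worker(base_url, rounds, connect):
    """Connect `rounds` times, each time to a freshly generated channel id."""
    ok = 0
    for _ in range(rounds):
        channel = uuid.uuid4().hex  # unique channel per connection (assumption)
        if connect(f"{base_url}/{channel}"):
            ok += 1
    return ok


def run_load_test(base_url, workers=500, rounds=10, connect=subscribe_once):
    """Run `workers` threads in parallel and return the number of clean connections."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, base_url, rounds, connect)
                   for _ in range(workers)]
        return sum(f.result() for f in futures)
```

The `connect` parameter exists so the pattern can be exercised without a server by passing a stub; against a real nginx instance you would call `run_load_test` with the subscriber location of your own setup as `base_url`.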

Now the results.

~ ENVIRONMENT:
Platform: x86-64.
uname -a: Linux server1 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
CPU: 4 processors with two cores each = 8 cores.
nginx version: 1.1.7.
push_module version: 0.3.1.

~ MEMORY CONSUMPTION:
With no load, each nginx worker consumes ~42 248 KB.
With 500 connections attached, an nginx worker consumes 45 080 KB.
Hence 500 connections took 45 080 - 42 248 = 3560 KB.
Therefore one connection costs, empirically, 3560/500 ~ 7 KB.
That is, 50 000 connections would need only ~350 MB.
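The extrapolation above can be written as a small calculation. This is only a sketch: `estimate_memory_kb` is a hypothetical helper, and it assumes memory grows linearly with the number of connections. Note that the two readings as posted differ by 2832 KB rather than 3560 KB, so one of the figures may have been mistyped; the function takes the raw readings so either case can be checked. It also covers worker process memory only, whereas the logged `ngx_slab_alloc()` failure refers to the shared memory zone, which this estimate does not measure.

```python
def estimate_memory_kb(idle_kb, loaded_kb, connections, target_connections):
    """Linear extrapolation of per-worker memory from two readings.

    Returns (KB per connection, KB needed for target_connections).
    """
    per_connection_kb = (loaded_kb - idle_kb) / connections
    return per_connection_kb, per_connection_kb * target_connections


# Readings exactly as posted in the issue.
per_conn, total = estimate_memory_kb(42_248, 45_080, 500, 50_000)
print(f"{per_conn:.2f} KB per connection, {total:.0f} KB for 50,000 connections")
```

Either way the result stays in the range of a few hundred megabytes for 50,000 connections, so the conclusion of the post is not affected by the discrepancy.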

Questions: Exactly how much memory is consumed on one channel? Exactly how much memory is consumed on a single connection? Exactly how much system memory is necessary for 50,000 connections?

The test ran overnight. The timeout on nginx was set to 1 minute, so the threads were constantly reconnecting to random channels. By morning no leaks had been found; all workers were consuming > 50 MB of memory.

This suggests that the problem in production could be at the TCP/IP level. We are writing middleware for IPTV, and clients connect using set-top boxes (STBs). Maybe the operating system on the STBs does not disconnect from nginx correctly?

Question: What Linux kernel configuration is recommended for use with push_module? We have reduced all timeouts to a minimum, so there should not be many TIME_WAIT connections.

I want to get a tcpdump capture from our clients and publish it here.

I hope the above information will help to eliminate memory leaks.

P.S. cat /proc/net/sockstat
sockets: used 829
TCP: inuse 658 orphan 0 tw 0 alloc 662 mem 504
UDP: inuse 12 mem 0
RAW: inuse 0
FRAG: inuse 0 memory 0
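If the socket counters need to be tracked over time (for example, to watch the `tw` and `orphan` values while the STBs reconnect), the output above is easy to parse. The following is a minimal sketch; `parse_sockstat` is a hypothetical helper, and the sample text is simply the output posted above.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {section: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        section, _, rest = line.partition(":")
        fields = rest.split()
        if not section.strip() or len(fields) < 2:
            continue  # skip blank lines and lines without "name value" pairs
        # Fields alternate between counter name and integer value.
        stats[section.strip()] = {
            name: int(value) for name, value in zip(fields[0::2], fields[1::2])
        }
    return stats


SAMPLE = """sockets: used 829
TCP: inuse 658 orphan 0 tw 0 alloc 662 mem 504
UDP: inuse 12 mem 0
RAW: inuse 0
FRAG: inuse 0 memory 0
"""

stats = parse_sockstat(SAMPLE)
print(stats["TCP"]["tw"], stats["TCP"]["orphan"])
```

On a Linux host the same function can be fed the contents of `/proc/net/sockstat` read at regular intervals.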

wandenberg commented 12 years ago

Hi Eugene,

Could you send me your configuration file? How much memory did you allocate to shared memory?

I'm still investigating the way Nginx reuses memory, but it's not so easy.

I owe you an estimate of how much memory each request costs in shared and non-shared memory, sorry. I have been very busy at work.

If you can, please repeat your tests using the latest commit on GitHub. There have been some important changes since version 0.3.1.

JohnBat26 commented 12 years ago

Email sent to wandenberg@gmail.com.

wandenberg commented 12 years ago

Hi Eugene,

I just updated the repository with some changes related to memory consumption. Try this new version if you can.

Regards, Wandenberg

wandenberg commented 12 years ago

Hi Eugene,

Did you test the new version? May I close the issue?

JohnBat26 commented 12 years ago

Hello. Not yet. We are on holiday until 10 January. We will test this fix thoroughly soon.

Sorry.

Best regards, Eugene Batogov.

JohnBat26 commented 12 years ago

I sent an email with the results of our load test.

wandenberg commented 12 years ago

Hi,

When you can, please repeat your tests with version 0.3.3; it includes a bug fix that prevents a memory leak.

wandenberg commented 12 years ago

Hi, have you had an opportunity to test the new version?

JohnBat26 commented 12 years ago

Hi. We have installed nginx-1.1.17 with the latest push_stream_module in production. The memory leak still exists, but we think the problem is a faulty reconnect mechanism in our JS code.

We will soon install a new patch for our product in production and send feedback on the status of this problem. Thanks for your support.

wandenberg commented 12 years ago

Hi,

Any news about this?

JohnBat26 commented 12 years ago

Hi! I sent you an email with the current details.

wandenberg commented 12 years ago

Hi,

Finally, I found the problem. In fact, I found two memory leaks. The first happens when message_ttl is not set: messages are kept on the server forever. This directive now has a default value of 30 minutes.
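To illustrate why a missing TTL behaves like a leak, here is a toy model of a message store. It is not the module's code: `MessageStore`, `publish`, and `purge` are hypothetical names, and the only detail taken from this thread is the 30-minute default.

```python
from collections import deque


class MessageStore:
    """Toy message buffer; ttl_seconds=None models "no TTL set"."""

    def __init__(self, ttl_seconds=30 * 60):
        self.ttl = ttl_seconds
        self.messages = deque()  # (stored_at, payload) pairs, oldest first

    def publish(self, payload, now):
        self.messages.append((now, payload))

    def purge(self, now):
        """Drop expired messages and return how many were removed."""
        if self.ttl is None:
            return 0  # nothing ever expires, so the buffer only grows
        removed = 0
        while self.messages and now - self.messages[0][0] >= self.ttl:
            self.messages.popleft()
            removed += 1
        return removed
```

With `ttl_seconds=None` the buffer grows with every published message no matter how often `purge` runs, which is the behaviour described for the first leak; with the default, messages older than 30 minutes are dropped.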

The second is a real bug. When a channel has no subscribers or messages, it is moved to a "trash tree" before its memory is freed. If the channel is used again shortly afterwards, it is recovered from the trash, but part of its structure is wrongly discarded without its memory being freed.
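The shape of this second bug can be shown with a toy model. This is an illustration only, not the module's actual code: the thread does not say which part of the channel structure was discarded, so the `queue` field below is a hypothetical stand-in, and `ToyAllocator` merely counts live blocks so that a leak shows up as a non-zero balance.

```python
class ToyAllocator:
    """Tracks live blocks; anything left in `live` at the end has leaked."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def alloc(self, label):
        self._next_id += 1
        handle = (self._next_id, label)
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.remove(handle)


class Channel:
    def __init__(self, allocator, channel_id):
        self.id = channel_id
        self.node = allocator.alloc("channel")
        self.queue = allocator.alloc("queue")  # hypothetical sub-structure


def recover_from_trash(allocator, channel, free_old_part):
    """Reuse a trashed channel, replacing part of its structure."""
    if free_old_part:
        allocator.free(channel.queue)  # fixed behaviour: release, then replace
    # Buggy behaviour: the old block's only reference is overwritten here.
    channel.queue = allocator.alloc("queue")


def destroy(allocator, channel):
    allocator.free(channel.queue)
    allocator.free(channel.node)


def leaked_blocks(cycles, free_old_part):
    """Trash and recover one channel `cycles` times, then destroy it."""
    allocator = ToyAllocator()
    channel = Channel(allocator, "demo")
    for _ in range(cycles):
        recover_from_trash(allocator, channel, free_old_part)
    destroy(allocator, channel)
    return len(allocator.live)
```

In this model each trash-and-recover cycle strands one block when `free_old_part` is false and none when it is true. In a fixed-size shared memory zone, stranded blocks of this kind would eventually exhaust the zone, which would be consistent with the `ngx_slab_alloc() failed: no memory` errors at the top of the thread.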

Please get the latest updates from the master branch and run your tests again.

Regards, Wandenberg

JohnBat26 commented 12 years ago

Hi. Thank you very much for your fix. We tested the new build in our lab and did not find any memory leak problems. We will test the new build in production soon, and I will report the results separately.

But I think this leak has now been resolved. ;)

wandenberg commented 12 years ago

Hi Eugene,

Did you detect any other problems?

JohnBat26 commented 12 years ago

Hi. No, we don't have any problems now.