wandenberg / nginx-push-stream-module

A pure stream http push technology for your Nginx setup. Comet made easy and really scalable.
Other
2.22k stars 295 forks source link

some many fatal bug with websocket。 #209

Closed yangco closed 8 years ago

yangco commented 9 years ago
  1. Message lost,when send quickly 。
  2. Receive data is Garbled。
  3. Lots of subscriber cause cpu 100% .... very unstable! 坑爹啊!花了两个星期研究,一堆BUG,根本没法用。
wandenberg commented 9 years ago

Could you check if the branch debug_issue_202 reproduce the problem? Make sure that in the logs you see the message DEBUG #202 version 1. on startup.

yangco commented 9 years ago

new BUG.

  1. first bug fix. if (ctx->frame->opcode == NGX_HTTP_PUSH_STREAM_WEBSOCKET_CLOSE_OPCODE) { ngx_http_push_stream_send_response_finalize(r); return;// move here } else { ctx->frame->step = NGX_HTTP_PUSH_STREAM_WEBSOCKET_READ_START_STEP; ctx->frame->last = NULL; } // return; //must remove
  2. second bug fix recv length's paramter is wrong.
  3. Third bug no answer.
  4. worker_msg->msg->expires = ngx_time() + NGX_HTTP_PUSH_STREAM_DEFAULT_SHM_MEMORY_CLEANUP_OBJECTS_TTL;

//Will delete using msg.

wandenberg commented 9 years ago

Sorry, I didn't understand anything from your previous message.

Could you test the branch I pointed? I'm already working to solve the bug reported on #202

yangco commented 9 years ago

haha, my english is not good。

202 is the second BUG。

There is two reason for #202 BUG。

  1. xxx_recv's len parameter is wrong。 I already fix it.
  2. free memory too early. I temporary fix it(Augment msg->expires.), But it is not a good solution。
wandenberg commented 9 years ago

I still not getting your point. Where and why do you think recv is called with wrong value? Which changes have you made? The memory for message is released after it was sent to "garbage queue" and the ttl expires. I don't think that is being released early. Show me the lines you think that have problems. Have you tested the code I mentioned ?

yangco commented 9 years ago

ngx_http_push_stream_module_websocket.c

206 if ((rc = ngx_http_push_stream_recv(c, rev, &buf, 2)) != NGX_OK) { // error if ((rc = ngx_http_push_stream_recv(c, rev, &buf, (ssize_t)(2 - (buf.last - buf.start)))) != NGX_OK) { //right 207 goto exit; 208 } .....

“released early." is too complex i can not make oneself clear. it is exist.

//you mean this code? if ((rc == NGX_OK) && (in != NULL) && (ctx = ngx_http_get_module_ctx(r, ngx_http_push_stream_module)) != NULL) {

I tested, nothing.

wandenberg commented 9 years ago

The change you did on 206 line does not make sense. Each time the code reach this line the but.last is equals to buf.start, so it will result in (ssize_t)(2 - 0) which is equals to 2. Try to add some log on that line and you will see what I mean. And please, test the code I suggest you. Will be easier if all of us are working on the same spot

yangco commented 9 years ago

ai, No 206 line, all recv is wrong. Sometime socket will recv 1 byte, then buf.last is not equals buf.start. ”粘包“ english perhaps mean "package splicing".

I already tested the code you suggest. Noting happend, bug still exist.

yangco commented 9 years ago

How do you know release message(buffer) is send OK? ngx_http_push_stream_output_filter is sending not sent. After call ngx_http_push_stream_output_filter you put "sending message" to "garbage queue" and expired(after 10s).
If there are many many messages,timer will release messages(buffer also release) before send ok。

wandenberg commented 9 years ago

Hi @yangco,

regarding the problem with recv I was able to reproduce the problem and fix it. I will add a commit as soon as I finish the automated test.

About the message release, while sending the message its content should be copied to the socket buffer. If you have slow connections you should consider increase the output buffer of your sockets. With that the message content is copied to it subscriber socket and can be released from shared memory. As far as I know there is no way to check if the message was delivered the only think I can check is if it was copied to the socket. But to simplify the module we don't do that. To do this we would have to control to which socket the message was copied or not, and other things. Is easier just increase the socket buffers size if you have slow connections but a high number of published messages.

wandenberg commented 9 years ago

The 47ec083e0079d3eb9a96b52d9f58e8f54e2e8e3d mus fix the recv problem. Can you give it a try?

yangco commented 9 years ago

Thank for you reply. I fixed bug and make a stress test. Linux 2.6.32-431.el6.x86_64, websocket ,epoll,1 worker(1 CPU), Result: 1000 subscribers, 10 MSG/S. CPU <20%, (GOOD). 5000 subscribers, 10 MSG/S. CPU 100%, (My expectation is <50% ), Sometimes suspended. This performance can not meet my requirements. So I begin to write a simple module only websocket. By the way, It seems very good under macosx use kqueue. If you know Java, netty websocket example can be used as a tester.

wandenberg commented 9 years ago

Can you share your stress test code? I can work to find the bottleneck. Maybe something specific with your configuration/environment.

yangco commented 9 years ago

I already mail the code.

wandenberg commented 8 years ago

Just to finish the issue... The "garbage messages" were fixed on 47ec083, and regarding the stress test I've reported to @yangco that was at least one problem on the stress test tool on Sep 27, without any answer, so I'm closing the issue for now.