natnet00 closed this issue 4 years ago
Can you provide a step by step on how to reproduce the problem?
1) Compile nginx with push-stream-module and the option --with-http_stub_status_module
2) Start nginx and check the numbers of http://localhost/stub_status
   It will return something like:
   Active connections: 0
   server accepts handled requests
   152822 152822 311071
   Reading: 0 Writing: 0 Waiting: 0
3) Note the Reading/Writing/Waiting numbers and the active connections number. Cross-check the active connections number with the count of established connections from the command "netstat -natp | grep nginx". You should see as many lines with the word "ESTABLISHED" as you have active connections in nginx.
4) Send requests to the nginx-push-stream-module, subscribe to channels, get messages, etc.
5) If you check http://localhost/stub_status again you should see active connections going up. This is correct.
6) Shut down the connections to the push-stream-module (e.g. by closing your browser) but don't restart nginx.
7) Note that the output of http://localhost/stub_status is now wrong. It still shows active connections. Cross-check with "netstat -natp | grep nginx" again. This output comes directly from the Linux kernel and shows the true established connections. Note that the number of lines with the keyword "ESTABLISHED" does not match the output of nginx.
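The cross-check in steps 3 and 7 could be scripted. Below is a minimal Python sketch, assuming the stub_status page text and the output of "netstat -natp" have already been captured as strings. The function names are hypothetical, and the sample netstat lines are made up for illustration (they are not taken from the affected server and are independent of the stub_status sample).

```python
import re


def parse_stub_status(text):
    """Extract the four connection counters from a stub_status page."""
    counters = {}
    labels = (
        ("active", "Active connections"),
        ("reading", "Reading"),
        ("writing", "Writing"),
        ("waiting", "Waiting"),
    )
    for key, label in labels:
        match = re.search(label + r":\s*(\d+)", text)
        if match is None:
            raise ValueError("missing counter: " + label)
        counters[key] = int(match.group(1))
    return counters


def count_established(netstat_output, process_name="nginx"):
    """Count netstat lines for the process that are in state ESTABLISHED."""
    return sum(
        1
        for line in netstat_output.splitlines()
        if "ESTABLISHED" in line and process_name in line
    )


# Sample page in the shape quoted in step 2.
STUB_STATUS_SAMPLE = """Active connections: 0
server accepts handled requests
 152822 152822 311071
Reading: 0 Writing: 0 Waiting: 0
"""

# Illustrative netstat lines (made up): one listener, one established connection.
NETSTAT_SAMPLE = """tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1200/nginx: master
tcp 0 0 127.0.0.1:80 127.0.0.1:51234 ESTABLISHED 1201/nginx: worker
"""

if __name__ == "__main__":
    status = parse_stub_status(STUB_STATUS_SAMPLE)
    established = count_established(NETSTAT_SAMPLE)
    print("nginx reports active connections:", status["active"])
    print("kernel reports established connections:", established)
```

Running the two functions on output captured before step 4 and again after step 6 would show whether the two counts drift apart.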
I think the cause is that the variables ngx_stat_writing and ngx_stat_active are not updated correctly in nginx.
In https://github.com/nginx/nginx/blob/79fcf261d0b50c03ae2780b5588b59ed2eb7ad88/src/http/ngx_http_request.c the variable ngx_stat_writing is incremented or decremented when a request is processed or the request is freed again.
Maybe you need to notify nginx somewhere that a request is finished and the resources can be freed.
@natnet00 I tried to reproduce the problem without success. If the module were not notifying nginx that the request is finished, the counters would never decrease. So I believe that, if the problem is in the push stream module, it only happens under some conditions. It would be great if you could help by providing access to a server where the problem happens, or by creating a virtual machine where we can always and easily reproduce it.
Thank you for your quick response.
Unfortunately I cannot provide you access to the servers, as they are in production and I don't have an online test system, but I can give you the configuration:
The subscription URL for the nginx-push-stream endpoint is a PHP script that checks whether the user is authenticated and either sends a 403 response or sends a header("X-Accel-Redirect: /xxx/ws/...") redirect header to the actual nginx path.
The nginx path is configured as in the attached file location.txt
I'm using a TCP keepalive module - this only closes the connection after about 16 minutes of no response from the client.
The pushstream config is in the attached file pushstream.txt
I tested with many thousands of requests and checked after ~24 hours. The chart below shows my test, 1 day with requests: after all requests have finished, the lines are not where they used to be...
I have one more hint: it seems that only the number of writing connections and, as the sum of all three, the number of active connections are wrong.
Hi @natnet00
I am still not able to reproduce the problem locally. Can you help me create a setup like yours, with a PHP application handling authorization and returning an X-Accel-Redirect? I suspect that this may be causing this kind of strange problem. By the way, are you having problems with workers dying? That can also cause wrong numbers in the stub_status module.
With nginx 1.11.3, I compiled the push stream module on 2016-07-26, and have not had any incorrect values for almost 2 weeks. So, I guess, the bug seems to be fixed somehow.
But now I tried https://github.com/vozlt/nginx-module-vts and this module also seems to mess up the connection count. But that's another story ;)
Hello,
When reading the active connection count from the stub_status module, it gives me something like: Active connections: 2 Reading: 0 Writing: 1 Waiting: 1
After having users send requests to the server while using the push-stream-module, these numbers rise as expected to a couple of hundred. But despite using TCP keepalive, these numbers don't decrease to their original values after telling the router to redirect users to a different server.
That is: Currently there are no users using the server, but the numbers read: Active connections: 8 Reading: 0 Writing: 2 Waiting: 1
Active connections should be the sum of all three (reading+writing+waiting). I suspect that the push-stream-module somehow breaks the correct counting of active connections. The numbers after using the push-stream-module are wrong. "netstat -natp | grep nginx" shows me that there is only 1 established connection to nginx, while nginx reports 8 active connections (2 writing).
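The sum relation stated above can be checked against the two readings quoted in this report. This is a small Python sketch; the function name is hypothetical, and it assumes that active connections should equal reading + writing + waiting, as claimed above.

```python
def unaccounted_connections(active, reading, writing, waiting):
    """Return how many active connections are not covered by the three states."""
    return active - (reading + writing + waiting)


# Values quoted in this report, before and after the push-stream traffic.
before = unaccounted_connections(active=2, reading=0, writing=1, waiting=1)
after = unaccounted_connections(active=8, reading=0, writing=2, waiting=1)

print("unaccounted before:", before)  # prints 0
print("unaccounted after:", after)    # prints 5
```

A non-zero result means the active counter no longer matches the three per-state counters.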
Maxim Dounin says at http://permalink.gmane.org/gmane.comp.web.nginx.english/33947 that this type of problem might be due to a 3rd party module. That's why I suspect the nginx-push-stream-module to be the cause.
It would be great if this could be fixed.
This is the source of the Stub Status Module, just for reference: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_stub_status_module.c
Nginx version used: 1.9.4
Push Stream Module used: latest github version
The problem has existed for at least 1 year.