Open zavada opened 10 years ago
@zavada What nginx variable are you using for the "request time" field in your access log? Is it $request_time? If yes, then the time specified by the $request_time variable includes the network I/O time between the client and your web server, which can be relatively large for slow downstream networks.
@agentzh Yes, I use $request_time.
But if I comment out the line
local categories, err = red:smembers("categories:"..clear_url)
I no longer see these >100ms requests in the results. That's why I think the problem is not the network I/O time between the client and my web server.
Do you have any other ideas?
@zavada Then you'll have to use tools like systemtap and dtrace to grab more details from those slow request samples in various places: nginx processes, kernel network stack, and the redis-server process. See my sample tools based on systemtap in the nginx-systemtap-toolkit and stapxx projects on GitHub for some ideas.
@zavada Also, $request_time may have some error due to the time caching mechanism in the nginx core. Use the ngx-lua-tcp-recv-time tool in my stapxx project to measure the upstream cosocket recv latency accurately in real time.
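Separately from the stapxx tool, a rough in-process cross-check is to time the SMEMBERS call from Lua itself, forcing nginx to refresh its cached clock before each reading. The following is only a minimal sketch: the socket path, the 50 ms threshold, and the way clear_url is derived are assumptions for illustration, not taken from the original setup. Note that the measured interval covers everything between sending the command and the Lua coroutine being resumed, so time the worker spends blocked elsewhere shows up here too.

```lua
local redis = require "resty.redis"

local red = redis:new()
red:set_timeout(1000)  -- 1 second, in milliseconds

-- Assumption: replace with the real path of your redis unix socket.
local ok, err = red:connect("unix:/tmp/redis.sock")
if not ok then
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return
end

-- Placeholder: the thread does not show how clear_url is computed.
local clear_url = ngx.var.uri

-- ngx.now() returns nginx's cached time, so refresh the cache first.
ngx.update_time()
local start = ngx.now()

local categories, err = red:smembers("categories:" .. clear_url)

ngx.update_time()
local elapsed_ms = (ngx.now() - start) * 1000

if not categories then
    ngx.log(ngx.ERR, "smembers failed: ", err)
    return
end

-- Assumption: 50 ms is an arbitrary threshold for "slow".
if elapsed_ms > 50 then
    ngx.log(ngx.WARN, "slow SMEMBERS: ", elapsed_ms, " ms")
end

-- Return the connection to the pool (10 s idle timeout, pool size 100).
red:set_keepalive(10000, 100)
```

If slow samples appear in the error log at the same rate as the >100ms entries in the access log, the delay sits around the cosocket call rather than in client-side network I/O.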
@zavada Furthermore, watch the CPU usage of your nginx worker processes. If their CPU usage hits 100% from time to time, then you might just be running out of CPU time, and such intermittent long latency is entirely expected. Similarly, you should watch the CPU usage of your redis-server process as well.
If any process is exhausting the CPU time, then you should use the various flame graph sampling tools to analyze the bottleneck:
https://github.com/openresty/nginx-systemtap-toolkit#sample-bt
https://github.com/openresty/stapxx#lj-lua-stacks
Regards, -agentzh
@zavada Sometimes, nginx may just block on some blocking I/O system calls (like file I/O) or on semaphores used as internal locks in nginx. Such blocking can contribute to your cosocket latency as well. You can use the epoll-loop-blocking-distr tool to verify this:
https://github.com/openresty/stapxx#epoll-loop-blocking-distr
and further use the off-CPU time flame graph tool to analyze the causes:
https://github.com/openresty/nginx-systemtap-toolkit#sample-bt-off-cpu
@agentzh When I try to run your examples, I get the following error: Checking "/lib/modules/2.6.32-431.el6.x86_64/build/.config" failed with error: No such file or directory. On my CentOS server I have "/lib/modules/2.6.32-431.el6.x86_64/build/" but there is no ".config" inside. Do you know how I can fix it?
Thank you for your time and help.
Hello!
On Sun, Oct 12, 2014 at 2:34 AM, zavada wrote:
@agentzh When I try to run your examples, I get the following error: Checking "/lib/modules/2.6.32-431.el6.x86_64/build/.config" failed with error: No such file or directory. On my CentOS server I have "/lib/modules/2.6.32-431.el6.x86_64/build/" but there is no ".config" inside. Do you know how I can fix it?
Have you installed the "kernel-devel" and "kernel-debuginfo" packages for your kernel actually being used?
See the following documentation for more details:
https://www.sourceware.org/systemtap/SystemTap_Beginners_Guide/using-systemtap.html#install-kinfo
Regards, -agentzh
I'm using redis server 2.8.17. Requests are sent through unix.sock by nginx+lua. Each request is one SMEMBERS command. I currently have about 1k requests per second. In the nginx access log I mostly see the following:
You can see that the request time is not more than 2ms. But almost every second I have a few requests with a request time of about 100ms.
I can't find where the problem is. I checked everything from the latency guide: the slowlog has no entries, the system is not swapping, and I don't use AOF.
I have read a lot of posts on Stack Overflow, like this one: http://stackoverflow.com/questions/16841469/redis-performance-tuning/23719773#23719773 but I can't find a solution.
redis-benchmark gives me the following: