nginxinc / nginx-amplify-agent

NGINX Amplify Agent
BSD 2-Clause "Simplified" License

can not check upstream response time metrics #6

Open abhishek210394 opened 7 years ago

abhishek210394 commented 7 years ago

I cannot see upstream response time metrics with any version of nginx. I have stub_status set up correctly and the log file permissions are fine. I can see all other metrics except upstream response time. I have formatted the logs as given in the documentation, as follows:

log_format  main_ext  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for" '
                  '"$host" sn="$server_name" '
                  'rt=$request_time '
                  'ua="$upstream_addr" us="$upstream_status" '
                  'ut="$upstream_response_time" ul="$upstream_response_length" '
                  'cs=$upstream_cache_status' ;

access_log  /var/log/nginx/access.log  main_ext;

I tried this on nginx versions 1.10.0 and 1.4.6. I am running nginx 1.10.0 as a load balancer that routes requests to another server running nginx 1.4.6, and I cannot see upstream response time for either of them in the graphs.

centminmod commented 7 years ago

Yeah, same here, even with the main_ext log format on source-compiled nginx 1.13.1 + CentOS 7.3.

nginx -V
nginx version: nginx/1.13.1
built by gcc 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
built with OpenSSL 1.0.2l 25 May 2017
TLS SNI support enabled
configure arguments: --with-ld-opt='-ljemalloc -lpcre -Wl,-z,relro -Wl,-rpath,/usr/local/lib' --with-cc-opt='-m64 -march=native -DTCP_FASTOPEN=23 -g -O3 -fstack-protector-strong -fuse-ld=gold --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -gsplit-dwarf' --sbin-path=/usr/local/sbin/nginx --conf-path=/usr/local/nginx/conf/nginx.conf --with-compat --with-http_stub_status_module --with-http_secure_link_module --with-libatomic --with-http_gzip_static_module --add-dynamic-module=../ngx_pagespeed-1.12.34.2-beta --with-http_sub_module --with-http_addition_module --with-http_image_filter_module=dynamic --with-http_geoip_module --with-stream_geoip_module --with-stream_realip_module --with-stream_ssl_preread_module --with-threads --with-stream=dynamic --with-stream_ssl_module --with-http_realip_module --add-dynamic-module=../ngx-fancyindex-0.4.0 --add-module=../ngx_cache_purge-2.3 --add-module=../ngx_devel_kit-0.3.0 --add-module=../set-misc-nginx-module-0.31 --add-module=../echo-nginx-module-0.60 --add-module=../redis2-nginx-module-0.14 --add-module=../ngx_http_redis-0.3.7 --add-dynamic-module=../lua-nginx-module-0.10.9rc5 --add-module=../memc-nginx-module-0.18 --add-module=../srcache-nginx-module-0.31 --add-module=../headers-more-nginx-module-0.32 --with-pcre=../pcre-8.40 --with-pcre-jit --with-zlib=../zlib-1.2.11 --with-http_ssl_module --with-http_v2_module --add-module=../http2-push-nginx-module --with-openssl=../openssl-1.0.2l

IWriteThings commented 7 years ago

Having the same issue using the default instructions - same settings used by @abhishek-plackal

Any idea on a fix for this?

gshulegaard commented 7 years ago

It is hard to say. As a first step, could you enable debug logging in your agent configuration and send me the debug log after running the agent for a little while? I can then take a look and see if anything obvious stands out.

Also, a couple of sample lines from your nginx log, covering the run time of this debug log, would be useful for initial diagnostics.

dedm commented 7 years ago

@abhishek-plackal we converted your question into a support ticket, but you never replied.

@IWriteThings, @centminmod - as @gshulegaard said - please enable debug logging and share some lines from your access logs. That will really help us to investigate the problem.

IWriteThings commented 7 years ago

@dedm & @gshulegaard

gist deleted

Please let me know if this helps. The nginx logs are at the bottom. There is nothing but a long list of hits from nginx-amplify-agent.

IWriteThings commented 7 years ago

I suspect the issue is that the custom log example provided by default does not include the $upstream_connect_time variable, and therefore the agent isn't going to report that data.

What label should I assign $upstream_connect_time to in the log format? I tried uc="$upstream_connect_time", but it still doesn't seem to be working. Any ideas?
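For reference, here is the full format line I tried, i.e. the documented main_ext format with the connect-time variable appended (the uc= label is just the one I picked, not anything the docs prescribe):

```nginx
log_format  main_ext  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" '
                      '"$host" sn="$server_name" '
                      'rt=$request_time '
                      'ua="$upstream_addr" us="$upstream_status" '
                      'ut="$upstream_response_time" ul="$upstream_response_length" '
                      'uc="$upstream_connect_time" cs="$upstream_cache_status"';
```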

IWriteThings commented 7 years ago

127.0.0.1 - - [22/Jun/2017:15:39:37 +0000] "GET /nginx_status HTTP/1.1" 200 116 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" cs=-
127.0.0.1 - - [22/Jun/2017:15:39:57 +0000] "GET /nginx_status HTTP/1.1" 200 116 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" cs=-
127.0.0.1 - - [22/Jun/2017:15:40:17 +0000] "GET /nginx_status HTTP/1.1" 200 116 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:40:20 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:40:20 +0000] "GET /nginx_status HTTP/1.1" 200 116 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:40:33 +0000] "GET /nginx_status HTTP/1.1" 200 116 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:40:33 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:40:53 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:41:13 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:41:33 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:41:53 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:42:13 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:42:33 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"
127.0.0.1 - - [22/Jun/2017:15:42:53 +0000] "GET /nginx_status HTTP/1.1" 200 115 "-" "nginx-amplify-agent/0.44-2" "-" "127.0.0.1" sn="127.0.0.1" rt=0.000 ua="-" us="-" ut="-" ul="-" uc="-" cs="-"

Those are the new logs, with a few lines in the old format at the top for comparison.

dedm commented 7 years ago

@IWriteThings in the log lines above: ut="-"

That means the upstream response time was not measured, because the request never went to any upstream. If you use nginx as a reverse proxy, you should see something like ut="0.333" once nginx actually proxies requests to upstreams.
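In other words, the $upstream_* variables only get values on requests that pass through a proxied location. A minimal sketch of such a setup (the upstream name, addresses, and server_name here are hypothetical, and main_ext is assumed to be defined as in the documentation):

```nginx
# Hypothetical reverse-proxy setup: only requests handled by the
# proxied location below will populate $upstream_response_time etc.
upstream backend {
    server 10.0.0.2:8080;  # example backend address
}

server {
    listen 80;
    server_name example.com;

    # main_ext must be defined earlier via log_format
    access_log /var/log/nginx/access.log main_ext;

    location / {
        # Proxying sets $upstream_addr, $upstream_status,
        # $upstream_response_time, ... for the access log.
        proxy_pass http://backend;
    }
}
```

Requests served directly by nginx (such as the agent's own /nginx_status polls above) never touch an upstream, so they log ut="-" regardless of configuration.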

IWriteThings commented 7 years ago

Since we do not use it as a reverse proxy, should I assume the system will just continue to report "Metrics Unavailable (check the docs)"? I would think it could report 0 or none instead of giving a 'misconfiguration' warning.

Thoughts?

IWriteThings commented 7 years ago
[screenshot: 2017-06-22 at 09:02:43]

I think I'd just prefer it to report none when properly configured, instead of urging me to check the docs just because the feature isn't in use.

ghost commented 7 years ago

You're right. We'll probably remove this out-of-the-box graph altogether if you don't have any upstream metrics. Thanks!

ghost commented 7 years ago

I'm creating a Jira task now :)

ghost commented 7 years ago

Done. Please stay tuned.

murali129 commented 5 years ago

Hello, I am facing this now in 2019.

alpiua commented 4 years ago

Came here from 2020 :)

AlexanderMatveev commented 3 years ago

I have a related problem, but only when using syslog collection (see nginxinc/nginx-amplify-agent-priv#201).

@alpi-ua @murali129 Did you switch to syslog collection?