vozlt / nginx-module-vts

Nginx virtual host traffic status module
BSD 2-Clause "Simplified" License
3.23k stars 463 forks source link

Gibberish and/or memory leaks in the ::nogroups upstream list #184

Open zajdee opened 4 years ago

zajdee commented 4 years ago

Hi all,

considering the last commit was some 2 years ago, this issue is logged just to keep the evidence for others potentially seeing the same issue.

From time to time we are seeing weird output when fetching data from the VTS module using the VTS exporter - both Prometheus and regular (JSON) data in the ::nogroups upstream list. The weirdness means it's either a gibberish (binary data) or a weird upstream name, which looks like some data copied over from a HTTP request.

Snímek obrazovky 2020-06-15 v 10 58 14

It is unclear if this is some bug (memory leak?) in the VTS module or in nginx itself. A similar (same?) issue has been previously briefly mentioned by @dkrutsko in https://github.com/vozlt/nginx-module-vts/issues/170#issuecomment-547081336

nginx config (simplified):

upstream cluster {
    server <hostname1> resolve weight=300 fail_timeout=1 max_fails=1;
    server <hostname2> resolve weight=1 fail_timeout=1 max_fails=1;
    server <backup-hostname> backup;

    keepalive 10;
}

server {
  # Listening on local unix socket only to improve the speed and stability
  listen unix:/run/nginx-redirect_resolver.sock;

  server_name _redirectresolver;
  include /etc/nginx/conf.d/resolver.conf;

  # Proxy buffering off to avoid having each file twice in memory
  proxy_buffering off;

  access_log /var/log/nginx/redirect-resolver.log redir_resolver;
  error_log /var/log/nginx/error.log;
  set_real_ip_from    "unix:";
  real_ip_header      X-Forwarded-For;

  location @rewrite304 {
    return 304;
  }

  location @rewrite400 {
    return 400;
  }

  location @rewrite403 {
    return 403;
  }

  location @rewrite404 {
    return 404;
  }

  location @rewrite502 {
    return 502;
  }

  location @redir_last {
    # 304/400/403/404 are handled internally on nginx
    # by sending empty response with the respective code
    proxy_redirect      off;
    proxy_intercept_errors on;

    error_page 304 =304 @rewrite304;
    error_page 400 =400 @rewrite400;
    error_page 403 =403 @rewrite403;
    error_page 404 =404 @rewrite404;
    # any unexpected upstream 3xx/4xx/5xx return code gets translated to 502
    error_page 300 301 302 303 305 306 307 308 =502 @rewrite502;
    error_page 401 402 405 406 407 408 409 410 411 412 413 414 415 416 417 418 421 422 423 424 426 428 429 431 451 =502 @rewrite502;
    error_page 500 501 502 503 504 505 506 507 508 510 511 =502 @rewrite502;

    set $storage $upstream_http_location;
    proxy_pass $storage;
    proxy_hide_header 'Access-Control-Allow-Origin';
    add_header 'Access-Control-Allow-Origin' '*' always;
  }

  location @redir_non_fatal {
    proxy_redirect         off;
    proxy_intercept_errors on;
    recursive_error_pages  on;
    # any unexpected upstream 3xx/4xx/5xx return code getspointed to @origin
    error_page 304 =304 @rewrite304;
    error_page 300 301 302 303 305 306 307 308 = @origin;
    error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 420 422 423 424 426 428 429 431 444 449 450 451 = @origin;
    error_page 500 501 502 503 504 505 506 507 508 510 511 = @origin;

    set $storage_first $upstream_http_location;
    proxy_pass $storage_first;
    proxy_hide_header 'Access-Control-Allow-Origin';
    add_header 'Access-Control-Allow-Origin' '*' always;
  }

  # This location is fallback option
  # - we try to get file on file redirector (sends HTTP 301/302 if file is found)
  # - if we get redirect, we process it with @redir_last
  location @origin {
    set $backend http://<backup-hostname>;
    proxy_pass $backend$uri$is_args$args;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_intercept_errors on;
    proxy_redirect off;
    error_page 301 302 307 = @redir_last;
  }

  location / {
    recursive_error_pages on;
    set $backend http://cluster;
    proxy_pass $backend$uri$is_args$args;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_intercept_errors on;
    proxy_redirect off;
    error_page 301 302 307 = @redir_non_fatal;
    error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 420 422 423 424 426 428 429 431 444 449 450 451 = @origin;
    error_page 500 501 502 503 504 505 506 507 508 510 511 = @origin;
  }
}

Our /etc/nginx/sites-enabled/status_vts:

server {
        listen 127.0.0.1:8080;
        listen [::1]:8080;
        location = /status/format/json {
                vhost_traffic_status_bypass_stats on;
                vhost_traffic_status_display;
                vhost_traffic_status_filter_by_host on;
                vhost_traffic_status_filter on;
                vhost_traffic_status_display_format json;
                access_log off;
                allow 127.0.0.1;
                allow ::1;
                deny all;
        }
        location = /status {
                vhost_traffic_status_bypass_stats on;
                vhost_traffic_status_display;
                vhost_traffic_status_filter_by_host on;
                vhost_traffic_status_filter on;
                vhost_traffic_status_display_format html;
                access_log off;
                allow 127.0.0.1;
                allow ::1;
                deny all;
        }
        location = /metrics {
                vhost_traffic_status_bypass_stats on;
                vhost_traffic_status_display;
                vhost_traffic_status_filter_by_host on;
                vhost_traffic_status_filter on;
                vhost_traffic_status_display_format prometheus;
                access_log off;
                allow 127.0.0.1;
                allow ::1;
                deny all;
        }
}

We also have a regular /etc/nginx/sites-enabled/status module enabled:

server {
    listen 127.0.0.1:800;
    listen [::1]:800;
    location = /status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        allow ::1;
        deny all;
    }
    location = /status_hc {
        # The HC stands for health check, not Hard Core
        allow 127.0.0.1;
        allow ::1;
        deny all;
        default_type text/plain;
        content_by_lua_block {
            local hc = require "resty.upstream.healthcheck"
            ngx.say("Nginx Worker PID: ", ngx.worker.pid())
            ngx.print(hc.status_page())
        }
    }
}

If I run curl http://localhost:800/status_hc there is no gibberish, no extra upstream. But when running either curl -s http://localhost:8080/status/format/json or curl -s http://localhost:8080/metrics, the gibberish is there:

nginx_vts_upstream_bytes_total{upstream="::nogroups",backend="p͍�XU���XU@s�",direction="in"} 626
nginx_vts_upstream_bytes_total{upstream="::nogroups",backend="p͍�XU���XU@s�",direction="out"} 34317
nginx_vts_upstream_requests_total{upstream="::nogroups",backend="p͍�XU���XU@s�",code="1xx"} 0
nginx_vts_upstream_requests_total{upstream="::nogroups",backend="p͍�XU���XU@s�",code="2xx"} 1
nginx_vts_upstream_requests_total{upstream="::nogroups",backend="p͍�XU���XU@s�",code="3xx"} 0
nginx_vts_upstream_requests_total{upstream="::nogroups",backend="p͍�XU���XU@s�",code="4xx"} 0
nginx_vts_upstream_requests_total{upstream="::nogroups",backend="p͍�XU���XU@s�",code="5xx"} 0
nginx_vts_upstream_requests_total{upstream="::nogroups",backend="p͍�XU���XU@s�",code="total"} 1
nginx_vts_upstream_request_seconds_total{upstream="::nogroups",backend="p͍�XU���XU@s�"} 20.417
nginx_vts_upstream_request_seconds{upstream="::nogroups",backend="p͍�XU���XU@s�"} 0.000
nginx_vts_upstream_response_seconds_total{upstream="::nogroups",backend="p͍�XU���XU@s�"} 20.415
nginx_vts_upstream_response_seconds{upstream="::nogroups",backend="p͍�XU���XU@s�"} 0.000
      {
        "server": "p͍�XU\u0000\u0000���XU\u0000\u0000@\u0000s�",
        "requestCounter": 1,
        "inBytes": 626,
        "outBytes": 34317,
        "responses": {
          "1xx": 0,
          "2xx": 1,
          "3xx": 0,
          "4xx": 0,
          "5xx": 0
        },
(...)
        "weight": 0,
        "maxFails": 0,
        "failTimeout": 0,
        "backup": false,
        "down": false,
        "overCounts": {
          "maxIntegerSize": 18446744073709552000,
          "requestCounter": 0,
          "inBytes": 0,
          "outBytes": 0,
          "1xx": 0,
          "2xx": 0,
          "3xx": 0,
          "4xx": 0,
          "5xx": 0,
          "requestMsecCounter": 0,
          "responseMsecCounter": 0
        }
      },

The JSON output fed through hexdump looks like:

0001f210  20 20 20 22 73 65 72 76  65 72 22 3a 20 22 70 cd  |   "server": "p.|
0001f220  8d ef bf bd 58 55 5c 75  30 30 30 30 5c 75 30 30  |....XU\u0000\u00|
0001f230  30 30 ef bf bd ef bf bd  ef bf bd 58 55 5c 75 30  |00.........XU\u0|
0001f240  30 30 30 5c 75 30 30 30  30 40 5c 75 30 30 30 30  |000\u0000@\u0000|
0001f250  73 ef bf bd 22 2c 0a 20  20 20 20 20 20 20 20 22  |s...",.        "|
0001f260  72 65 71 75 65 73 74 43  6f 75 6e 74 65 72 22 3a  |requestCounter":|
0001f270  20 31 2c 0a 20 20 20 20  20 20 20 20 22 69 6e 42  | 1,.        "inB|
0001f280  79 74 65 73 22 3a 20 36  32 36 2c 0a 20 20 20 20  |ytes": 626,.    |
0001f290  20 20 20 20 22 6f 75 74  42 79 74 65 73 22 3a 20  |    "outBytes": |
0001f2a0  33 34 33 31 37 2c 0a 20  20 20 20 20 20 20 20 22  |34317,.        "|
0001f2b0  72 65 73 70 6f 6e 73 65  73 22 3a 20 7b 0a 20 20  |responses": {.  |
0001f2c0  20 20 20 20 20 20 20 20  22 31 78 78 22 3a 20 30  |        "1xx": 0|
0001f2d0  2c 0a 20 20 20 20 20 20  20 20 20 20 22 32 78 78  |,.          "2xx|
0001f2e0  22 3a 20 31 2c 0a 20 20  20 20 20 20 20 20 20 20  |": 1,.          |
0001f2f0  22 33 78 78 22 3a 20 30  2c 0a 20 20 20 20 20 20  |"3xx": 0,.      |
0001f300  20 20 20 20 22 34 78 78  22 3a 20 30 2c 0a 20 20  |    "4xx": 0,.  |
0001f310  20 20 20 20 20 20 20 20  22 35 78 78 22 3a 20 30  |        "5xx": 0|
0001f320  0a 20 20 20 20 20 20 20  20 7d 2c 0a 20 20 20 20  |.        },.    |
0001f330  20 20 20 20 22 72 65 71  75 65 73 74 4d 73 65 63  |    "requestMsec|
vozlt commented 3 years ago

@zajdee Thanks for reporting. The resolve option in upstream block is being used, are you using nginx commercial version? Or, please let me know the 3rd party module(resolve) you are using. The option is probably suspicious, so I'll try to reproduce it.