knyar / nginx-lua-prometheus

Prometheus metric library for Nginx written in Lua
MIT License
1.46k stars 231 forks source link

counter not initialied while logging request #87

Closed Videl closed 4 years ago

Videl commented 4 years ago

Hello,

I don't know what to do with this issue, could you please give me pointers?

Here's what my config has in the http block:

   lua_package_path '/usr/local/share/lua/5.3/?.lua;/usr/local/share/lua/5.3/resty/?.lua;/usr/local/share/lua/5.3/nginx/?.lua;;';
    lua_shared_dict prometheus_metrics 10M; # Prometheus
    lua_shared_dict discovery 1m; # OAuth2
    lua_shared_dict jwks 1m; # OAuth2
    lua_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;

    # enable capturing error logs
    lua_capture_error_log 32m;

    init_by_lua '
        prometheus = require("prometheus").init("prometheus_metrics")
        metric_requests = prometheus:counter(
            "nginx_http_requests_total", "Number of HTTP requests", {"host", "status"})
        metric_latency = prometheus:histogram(
            "nginx_http_request_duration_seconds", "HTTP request latency", {"host"})
        metric_connections = prometheus:gauge(
            "nginx_http_connections", "Number of HTTP connections", {"state"})
    ';

    init_worker_by_lua_block {
        prometheus:init_worker()
    }

    log_by_lua '
        metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
        metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name})
    ';

I also have a server block for url "/nginx_metrics" but the issue isn't there I believe?

The error is: 2020/05/03 14:22:24 [error] 22136#22136: *347 [lua] prometheus.lua:671: log_error(): counter not initialied while logging request, client:....

Looking at the code, it seems it is init_worker() that initializes the counter, and it seems to have been done, I made sure that the init_worker_by_lua_block was ran by adding some logs there.

I don't know what to try to make it work. I'm on version 1.15.8.1 of openresty.

knyar commented 4 years ago

Could you please reload nginx (or restart it) and check whether there are any lua-related error messages in the log? Perhaps for some reason prometheus_resty_counter.lua is not available?

Videl commented 4 years ago

Indeed I believe it is prometheus_resty_counter that is not available, but I don't see any logs anywhere. I had to add logs in the init_worker function to see that it is really used

Videl commented 4 years ago

I added the prometheus_resty_counter.lua file next to the prometheus.lua file, in the folder /usr/local/share/lua/5.3/, as per the config in nginx: lua_package_path '/usr/local/share/lua/5.3/?.lua;...'

Seems that the following code:

function Prometheus:init_worker(sync_interval)
  self.sync_interval = sync_interval or 1
  ngx.log(ngx.WARN, "init_worker with sync_interval:", self.sync_interval)
  local counter_instance, err = resty_counter_lib.new(self.dict_name, self.sync_interval)
  ngx.log(ngx.WARN, "err variable in init_worker:", err)
  if err then
    error(err, 2)
    ngx.log(err)
  end
  ngx.log(ngx.WARN, counter_instance)

  self._counter = counter_instance
end

produces those logs:

2020/05/03 17:08:08 [error] 28368#28368: init_worker_by_lua error: /usr/local/share/lua/5.3/prometheus.lua:509: bad argument #1 to 'log' (expected table to have __tostring metamethod)
stack traceback:
        [C]: in function 'log'
        /usr/local/share/lua/5.3/prometheus.lua:509: in function 'init_worker'
        init_worker_by_lua:2: in main chunk
2020/05/03 17:08:08 [error] 28368#28368: init_worker_by_lua error: /usr/local/share/lua/5.3/prometheus.lua:509: bad argument #1 to 'log' (expected table to have __tostring metamethod)
stack traceback:
        [C]: in function 'log'
        /usr/local/share/lua/5.3/prometheus.lua:509: in function 'init_worker'
        init_worker_by_lua:2: in main chunk

Initialization seems fine? (I removed the log of counter_instance to test)

Videl commented 4 years ago

I think I have a more helpful log there. I edited prometheus.lua to give the line where it logged. I have this log:

2020/05/03 17:18:13 [warn] 31990#31990: *1 [lua] prometheus.lua:528: init_worker(): init_worker with sync_interval:1, context: init_worker_by_lua*
2020/05/03 17:18:13 [warn] 31990#31990: *1 [lua] prometheus.lua:528: init_worker(): init_worker with sync_interval:1, context: init_worker_by_lua*
2020/05/03 17:18:13 [warn] 31990#31990: *1 [lua] prometheus.lua:530: init_worker(): err variable in init_worker:nil, context: init_worker_by_lua*
2020/05/03 17:18:13 [warn] 31990#31990: *1 [lua] prometheus.lua:530: init_worker(): err variable in init_worker:nil, context: init_worker_by_lua*
2020/05/03 17:18:18 [warn] 31990#31990: *5 [lua] prometheus.lua:315: inc(): not c! while logging request, client: ...
2020/05/03 17:18:18 [warn] 31990#31990: *5 [lua] prometheus.lua:315: inc(): not c! while logging request, client: ...
2020/05/03 17:18:18 [warn] 31990#31990: *5 [lua] prometheus.lua:316: inc(): nil while logging request, client: ...
2020/05/03 17:18:18 [warn] 31990#31990: *5 [lua] prometheus.lua:316: inc(): nil while logging request, client: ...
2020/05/03 17:20:24 [error] 31990#31990: *152 [lua] prometheus.lua:707: log_error(): counter not initialied while logging request, client: ...

For the following code:

294 local function inc_counter(self, value, label_values)
295   -- counter is not allowed to decrease
296   if value and value < 0 then
297     self._log_error_kv(self.name, value, "Value should not be negative")
298     return
299   end
300
301   local k, err
302   k, err = lookup_or_create(self, label_values)
303   if err then
304     self._log_error(err)
305 ngx.log(ngx.WARN, err)
306
307     return
308   end
309
310   local c = self._counter
311   if not c then
312     c = self.parent._counter
313     if not c then
314
315 ngx.log(ngx.WARN, "not c!")
316 ngx.log(ngx.WARN, err)
317       self._log_error(ERR_MSG_COUNTER_NOT_INITIALIZED)
318
319       return
320     end
321     self._counter = c
322   end
323   c:incr(k, value)
324 end

Any idea why the counter is empty?

Videl commented 4 years ago

Oh boy, you were right, I had a warning: May 03 11:54:24 kim nginx[6815]: nginx: [alert] lua_code_cache is off; this will hurt performance in...

I set it "on" and it now works! I should have taken care of this log beforehand.. I'll read some docs about it. I just have a question, is it mandatory to call a method for logging? instead of ngx.log? It was quite useful to see the line number in the log error

Videl commented 4 years ago

Thank for your tip... :)

knyar commented 4 years ago

I just have a question, is it mandatory to call a method for logging? instead of ngx.log? It was quite useful to see the line number in the log error

Centralized error logging allows exporting an error metric to Prometheus. I can see how that might make debugging more difficult, sorry about that.

I will add a note about lua_code_cache into the README file.