knyar / nginx-lua-prometheus

Prometheus metric library for Nginx written in Lua
MIT License
1.46k stars 231 forks source link

Fix getting key error when key not found #106

Closed timebug closed 4 years ago

timebug commented 4 years ago

Fix getting key error when key not found.

When I registers a counter and a histogram like this:

init_worker_by_lua_block {
  prometheus = require("prometheus").init("prometheus_metrics")
  metric_requests = prometheus:counter(
    "nginx_http_requests", "Number of HTTP requests", nil)
  metric_response_sizes = prometheus:histogram(
    "nginx_http_response_size_bytes", "Size of HTTP responses", nil,
    {10,100,1000,10000,100000,1000000})
}

metric_response_sizes covering a range from 10 byte to 1000000 byte as bucket boundaries, function lookup_or_create will added all the buckets to self._key_index when metric_response_sizes:observe be called at first time, if there is no one response size less than 10 byte, the related key was not created in self._dict. Then we call metric_requests:reset this kind of error log will appears.

  for _, key in ipairs(keys) do
    local value, key_err = self._dict:get(key)
    if value then
      -- without labels equal, or with labels and the part before { equals
      if key == self.name or name_prefix == string.sub(key, 1, name_prefix_length) then -- this check is after self._dict:get
        self._key_index:remove(key)
        local _, err = self._dict:safe_set(key, nil)
        if err then
          self._log_error("Error resetting '", key, "': ", err)
        end
      end
    else
      self._log_error("Error getting '", key, "': ", key_err) -- key_err may be nil when can not found key in self._dict
    end
  end
timebug commented 4 years ago

ping @knyar

knyar commented 4 years ago

Could you please provide a bit more details and describe how you are hitting this? Ideally key index and the metric dict should have the same keys.

Also note that you will need to adjust tests as well.

timebug commented 4 years ago

@knyar I added the description and fixed tests.

coveralls commented 4 years ago

Coverage Status

Coverage increased (+0.01%) to 94.213% when pulling 0479af05daffd8bdd7af7c81eba92c09785b27b4 on timebug:fix-get-key-error into 71ac9082637ff1269f195e4f4f5e61836782b8d0 on knyar:master.

knyar commented 4 years ago

This looks good. Thanks a lot for submitting a PR, providing more details and adjusting tests!