Closed kingluo closed 2 years ago
Thank you for proposing this.
In general, I believe that the library should maintain data integrity and never lose metric values. If shdict gets full and lru eviction kicks in, it should be treated as an error: nginx_metric_errors_total
should be incremented, and error log messages should say that reported metric values might be incomplete.
safe_set
is used as an attempt to maintain data integrity, but I think you are absolutely right that eviction can also happen when a previously nonexistent dictionary key is incremented using :incr
. The fact that we do not detect evictions in such cases is a bug.
What I think would be a good way forward is to proceed with replacing safe_set
with set
, but also catch the forcible
return value every time we write to a shdict (using set
or incr
) and report errors when it becomes true
(which means that the dict is full).
I agree. Let me have a try.
@knyar Please review my commits and approve the github workflow, thanks.
Thank you! I am going to merge this into a separate branch to make a few follow-up adjustments before merging this to master
.
When the number of metrics increase, especially due to label value changes or add/remove labels on reload, it's likely to use up the shdict, when mostly we want to remove the old items in shdict making use of the lru cache mechanism of shdict. The old items are supposed to be ignored.
In fact, even we insist
safe_set
, theinc
we used in other places of codes may also purge old items. So why not make them consistent?