Closed xordspar0 closed 4 years ago
the counter updates are not mutex'd, so that's expected
you can either lock around the updates, or split them between alloc and free and let the user compute the actual value, guarding against "underflows" (more free than alloc)
I think I'd prefer locking around the updates. I'm also seeing this happen with MAIN.n_object in both 6.0.4 and 6.4.0, so possibly the same issue?
same behavior, wouldn't call it an issue as it was designed that way, but yes.
be careful with locking as it will impact performance
Updates do happen under vmod_xkey's mutex for tree operations, so maybe they aren't balanced?
At first glance though, they seem to be.
We had discussed this on IRC a bit. MAIN.n_object is correct; it's just that the worker operations hit the sum function out of order. So underflowed readings are possible, and the counts will return to normal after everything is idle.
I was looking at the wrong gauge, and it looks like g_ochead_bytes is doing something funky with regard to pooling, but it otherwise looks balanced (at first glance).
I did a little more investigation of the symptoms, but I haven't dug into the code yet. This graph appears to show that the g_bytes metric is decreasing more than it is increasing.
Glancing at the code, you are accounting differently here by skipping the pool allocation:
Everywhere else you do accounting including the pools. This would definitely cause an underflow.
I just took a quick look at our metrics and oc_bytes is definitely the one that's underflowing. We'll try to figure out how to write a test for this and make a PR.
Over time while Varnish is running, the XKEY.g_ochead_bytes counter sometimes underflows.
In this chart, the far-right value is 18446744073708882000, which should be -669616 except that the counter is an unsigned 64-bit integer.
Note that g_ochead_bytes is a DEBUG level counter. To see it in varnishstat you need to press v a couple of times.