sasa1977 / con_cache

ets based key/value cache with row level isolated writes and ttl support
MIT License
910 stars 71 forks source link

Hit/miss ratio tracking #75

Closed aerosol closed 5 months ago

aerosol commented 6 months ago

Hey, thanks for this library.

Would you be interested in an extension allowing querying each cache's hit rate? Happy to discuss implementation details if you're open to the idea.

sasa1977 commented 6 months ago

Sorry for the late reply, I was a bit busy, and also wanted to think this through.

My first reaction was that this can be done by the client on top of the already existing interface. And while that is true, it's not completely obvious how, plus it requires some boilerplate, and it's possible to mess it up.

So I think it makes sense to implement this in con_cache. Do you have a proposal approach?

aerosol commented 6 months ago

Hey,

My first reaction was that this can be done by the client on top of the already existing interface.

That's what I did and it indeed wasn't very trivial.

So I think it makes sense to implement this in con_cache. Do you have a proposal approach?

I think emitting telemetry events for hit/miss would be enough for users to integrate as opt-in. WYT?

sasa1977 commented 6 months ago

I think emitting telemetry events for hit/miss would be enough for users to integrate as opt-in.

I like it. It should require a small amount of changes to con_cache. Plus it's very flexible.

Out of curiosity, what's your use-case for this and how does your implementation work? I wonder if we could also add a simple implementation, e.g. based on :counters.

aerosol commented 6 months ago

OK so the context is: we have several different caches in https://github.com/plausible/analytics/ - we like to understand utilization of each (expose charts via grafana/prometheus). The way it's currently implemented: there is an adapter that interfaces with ConCache and emits those events. The tricky bit was, with the high-level ConCache API, we weren't able to tell whether an item was fetched warm or cold from get_or_store.

I did consider atomic counters for this, but ended up using ets for now (see: https://github.com/plausible/analytics/blob/master/lib/plausible/cache/stats.ex) - while there might be gain in performance, this is currently fast enough and doesn't require us to keep track of refs, which is a must with :counters if I'm not mistaken.

I think providing a :counters-based stats aggregator (optional or not, up to you) is entirely doable, if you think the refs tracking is worth it.

aerosol commented 6 months ago

I forgot to add, historically it was Cachex before ConCache, hence the adapter dance - to keep the existing stuff relatively intact etc.

sasa1977 commented 6 months ago

The tricky bit was, with the high-level ConCache API, we weren't able to tell whether an item was fetched warm or cold from get_or_store.

I think you could use the high-level API, if in your own get_or_store wrapper you use ConCache.get and ConCache.dirty_put. But I agree this isn't immediately obvious.

this is currently fast enough and doesn't require us to keep track of refs,

Yeah, I think I see what you mean, good point!

we like to understand utilization of each (expose charts via grafana/prometheus)

How come you're using a global in-memory state for this? Wouldn't it make more sense to forward hit/miss events to the external metrics collector? That way you can aggregate data from multiple instances, and see moving averages over shorter time periods (e.g. minute), rather than the total absolute since the last deploy.

aerosol commented 6 months ago

How come you're using a global in-memory state for this? Wouldn't it make more sense to forward hit/miss events to the external metrics collector? That way you can aggregate data from multiple instances, and see moving averages over shorter time periods (e.g. minute), rather than the total absolute since the last deploy.

You're right, we're currently winding down everything to almost 0 as the app nodes rotate, this was never a concern though. I suppose this is very specific to our use case. The peak is always somewhat the same, greater precision would get us very little value at the cost of having to maintain yet another moving part (the external collector).

image

Anyway, would you like me to try and implement the telemetry events emission? With or without the stats collector?

sasa1977 commented 6 months ago

Anyway, would you like me to try and implement the telemetry events emission? With or without the stats collector?

Yeah, let's do this. Let's start with just telemetry events then, so basically the bare minimum that supports your use case.

aerosol commented 6 months ago

PR up, happy to discuss and re-iterate.

sasa1977 commented 5 months ago

closed via #76