use atomics for metrics

jellydator / ttlcache

An in-memory cache with item expiration and generics

MIT License

883 stars 115 forks source link

use atomics for metrics #91

Open aathan opened 1 year ago

aathan commented 1 year ago

AFAIK metricsMu is not really performing any useful function over and above the likely more efficient atomic integer operations available on most modern CPUs. In the Metric() return function, the metricsMu served to ensure all the counters are returned as of an instant in time when no other counter is incremented relative to other counters, but since the write locks are only held for the duration of an increment of a single element of the structure, even that has no utility (i.e., there are never increments of two counters at once while holding the lock across both increments, which is the only thing the read lock would protect). Finally, the atomic loads probably collapse to simple reads on many CPUs, with the atomic load possibly only forcing cpu cache coherency (i.e., you can probably just write Metrics(){return c.metrics} and be just fine).

swithek commented 1 year ago

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

likely more efficient atomic integer operations available on most modern CPUs

gaydin commented 1 year ago

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

likely more efficient atomic integer operations available on most modern CPUs


name                     old time/op    new time/op    delta
CacheSetWithoutTTL-8        290ns ± 0%     288ns ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8     516ns ± 0%     463ns ± 0%   ~     (p=1.000 n=1+1)

name                     old alloc/op   new alloc/op   delta
CacheSetWithoutTTL-8        99.0B ± 0%     97.0B ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8      256B ± 0%      253B ± 0%   ~     (p=1.000 n=1+1)

name                     old allocs/op  new allocs/op  delta
CacheSetWithoutTTL-8         2.00 ± 0%      2.00 ± 0%   ~     (all equal)
CacheSetWithGlobalTTL-8      4.00 ± 0%      4.00 ± 0%   ~     (all equal)

biosvs commented 1 year ago

I also noticed this potential improvement, glad to see opened PR for this (:

@swithek do you have any concerns regarding this improvement or regarding this particular PR?

swithek commented 1 year ago

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

likely more efficient atomic integer operations available on most modern CPUs

name                     old time/op    new time/op    delta
CacheSetWithoutTTL-8        290ns ± 0%     288ns ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8     516ns ± 0%     463ns ± 0%   ~     (p=1.000 n=1+1)

name                     old alloc/op   new alloc/op   delta
CacheSetWithoutTTL-8        99.0B ± 0%     97.0B ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8      256B ± 0%      253B ± 0%   ~     (p=1.000 n=1+1)

name                     old allocs/op  new allocs/op  delta
CacheSetWithoutTTL-8         2.00 ± 0%      2.00 ± 0%   ~     (all equal)
CacheSetWithGlobalTTL-8      4.00 ± 0%      4.00 ± 0%   ~     (all equal)

Having seen the benchmark results, I'm not entirely convinced this would be a good change. It doesn't look like an improvement at all. Perhaps the banchmark tests should be different to properly test this?