patrickmn / go-cache

An in-memory key:value store/cache (similar to Memcached) library for Go, suitable for single-machine applications.
https://patrickmn.com/projects/go-cache/
MIT License

GetWithExpirationUpdate - atomic implementation #126

Closed · paddlesteamer closed this 4 years ago

paddlesteamer commented 4 years ago

This PR is a fixed version of #96. Main changes are:

Supersedes #125

arp242 commented 4 years ago

I just did a quick test, but it looks like this change is quite a big performance regression:

benchmark                                               old ns/op     new ns/op     delta
BenchmarkCacheGetExpiring-2                             121           154           +27.27%
BenchmarkCacheGetNotExpiring-2                          35.6          71.1          +99.72%
BenchmarkRWMutexMapGet-2                                32.3          31.9          -1.24%
BenchmarkRWMutexInterfaceMapGetStruct-2                 82.2          75.7          -7.91%
BenchmarkRWMutexInterfaceMapGetString-2                 65.5          64.3          -1.83%
BenchmarkCacheGetConcurrentExpiring-2                   117           223           +90.60%
BenchmarkCacheGetConcurrentNotExpiring-2                85.5          167           +95.32%
BenchmarkRWMutexMapGetConcurrent-2                      71.9          55.7          -22.53%
BenchmarkCacheGetManyConcurrentExpiring-2               117           128           +9.40%
BenchmarkCacheGetManyConcurrentNotExpiring-2            102           161           +57.84%
BenchmarkCacheSetExpiring-2                             177           260           +46.89%
BenchmarkCacheSetNotExpiring-2                          75.3          160           +112.48%
BenchmarkRWMutexMapSet-2                                69.7          66.6          -4.45%
BenchmarkCacheSetDelete-2                               163           243           +49.08%
BenchmarkRWMutexMapSetDelete-2                          149           146           -2.01%
BenchmarkCacheSetDeleteSingleLock-2                     114           199           +74.56%
BenchmarkRWMutexMapSetDeleteSingleLock-2                101           99.4          -1.58%
BenchmarkIncrementInt-2                                 120           139           +15.83%
BenchmarkDeleteExpiredLoop-2                            2880788       5902806       +104.90%
BenchmarkShardedCacheGetExpiring-2                      154           180           +16.88%
BenchmarkShardedCacheGetNotExpiring-2                   57.0          89.8          +57.54%
BenchmarkShardedCacheGetManyConcurrentExpiring-2        88.0          99.4          +12.95%
BenchmarkShardedCacheGetManyConcurrentNotExpiring-2     36.5          54.7          +49.86%

The pointers also make the GC work harder, so there's probably some indirect performance regression there too (which may already be accounted for in the above benchmarks; I didn't look in depth).

A simpler way to fix this would be to just lock everything for the entire function instead of switching from RLock() to Lock(). That would also cause some performance regression, but only when you actually call GetWithExpirationUpdate(), rather than affecting everything (even if you don't use OnEvict).

paddlesteamer commented 4 years ago

Hi again,

I removed the defers I put in the previous commits and the benchmark results are better now. Can you run the benchmark tests again?

arp242 commented 4 years ago

defer got a big performance boost in Go 1.14, so it doesn't make that much of a difference any more. By the way, to be clear: I'm not the maintainer of this library; I've just been going through some PRs and issues to include in my fork. I already did some benchmarks on defer this morning, and it made very little difference.

I included new benchmarks at the bottom, as you can see it doesn't make that much of a difference.
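
(As a rough illustration of the defer cost in isolation, a benchmark along these lines would measure it; this is just a sketch in the same _test.go context as the snippet below, not one of the numbers quoted here. On Go 1.14+ the two variants come out within a few ns/op of each other, while on older versions the deferred one is noticeably slower.)

// Sketch: the closure call adds a little overhead of its own, so this is
// only a rough comparison of defer vs. an explicit Unlock.
func BenchmarkLockUnlock(b *testing.B) {
    var mu sync.Mutex
    for n := 0; n < b.N; n++ {
        mu.Lock()
        mu.Unlock()
    }
}

func BenchmarkLockDeferUnlock(b *testing.B) {
    var mu sync.Mutex
    for n := 0; n < b.N; n++ {
        func() {
            mu.Lock()
            defer mu.Unlock()
        }()
    }
}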

The basic issue is that locks are not free, for example take this simple benchmark:

// BenchmarkMain-2         46259625                24.3 ns/op
func BenchmarkMain(b *testing.B) {
    var s struct{ sync.Mutex }
    for n := 0; n < b.N; n++ {
        s.Lock()
        s.Unlock()
    }
}

So that's the minimum amount of performance regression that can be expected for operations that add a new lock.

The other issue is that pointers are not free either; I included a benchmark below which changes just map[string]Item to pointers, and as you can see that causes quite a big regression too.
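
To be clear, the only change in that variant is the value type of the items map; roughly (a sketch, the rest of go-cache's cache struct stays as it is):

// Sketch of the map[string]*Item variant benchmarked at the bottom; only
// the value type of the items map changes.
type cache struct {
    defaultExpiration time.Duration
    items             map[string]*Item // was: map[string]Item
    mu                sync.RWMutex
    // ... remaining fields unchanged
}

Every Set then has to allocate a fresh *Item on the heap, which is presumably where the large Set/Delete regressions (and the extra GC work mentioned above) come from.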

For reference, here's what I did in my fork:

// Touch replaces the expiry of a key and returns the current value.
func (c *cache) Touch(k string, d time.Duration) (interface{}, bool) {
    if d == DefaultExpiration {
        d = c.defaultExpiration
    }

    c.mu.Lock()
    defer c.mu.Unlock()

    item, ok := c.items[k]
    if !ok {
        return nil, false
    }

    item.Expiration = time.Now().Add(d).UnixNano()
    c.items[k] = item
    return item.Object, true
}
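
(A quick usage sketch of that Touch, with a made-up key and durations; New, Set, and DefaultExpiration are the regular go-cache API:)

c := New(5*time.Minute, 10*time.Minute)
c.Set("session", "data", DefaultExpiration)

// Read the value and push its expiry 30 seconds into the future in one locked call.
if v, ok := c.Touch("session", 30*time.Second); ok {
    fmt.Println(v) // "data"
}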

And the benchmark for this compared to your version:

// BenchmarkTouch-2         5750259               197 ns/op   My Touch
// BenchmarkTouch-2         4155210               284 ns/op   Your GetWithExpirationUpdate
func BenchmarkTouch(b *testing.B) {
    tc := New(DefaultExpiration, 0)
    d := 5 * time.Second
    tc.Set("foo", 0, DefaultExpiration)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        tc.GetWithExpirationUpdate("foo", d)
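        // presumably swapped for tc.Touch("foo", d) when producing the "My Touch" number above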
    }
}

This is a simple benchmark of course; I tried to make a benchmark which made your version look better in cases where you're also doing other stuff in goroutines, for example:

// BenchmarkTouch-2         6014732               200 ns/op    My Touch
// BenchmarkTouch-2         4213928               284 ns/op    Your GetWithExpirationUpdate
func BenchmarkTouch(b *testing.B) {
    tc := New(DefaultExpiration, 0)
    d := 5 * time.Second
    tc.Set("foo", 0, DefaultExpiration)
    go func() {
        for {
            tc.Get("foo")
            time.Sleep(1 * time.Millisecond)
        }
    }()

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        tc.GetWithExpirationUpdate("foo", d)
    }
}

I tried a bunch of different variations of this, but I couldn't come up with anything where your version had better performance.

I don't know ... maybe I missed something – benchmarking realistic workloads is hard – but from what I can see just doing c.mu.Lock() for the duration of Touch() or GetWithExpirationUpdate() seems to be better in almost all cases?


For your latest commit, here's the comparison with the original PR:

benchmark                                               old ns/op     new ns/op     delta
BenchmarkCacheGetExpiring-2                             154           152           -1.30%
BenchmarkCacheGetNotExpiring-2                          71.1          68.1          -4.22%
BenchmarkRWMutexMapGet-2                                31.9          31.9          +0.00%
BenchmarkRWMutexInterfaceMapGetStruct-2                 75.7          76.0          +0.40%
BenchmarkRWMutexInterfaceMapGetString-2                 64.3          63.3          -1.56%
BenchmarkCacheGetConcurrentExpiring-2                   223           201           -9.87%
BenchmarkCacheGetConcurrentNotExpiring-2                167           183           +9.58%
BenchmarkRWMutexMapGetConcurrent-2                      55.7          59.0          +5.92%
BenchmarkCacheGetManyConcurrentExpiring-2               128           181           +41.41%
BenchmarkCacheGetManyConcurrentNotExpiring-2            161           153           -4.97%
BenchmarkCacheSetExpiring-2                             260           266           +2.31%
BenchmarkCacheSetNotExpiring-2                          160           163           +1.88%
BenchmarkRWMutexMapSet-2                                66.6          66.5          -0.15%
BenchmarkCacheSetDelete-2                               243           247           +1.65%
BenchmarkRWMutexMapSetDelete-2                          146           146           +0.00%
BenchmarkCacheSetDeleteSingleLock-2                     199           203           +2.01%
BenchmarkRWMutexMapSetDeleteSingleLock-2                99.4          99.2          -0.20%
BenchmarkIncrementInt-2                                 139           136           -2.16%
BenchmarkDeleteExpiredLoop-2                            5902806       5908243       +0.09%
BenchmarkShardedCacheGetExpiring-2                      180           188           +4.44%
BenchmarkShardedCacheGetNotExpiring-2                   89.8          84.0          -6.46%
BenchmarkShardedCacheGetManyConcurrentExpiring-2        99.4          100           +0.60%
BenchmarkShardedCacheGetManyConcurrentNotExpiring-2     54.7          51.2          -6.40%

And here's the comparison with master:

benchmark                                               old ns/op     new ns/op     delta
BenchmarkCacheGetExpiring-2                             121           152           +25.62%
BenchmarkCacheGetNotExpiring-2                          35.6          68.1          +91.29%
BenchmarkRWMutexMapGet-2                                32.3          31.9          -1.24%
BenchmarkRWMutexInterfaceMapGetStruct-2                 82.2          76.0          -7.54%
BenchmarkRWMutexInterfaceMapGetString-2                 65.5          63.3          -3.36%
BenchmarkCacheGetConcurrentExpiring-2                   117           201           +71.79%
BenchmarkCacheGetConcurrentNotExpiring-2                85.5          183           +114.04%
BenchmarkRWMutexMapGetConcurrent-2                      71.9          59.0          -17.94%
BenchmarkCacheGetManyConcurrentExpiring-2               117           181           +54.70%
BenchmarkCacheGetManyConcurrentNotExpiring-2            102           153           +50.00%
BenchmarkCacheSetExpiring-2                             177           266           +50.28%
BenchmarkCacheSetNotExpiring-2                          75.3          163           +116.47%
BenchmarkRWMutexMapSet-2                                69.7          66.5          -4.59%
BenchmarkCacheSetDelete-2                               163           247           +51.53%
BenchmarkRWMutexMapSetDelete-2                          149           146           -2.01%
BenchmarkCacheSetDeleteSingleLock-2                     114           203           +78.07%
BenchmarkRWMutexMapSetDeleteSingleLock-2                101           99.2          -1.78%
BenchmarkIncrementInt-2                                 120           136           +13.33%
BenchmarkDeleteExpiredLoop-2                            2880788       5908243       +105.09%
BenchmarkShardedCacheGetExpiring-2                      154           188           +22.08%
BenchmarkShardedCacheGetNotExpiring-2                   57.0          84.0          +47.37%
BenchmarkShardedCacheGetManyConcurrentExpiring-2        88.0          100           +13.64%
BenchmarkShardedCacheGetManyConcurrentNotExpiring-2     36.5          51.2          +40.27%

master vs. just changing map[string]Item to map[string]*Item:

benchmark                                               old ns/op     new ns/op     delta
BenchmarkCacheGetExpiring-2                             121           119           -1.65%
BenchmarkCacheGetNotExpiring-2                          35.6          34.3          -3.65%
BenchmarkRWMutexMapGet-2                                32.3          31.9          -1.24%
BenchmarkRWMutexInterfaceMapGetStruct-2                 82.2          77.9          -5.23%
BenchmarkRWMutexInterfaceMapGetString-2                 65.5          63.3          -3.36%
BenchmarkCacheGetConcurrentExpiring-2                   117           109           -6.84%
BenchmarkCacheGetConcurrentNotExpiring-2                85.5          101           +18.13%
BenchmarkRWMutexMapGetConcurrent-2                      71.9          58.6          -18.50%
BenchmarkCacheGetManyConcurrentExpiring-2               117           110           -5.98%
BenchmarkCacheGetManyConcurrentNotExpiring-2            102           109           +6.86%
BenchmarkCacheSetExpiring-2                             177           253           +42.94%
BenchmarkCacheSetNotExpiring-2                          75.3          150           +99.20%
BenchmarkRWMutexMapSet-2                                69.7          66.5          -4.59%
BenchmarkCacheSetDelete-2                               163           238           +46.01%
BenchmarkRWMutexMapSetDelete-2                          149           146           -2.01%
BenchmarkCacheSetDeleteSingleLock-2                     114           186           +63.16%
BenchmarkRWMutexMapSetDeleteSingleLock-2                101           99.2          -1.78%
BenchmarkIncrementInt-2                                 120           119           -0.83%
BenchmarkDeleteExpiredLoop-2                            2880788       5611693       +94.80%
BenchmarkShardedCacheGetExpiring-2                      154           157           +1.95%
BenchmarkShardedCacheGetNotExpiring-2                   57.0          54.9          -3.68%
BenchmarkShardedCacheGetManyConcurrentExpiring-2        88.0          87.3          -0.80%
BenchmarkShardedCacheGetManyConcurrentNotExpiring-2     36.5          35.9          -1.64%
paddlesteamer commented 4 years ago

Thank you for your long, detailed response. I just had time to run the benchmarks myself, and you're right: it is obvious that using a single mutex is faster. Converting the map value type to a pointer was also a bad idea and caused a regression.

So, I will switch to your version of Touch/GetWithExpirationUpdate in my fork, and I'm closing this PR since your version is better. I know this repository has gone stale, but I encourage you to open a PR for your Touch implementation since it may be useful for people looking for that functionality, even though it will probably never be merged.

Thanks again and have a good day!

arp242 commented 4 years ago

You're welcome @paddlesteamer; I mostly just wanted to know this for my own reasons :-)

I put up the fork over here the other day, by the way: https://github.com/zgoat/zcache – I'm not 100% decided on some of the things I added, so I haven't released a v1 yet, but all the existing stuff should be compatible, FYI :-)