karlseguin / ccache

A golang LRU Cache for high concurrency
MIT License

Does ccache come with a write lock? #25

Closed: unisqu closed this issue 5 years ago

unisqu commented 5 years ago

I've had some strange experiences using the layered cache. Maybe it's my code, maybe it's fixed here...?

https://github.com/karlseguin/ccache/issues/2

Does the layered cache come with a write lock?

karlseguin commented 5 years ago

The issue you referenced had two fixes. One had already been applied to the LayeredCache, but the other hadn't. item.promotion is now protected in the LayeredCache, as it is in the main cache: 692cd618b2640c062a8a8ef296d5bcf6dd3d5553
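The thread doesn't show what that commit changed, so the following is only a generic sketch of why a per-item promotion counter needs protection: every concurrent Get touches it, and a plain `i.promotions++` from several goroutines is a data race. It assumes a hypothetical `item` type with an `int32` counter and uses `sync/atomic`; it is not ccache's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// item is a hypothetical stand-in for a cache entry, not ccache's type.
type item struct {
	promotions int32 // number of Gets seen so far
}

// shouldPromote atomically increments the counter and reports whether this
// call is the one that reached getsPerPromote. Because AddInt32 returns the
// new value, exactly one caller observes that value (this sketch never
// resets the counter).
func (i *item) shouldPromote(getsPerPromote int32) bool {
	return atomic.AddInt32(&i.promotions, 1) == getsPerPromote
}

func main() {
	it := &item{}
	var promoted int32
	var wg sync.WaitGroup
	for g := 0; g < 100; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if it.shouldPromote(3) {
				atomic.AddInt32(&promoted, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt32(&it.promotions), atomic.LoadInt32(&promoted)) // 100 1
}
```

Replacing the atomic call with a plain `i.promotions++` in the same loop is exactly the kind of bug `go run -race` reports.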

I'm not sure what your issue was, so I can't say whether this fixes your actual problem.

If you're still having issues, could you describe them or provide code to reproduce them?

unisqu commented 5 years ago

What does "item falling out of scope" mean? Let's say the next nanosecond it's garbage collected...

Anyhow, can you provide code for testing read/write lock contention? For example, saving a large file while reading it.

karlseguin commented 5 years ago

"item falling out of scope" was with respect to #23. When the cache's gc logic runs, it doesn't actually free the memory; it merely removes the cache's reference to it (as far as I know, there's no way to force garbage collection of a specific piece of memory in Go). Removing the cache's reference allows Go's real GC to reclaim the memory. But Go's GC won't release the memory while something else still references it, and in your example, that something else is the item variable.

Imagine data held by the cache (with no other code running):

 ----------      ----------      --------      --------
|  ccache  | -> |  bucket  | -> |  item  | -> |  DATA  |
 ----------      ----------      --------      --------

When ccache's "gc" runs, item becomes abandoned. In this specific case, because nothing else references item, Go's GC can free the memory.

 ----------      ----------      --------      --------
|  ccache  | -> |  bucket  |    |  item  | -> |  DATA  |
 ----------      ----------      --------      -------- 
                                ===== can be freed ====

In YOUR code, it looks more like:

 ----------      ----------      --------      --------
|  ccache  | -> |  bucket  | -> |  item  | -> |  DATA  |
 ----------      ----------      --------      --------
                                     ^
                                     |
                                  ------- 
                                 |  var  |
                                  -------

So it doesn't matter that ccache's GC removes its reference to item, because var holds a reference to it. Go's garbage collector won't free your data until var goes out of scope.

                                 === cannot be freed ===
 ----------      ----------      --------      --------
|  ccache  | -> |  bucket  |    |  item  | -> |  DATA  |
 ----------      ----------      --------      --------
                                     ^   
                                     |  
                                  ------- 
                                 |  var  |
                                  -------

Consider this pseudocode:

if cache.Get("somekey") != nil {
  value = cache.Get("somekey")
}

THIS code CAN cause issues since, as you say, the cache's GC could remove the item between the two calls to Get, so the second call may return nil.

However, this code DOES NOT have the same problem:

v := cache.Get("somekey")
if v != nil {
  ...
}

Because once v references the data, the GC won't free it. The cache might evict it, which means a subsequent call to Get would return nil, but that won't impact v.
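Both behaviours can be seen with a hypothetical `toyCache`: a mutex-guarded map of `*Item` whose `Get` returns nil on a miss, like the pseudocode above. `Delete` plays the role of ccache's gc dropping its reference. This is a sketch for illustration, not ccache's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Item is a hypothetical cache entry wrapping a value.
type Item struct{ value interface{} }

func (i *Item) Value() interface{} { return i.value }

// toyCache is a hypothetical, minimal stand-in for ccache: a mutex-guarded
// map of *Item. It illustrates reference semantics only, not LRU eviction.
type toyCache struct {
	mu    sync.RWMutex
	items map[string]*Item
}

func newToyCache() *toyCache { return &toyCache{items: make(map[string]*Item)} }

func (c *toyCache) Set(key string, v interface{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = &Item{value: v}
}

// Get returns nil when the key is absent.
func (c *toyCache) Get(key string) *Item {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.items[key]
}

// Delete drops only the cache's reference, like ccache's gc does.
func (c *toyCache) Delete(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.items, key)
}

func main() {
	c := newToyCache()
	c.Set("somekey", "DATA")

	v := c.Get("somekey") // fetch once and keep the reference
	c.Delete("somekey")   // simulate eviction right after the Get

	if v != nil {
		fmt.Println(v.Value()) // DATA: v still references the item
	}
	fmt.Println(c.Get("somekey") == nil) // true: a later Get misses
}
```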

karlseguin commented 5 years ago

As for concurrency with large data: it doesn't matter whether the data is small or large. Either way, the cache holds just a reference (it's unlikely that you're storing large stack-allocated values in the cache).
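A quick way to see the "just a reference" point: storing a slice in an `interface{}` copies only the slice header (pointer, length, capacity), not the backing array. The sketch uses a plain map named `cacheStandIn` as a hypothetical stand-in for the cache's storage.

```go
package main

import "fmt"

// cacheStandIn is a hypothetical stand-in for the cache's storage.
var cacheStandIn = map[string]interface{}{}

func main() {
	big := make([]byte, 1<<20) // 1 MiB backing array
	cacheStandIn["file"] = big // copies the slice header only

	big[0] = 'X' // write through the original variable...
	got := cacheStandIn["file"].([]byte)
	fmt.Println(string(got[0]), len(got)) // ...and the cached slice sees it: X 1048576
}
```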

Go's race detector is probably the best tool for making sure there's no concurrency issue. Random testing isn't likely to catch problems that arise with the very short-lived locks that are used.
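As a starting point for the read/write contention test asked about earlier, here is a sketch meant to be run under the race detector (`go run -race` or `go test -race`). It uses a hypothetical `store` type, an RWMutex-guarded map rather than ccache itself, with writers replacing a "file" while readers read it. Note the writer's design choice: each write stores a fresh slice, because mutating a slice that readers may already hold would be a race that no cache lock prevents.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical RWMutex-guarded map standing in for a cache.
type store struct {
	mu sync.RWMutex
	m  map[string][]byte
}

func (s *store) set(k string, v []byte) {
	s.mu.Lock()
	s.m[k] = v
	s.mu.Unlock()
}

func (s *store) get(k string) []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.m[k]
}

// hammer runs writers and readers on one key concurrently and returns how
// many reads saw a value.
func hammer(s *store, writers, readers, iters int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	hits := 0
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				// Store a fresh slice; never mutate one already stored.
				s.set("file", make([]byte, 1024+id))
			}
		}(w)
	}
	for r := 0; r < readers; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := 0
			for i := 0; i < iters; i++ {
				if v := s.get("file"); v != nil {
					_ = v[0] // touch the contents, not just the header
					local++
				}
			}
			mu.Lock()
			hits += local
			mu.Unlock()
		}()
	}
	wg.Wait()
	return hits
}

func main() {
	s := &store{m: make(map[string][]byte)}
	s.set("file", []byte("initial"))
	fmt.Println(hammer(s, 4, 8, 1000)) // 8000: the key is always present
}
```

A clean run only shows that no race occurred on the interleavings that actually executed; the detector reports races it observes, it cannot prove their absence.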

The one area that might be problematic is the Fetch function (or your own equivalent get-or-load logic). There's no built-in protection against the "thundering herd" problem. So if you have:

cache.Fetch(someKey, time.Minute * 10, func() (interface{}, error) {
  // THIS CODE IS REALLY SLOW
})

and you call the above concurrently with the same someKey, each goroutine will execute your callback function. This is up to the application to handle, but using something like singleflight would be reasonable: https://godoc.org/golang.org/x/sync/singleflight

unisqu commented 5 years ago

This is a fantastic explanation. Please put it in your main README. Thanks!