karlseguin / ccache

A golang LRU Cache for high concurrency
MIT License

export cache size field #27

Closed lozaeric closed 5 years ago

lozaeric commented 5 years ago

Hi. I think the cache size should be a public field. This would be useful for troubleshooting and monitoring.

https://github.com/karlseguin/ccache/blob/692cd618b2640c062a8a8ef296d5bcf6dd3d5553/cache.go#L14

karlseguin commented 5 years ago

I think the problem is that size is only ever accessed in the worker thread, so simply reading it from another thread is undefined behaviour.

Could make it thread-safe by making all access go through atomic.XXX. But then the worker pays a performance penalty.
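
For illustration, a rough sketch of what the atomic route could look like (the field and method names here are hypothetical, not ccache's actual code):

```go
// Hypothetical sketch of the atomic approach; names are illustrative,
// not ccache's actual implementation.
package cache

import "sync/atomic"

type Cache struct {
	size int64 // total cache size, touched only through sync/atomic
}

// adjustSize is what the worker would call on every add/remove,
// paying the atomic-write cost on its hot path.
func (c *Cache) adjustSize(delta int64) {
	atomic.AddInt64(&c.size, delta)
}

// GetSize can then be called safely from any goroutine.
func (c *Cache) GetSize() int64 {
	return atomic.LoadInt64(&c.size)
}
```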

Could also achieve this via a channel. Add a GetSize() int function which sends a message via channel to the worker. But that might be relatively slow (relative to what you'd expect getting the size to take) since the worker could be busy.
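
The channel version would look roughly like this (again a hypothetical sketch with illustrative names, not the real code):

```go
// Hypothetical sketch of the channel-based approach; names are illustrative.
package cache

type Cache struct {
	size    int64
	sizeReq chan chan int64 // requests the worker answers with the current size
}

// GetSize asks the worker for the size and blocks until it replies,
// so the answer is exact but the call stalls while the worker is busy.
func (c *Cache) GetSize() int64 {
	reply := make(chan int64)
	c.sizeReq <- reply
	return <-reply
}

// and in the worker's select loop:
//
//	case reply := <-c.sizeReq:
//		reply <- c.size
```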

What do you think?

lozaeric commented 5 years ago

Yes, of course. I guess a "dirty read" is not that bad. In my use case, an approximate value would be useful.

That said, using the atomic package could be the best way. I think the performance penalty would be minimal.

I'm working on improving my English. Please let me know if you don't understand.

karlseguin commented 5 years ago

I understood everything perfectly :)

Without proper locking, the read wouldn't just be "dirty"; the behavior would be undefined. Googling this, the general answer is: on most architectures, a read of a word-sized integer is atomic, but you shouldn't rely on it. The worst case, though probably only theoretical at this point, is that you'd crash the entire operating system.

But even if we ignore this, there are no guarantees about how "dirty" it can be. It's not crazy to think that the worker always runs on the same core/socket, and since we can't flag the field as "volatile", other cores/sockets could read extremely stale data from their caches.
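
To make that concrete, here is a minimal, hypothetical example of that kind of unsynchronized read; running it with `go run -race` flags it as a data race, which is why it's more than just a "dirty read":

```go
// Minimal, hypothetical demonstration of an unsynchronized cross-goroutine
// read; `go run -race` reports the read below as a data race.
package main

import (
	"fmt"
	"time"
)

type cache struct {
	size int64 // written only by the worker goroutine, with no synchronization
}

func main() {
	c := &cache{}

	// "worker" goroutine mutating size on its own
	go func() {
		for {
			c.size++
		}
	}()

	time.Sleep(10 * time.Millisecond)

	// unsynchronized read from another goroutine: per the Go memory
	// model this is a data race, not merely a stale ("dirty") read
	fmt.Println("observed size:", c.size)
}
```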

What do you think of: https://github.com/karlseguin/ccache/commit/3385784411ac24a8be403f7938890ec67ef6e0d6 ??

Instead of exposing size, it exposes an ItemCount. Of course, ItemCount will only be equal to size when all the items in the cache are given a size of 1. But even when that isn't the case, I think knowing the number of items in the cache is still meaningful for debugging / monitoring.

The benefit of this approach is that there's zero performance hit for anyone who doesn't care about ItemCount. And ItemCount is still pretty fast to calculate.
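
From the caller's side, usage would look roughly like this (import path and configuration calls as in the README; the keys, values, and limits are just placeholders):

```go
// Rough usage sketch of ItemCount; values here are placeholders.
package main

import (
	"fmt"
	"time"

	"github.com/karlseguin/ccache"
)

func main() {
	// a cache capped at 1000 "size" units, pruning 100 items when full
	cache := ccache.New(ccache.Configure().MaxSize(1000).ItemsToPrune(100))

	cache.Set("user:1", "alice", time.Minute)
	cache.Set("user:2", "bob", time.Minute)

	// ItemCount reports the number of items, not their combined size,
	// so it equals the configured size only when every item has size 1.
	fmt.Println("items in cache:", cache.ItemCount())
}
```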

lozaeric commented 5 years ago

You're right. I thought a stale read was the only issue, but it's more complicated. Thank you for explaining.

Yes, of course, ItemCount is useful for monitoring / troubleshooting.

karlseguin commented 5 years ago

merged 3385784411ac24a8be403f7938890ec67ef6e0d6