sasa1977 / con_cache

ets based key/value cache with row level isolated writes and ttl support
MIT License
910 stars 71 forks source link

Add size operation #12

Closed vorce closed 9 years ago

vorce commented 9 years ago

Not sure how needed this really is (since you could get it from the ets), but can be nice for convenience.

sasa1977 commented 9 years ago

Thanks for this. I actually removed size in cee7642a5ec7062dc08ca27c4fb2a88ad3260cd2, together with some other operations that can be done directly through :ets. In hindsight, it probably makes sense to have dedicated support for size, so I merged your pull.

I'm unsure what to do about the hex package. On one hand, this seems like a small change to publish a new version. On the other hand, I don't have any plans for con_cache atm, so it might end up being non-published for a long time. Do you have any preference here?

vorce commented 9 years ago

Thank you.

Nope, no preference for the hex package.

However I'd like to ask you what you think about adding size limitation support for con_cache? Ex: you initialize your cache with a max size option and if that limit was to be exceeded when puting the cache would throw away the oldest item. I might have a go at it if it's something you think makes sense to have (I think I'd like to have it at least).

sasa1977 commented 9 years ago

It's a very interesting feature, though the implementation might become very complicated, because we want to allow concurrent writes (otherwise the whole purpose of the cache is defeated), and limiting the size with concurrent writes seems tricky.

I need to think about how to do this.

One way is to use the same pattern as with TTLs, which means items are always stored, but the cache is then cleaned up in some regular intervals by a dedicated process. This means weak consistency (the limit might be exceeded), but still concurrent.

Another option is to try to do something with ets counters, which might bring strong consistency, but the implementation could become way more tricky.

If you have any suggestions yourself, let me know.

vorce commented 9 years ago

My initial thought on how to implement it was like the TTL pattern with weak consistency indeed. I'll let you know if I manage to get something working. Thanks again!

sasa1977 commented 9 years ago

Keep in mind there might be some subtleties.

One thing that immediately pops to mind is the TTL purging and it’s relationship to size-based purging.

I see two options here:

  1. Keep two strategies mutually exclusive.
  2. Do all the purging in the TTL process, so we can combine both strategies.

Option 1 is definitely simpler to implement, and probably to reason about. I’m also unsure whether auto-purging based on both time and size is useful. So I’m inclined to suggest just keeping them exclusive, at least in the initial implementation.

Another thing I wonder whether the size-based purger needs to work in ticks, or could it in fact delete a row immediately in the message handler. The more I think about it the more I lean towards immediate deletion in this case.

Thoughts?

On 20 Sep 2015, at 13:05, Joel C notifications@github.com wrote:

My initial thought on how to implement it was like the TTL pattern with weak consistency indeed. I'll let you know if I manage to get something working. Thanks again!

— Reply to this email directly or view it on GitHub https://github.com/sasa1977/con_cache/pull/12#issuecomment-141775527.

vorce commented 9 years ago

Hmm good points. I really don't know enough about the internals of con_cache yet so I'm very grateful for your insights. Initially I imagined both ttl and size based auto purging, but yeah as you mentioned it might not actually be that useful. If I were to take a stab at this it would probably be better to go the simplest route to start with anyway. :)

sasa1977 commented 9 years ago

My suggestion would be to start simple and create a plain module which maintains an ordered list of keys. I suspect it would export something like new/1, add/2, and delete/2.

Such module would be used just for keeping the state of the purger, without actually deleting anything. So for example, we can write:

purge_state = new ConCache.SizePurger(max_size: 20)

{new_purge_state, keys_to_delete} = SizePurger.add(purge_state, new_key)

Names could probably be better, I'm terrible at naming things :-)

Anyway, such structure could then be maintained in a GenServer, where you simply initialize it based on cache options. The server would handle two operations: add and delete. The add handler would update the structure, and delete purged keys from the cache table.

Once you have a GenServer in place, you would need to create it in a ConCache process structure (much like the ttl process), and send messages to it on each insert/update/delete.

Internally, I'd expect the purger structure to rely on :gb_trees, or maybe even :queue (because I expect these structures to work reasonably fast for ordered insertions), but if you don't feel comfortable, you could try with simple list until you have the working functionality, and then later optimize the implementation of the SizePurger (or however it's called).

Does that make any sense, and what do you think about this approach?

vorce commented 9 years ago

Cheers - super helpful stuff! Looks like a viable way to go about it at first glance. I'll get back to you with questions (as I'm sure I will have tons) when/if I dig into this more.

Appreciate all the help and your work!

sasa1977 commented 9 years ago

Sure, let me know if you get stuck somewhere.

IMO, the most of the work will be in making the structure, especially if you want to deal with updates, which should most probably move an item to the top of the queue.

The integration into ConCache might require some refactoring. Currently, the cache owner process does ttl based expiry. We probably want to make this polymorphic. There is a set_ttl call done from ConCache.Operations which basically ends up being handled in the owner process. This should probably be renamed, and then handled in the server by delegating to one module or another.

I suggest you develop the structure with interface similar to what I proposed. Then, we’ll refactor the Owner process, extract ttl handling behaviour into a separate module with the same interface, and make the handling polymorphic, most probably via protocols. I can take that later part myself if it looks like too much work to you, or you can do it under my guidance. Whatever you prefer :-)

On 20 Sep 2015, at 22:16, Joel C notifications@github.com wrote:

Cheers - super helpful stuff! Looks like a viable way to go about it at first glance. I'll get back to you with questions (as I'm sure I will have tons) when/if I dig into this more.

Appreciate all the help and your work!

— Reply to this email directly or view it on GitHub https://github.com/sasa1977/con_cache/pull/12#issuecomment-141830755.