karlseguin / ccache

A golang LRU Cache for high concurrency
MIT License
1.27k stars 118 forks source link

would like a version of Fetch that doesn't require a closure #58

Closed eli-darkly closed 3 years ago

eli-darkly commented 3 years ago

This is sort of nitpicky performance-tuning stuff and I could be wrong on some of the details, but here's what I'm talking about:

Fetch currently takes a func() (interface{}, error). In other words, it doesn't pass the key (or any other parameters) to the function that computes new values. That means that the function can't be defined globally or at any other scope other than the scope that knows what the key is. It has to be a closure, or a method of some object that knows what the key is.

Due to how the Go runtime works, that has some undesirable implications if you're trying to optimize for minimal heap allocations:

This is a kind of thing that wouldn't come up in a language like Java where you don't really have any choice because almost everything is an object on the heap. But in high-throughput services implemented in Go, since you can control that stuff to some degree as long as you're careful, it can be a bit annoying to have one's control limited by API choices like this... when as far as I can tell there's no great reason for the API to work that way. (That is, everywhere that you are calling the fetch function, you already know what the key is, so it would be perfectly easy to pass the key to the function. And that's how cache-loading functions work in pretty much every other caching API I've seen that has such a concept.)

eli-darkly commented 3 years ago

Here's an example of what I mean. Let's say I have a cache and some kind of database, both stored in global variables. I want to load and cache the result of a database query whenever there's a cache miss. Using Fetch as it is now, I would do something like this:

func getCachedValueOrLoadValue(key string) (interface{}, error) {
   return myCache.Fetch(key, cacheTTL, func() (interface{}, error) {
      return myDatabase.Query(buildQueryForKey(key))
   }
}

That looks straightforward, but what happens internally is more like this (not exactly, but close enough):

type queryClosureParams struct {
    key string
}

func newQueryClosureParams(key string) *queryClosureParams {
   return &queryClosureParams{key}
}

func queryClosure(params *queryClosureParams) {
   return myDatabase.Query(buildQueryForKey(params.key))
}

func getCachedValueOrLoadValue(key string) (interface{}, error) {
   params := newQueryClosureParams(key)
   return myCache.Fetch(key, cacheTTL, func() (interface{}, error) {
      return queryClosure(params)
   }
}

What would avoid this is if there were an alternate Fetch (better name TBD, let's call it Fetch2 for now) that worked like so:

func loadValue(key string) (interface{}, error) {
   return myDatabase.Query(buildQueryForKey(key))
}

func getCachedValueOrLoadValue(key string) (interface{}, error) {
   return myCache.Fetch2(key, cacheTTL, loadValue)
}

There it doesn't have to create a closure— the function reference is just a simple function value, with no other state.

(This can be a bit tricky to get right because, if loadValue were a method of some component (like, func (c *MyComponent) loadValue(key string) (interface{}, error), it would still behave like a closure because the value of c is an implicit parameter. The workaround for that is that the MyComponent type would declare a field like loadValueFn func(string) (interface{}, error) and then its constructor would do c.loadValueFn := c.loadValue, which would create a closure that represents the instance method as a simple function— but it would only create it once, not once per query. Again, this is the kind of stuff that won't matter to many people but, since Go is a popular solution for high-traffic services, it makes sense to allow people to leverage Go's flexibility for best performance if they want to.)

karlseguin commented 3 years ago

Sorry to repeat my last answer, but I think having your own app-specific fetch makes sense. I think they're both good features, but the permutations of features people might want with fetch makes it hard to manage.

I think adding a section in the readme about 'custom fetches' might be useful.

Thoughts?

eli-darkly commented 3 years ago

Well, I wasn't thinking about this just as an extra feature specific to Fetch, but also a basic design principle that I'd prefer for things to follow in the future even if the API of the existing things can't be changed now. In general, using a stateless function is preferable over using a stateful function (in regard to the state of an individual operation, that is; there might of course be some implicit state in the form of a database connection or whatever)— the latter will usually have some kind of disadvantage on any platform in either efficiency or other areas, I just mentioned the ones it happens to have in Go.

If you want to add some further docs about alternatives to Fetch, that's fine, although I would suggest putting that content in the doc comment for Fetch (at least, to clarify that it's not doing anything you can't already do via other public APIs) rather than in the readme.

eli-darkly commented 3 years ago

Heh— I just realized that in my particular use case, where I'm also using singleflight, there's no way to completely get rid of this issue because singleflight.Group.Do also takes a nullary function. Ah well.