hapijs / catbox

Multi-strategy object caching service

Chain concurrent gets #73

Closed kpdecker closed 10 years ago

kpdecker commented 10 years ago

Ideally the framework should monitor which get requests are still in flight and chain any concurrent calls for the same key to reduce upstream load. I.e.

get('foo', callback1)
get('foo', callback2)
(callback1 called)
(callback2 called)

Should only call the relevant engine method once.

hueniverse commented 10 years ago

I assume you mean for only getOrGenerate?

kpdecker commented 10 years ago

That is our primary use case but the get primitive seems like it could benefit from this behavior as well.

chapel commented 10 years ago

We have thought about this as well, but it would require having an internal memory cache in catbox itself, and it would break down across modules using separate catbox instances.

As an aside, I could see something like CLS being used for this, where after the first call, any subsequent calls in the same call stack/chain could use the in-memory cached version. With that said, I actually think this would be better served as a module that wraps catbox to provide this functionality, and to let catbox itself focus on what it needs to do.

kpdecker commented 10 years ago

How does this require an in-memory cache? You just need a list of pending requests, basically a hash of cache keys and an array of callbacks to call once the data is available. This data need not be kept around any longer than the duration of the request. If another get comes in while there are no pending operations then the cycle is restarted.
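The pending-request list described above can be sketched as follows. This is a hypothetical illustration, not catbox code; `engineGet` and `coalescedGet` are made-up names standing in for the engine's get method and a coalescing wrapper around it.

```javascript
// Hypothetical sketch: concurrent get() calls for the same key share
// one in-flight engine call instead of each hitting the engine.

const pending = new Map();        // cache key -> array of waiting callbacks

// engineGet is a stand-in for the actual cache engine method.
function coalescedGet(engineGet, key, callback) {
    const waiting = pending.get(key);
    if (waiting) {
        waiting.push(callback);   // an engine call is already in flight; chain
        return;
    }

    pending.set(key, [callback]);
    engineGet(key, (err, value) => {
        const callbacks = pending.get(key);
        pending.delete(key);      // nothing pending now; next get() restarts the cycle
        for (const cb of callbacks) {
            cb(err, value);
        }
    });
}
```

Note that nothing is retained after the engine responds, so this is bookkeeping for the duration of the request rather than an in-memory cache.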

My end goal is that this behavior occurs in hapi. I don't quite care where the logic is implemented, although catbox would be easier for the cases where I use catbox directly.

chapel commented 10 years ago

So even in that case you have the problem of different instances of catbox. They wouldn't be able to share the result since they wouldn't know about each other.

kpdecker commented 10 years ago

I don't think that different instances of catbox can assume that different cache values are the same for a given key. Even when they are, this is a performance optimization whose goal is to reduce overhead where possible, not to reduce all unnecessary calls to whatever the engine speaks to. 1 cache request per catbox instance at any given time is much nicer on the upstream system than 1 cache request per server request.

hueniverse commented 10 years ago

I don't want to touch get(). The whole point of the cache engine is to do this really fast. This only makes sense for getOrGenerate() since that's where you have a costly generate method that you want to avoid calling twice.
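Restricting the chaining to the generate step might look like the sketch below. This is an assumption about the shape of the fix, not catbox's actual implementation; `get`, `set`, and `generate` are hypothetical stand-ins for the engine methods and the user-supplied generate function.

```javascript
// Sketch: get() stays untouched (it must remain fast), but concurrent
// cache misses for the same key trigger a single generate() call.

const generating = new Map();     // cache key -> callbacks waiting on generate

function getOrGenerate(get, set, generate, key, callback) {
    get(key, (err, cached) => {
        if (!err && cached !== undefined) {
            return callback(null, cached);     // fast path, never chained
        }

        const waiting = generating.get(key);
        if (waiting) {
            waiting.push(callback);            // chain onto in-flight generate
            return;
        }

        generating.set(key, [callback]);
        generate(key, (genErr, value) => {
            const callbacks = generating.get(key);
            generating.delete(key);
            if (!genErr) {
                set(key, value);               // write once, not once per caller
            }
            for (const cb of callbacks) {
                cb(genErr, value);
            }
        });
    });
}
```

Writing the value once per generate also removes the conflicting-writes race described in the next comment.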

hueniverse commented 10 years ago

I've decided this is actually a bug. Best case it is just causing duplicated effort. Worst case it is generating conflicting values at the same time with a race condition.

lock[bot] commented 4 years ago

This thread has been automatically locked due to inactivity. Please open a new issue for related bugs or questions following the new issue template instructions.