matthiasmullie / scrapbook

PHP cache library, with adapters for e.g. Memcached, Redis, Couchbase, APC(u), SQL and additional capabilities (e.g. transactions, stampede protection) built on top.
https://www.scrapbook.cash
MIT License
315 stars 27 forks source link

Stampede not working for phpredis > 4.0.0 #48

Closed ValiNiculae closed 1 year ago

ValiNiculae commented 3 years ago

Hi!

There seems to be an issue with the StampedeProtector since phpredis 4.0.0. The exists method now returns an int instead of TRUE/FALSE: https://github.com/phpredis/phpredis#exists

image

Another issue I found is with the TTL of the stampede protect. We set it in milliseconds, in the constructor, but when we do: $success[$key] = $this->cache->add($this->stampedeKey($key), '', $this->sla); (in StampedeProtector@protect) is set in seconds.

I will open a pull request to fix/discuss these issues.

Thanks!

matthiasmullie commented 1 year ago

Thanks for spotting this, and apologies for not having seen this & responding earlier.

There was, however, a fix submitted long ago WRT the change in exists return value: 31f45780a2f1a28cb3f928c631f311dcb0912c96 (in a way that remains compatible with older Redis versions)

And another nice catch with the StampedeProtector TTL! It's a bit of an annoying one to land, because there's no way to "fix it right".

Changing the constructor from milliseconds to seconds may have a big effect for those who already pass in a milliseconds value. For the most part, this is probably not a huge deal. The locks will be there for a (too) long time, but once another process has completed & stored the value, the long lock becomes irrelevant. I.e., most currently don't ever notice. If we were to change that to seconds, their current milliseconds-input may have a significant effect: they would now find their code sleep for a really long time before polling again (e.g. 100 seconds rather than 0.1), and may end up with too many concurrent processes. Basically: we can't move towards seconds without expecting users to update their calling code, so this would effectively be a breaking change.

OTOH, sticking with milliseconds is simply wrong.

I'll merge the fix now that I'm about to roll out a new major release.

matthiasmullie commented 1 year ago

After thinking things over some more last night, I think we should stick with milliseconds. 2 things are affected by that time:

Seconds have no benefit over milliseconds, but there is one the other way around: it becomes possible to acquire a shorter lock/protective time. The "ideal" lock time is essentially determined by 2 variable factors:

When a stampede happens, it is very likely that there's a high number of incoming requests. And the average response time for a relatively responsive application is usually not over a couple hundred milliseconds. Ergo, being able to set a sub-second TTL may be important to achieve a good balance between both factors in certain applications. Usually, a 1-or-more-seconds TTL will be just fine, but that will remain possible with milliseconds support as well.

But the catch: we can't acquire a sub-second lock. The only thing we can do is basically rounding up to the nearest second. And... that's just fine!

Imagine a sub-second lock TTL (e.g. 500ms).

The only case where "the lock sticking around for longer than it was supposed to" becomes a problem, is when request 1 (the one that created the lock & was supposed to store the new data) didn't complete its job (e.g. crashed). If that happens, all new requests coming in are still assuming that some other process is working on it (because there is a lock file), when it fact that's not the case. It would be better for the lock file to be removed, so that another process can pick up the work. That doesn't change with moving from milliseconds to seconds, though; then, too, those other processes would be stuck waiting out the remainder of the second.

(Of course, it's worse in the current incorrect implementation - the lock is held for much, much longer; request 1 not being able to fulfill its job has longer-lasting impact, and that needs to be fixed)

I'm going to stick with a milliseconds SLA, but will fix the lock time so that it's holding it for the "correct" (rounding up the milleseconds to a second) time.

Does that make sense?