open-lms-open-source / moodle-local_redislock

6 stars 17 forks source link

If there is already a lock but the PID no longer exists, expire it #4

Closed mwithheld closed 5 years ago

mwithheld commented 5 years ago

Untested code, possibly controversial. If the webserver is restarted without clearing up Redis locks, try to clear out locks from non-existent PIDs rather than waiting for the $maxlifetime.

mwithheld commented 5 years ago

The CI failures are due to "Plugin "local_redislock" (2017120800) could not be installed. It requires a newer version of Moodle".

mwithheld commented 5 years ago

Hi @polothy - would you be able/willing to look at the idea suggested here?

polothy commented 5 years ago

I reviewed this a while back, IIRC, the problem was if you had more than 1 web node, it wouldn't work anymore.

mwithheld commented 5 years ago

We have 4 web nodes so that is important. That if clause checks the hostname matches first. I.e. the PID check is hostname-specific so that it won't attempt to delete PIDs from another webserver hostname.

polothy commented 5 years ago

Right, so it only works if the user just happens to hit the same web node and also has a stale lock on a dead process. Unsure if there is a clear win here.

mwithheld commented 5 years ago

Fair point :). The symptom I've seen over our last three updates is that the forum jobs are not running in the day after these OS updates b/c of process locks in the redis store. We have automatic OS updates which restart the webservers, leaving stale PIDs in redis (which is on another server). Unfortunately, restarting the redis instance clears user sessions, the redis servers are patched on different days, and we have no way ATM to automatically clear redis when the webserver OS is patched. The same happens when someone/something kills a PHP process -- the process locks depend on the Moodle code deleting the lock (I can't remember if there's also a long timeout on e.g. the forum processing). If there's no mechanism to clear stale PIDs in the cache store, they stay in there until you either restart/clear redis or manually go in and delete the key-value pairs, which is my current work-around.

There's another reason: The redis_lockfactory get_lock() method fails to make use of the third parameter, "$maxlifetime The number of seconds to wait before reclaiming a stale lock.". This would probably be better TBO ( $redis->setTimeout https://github.com/phpredis/phpredis#expire-settimeout-pexpire), but the PID way would work with the current implementation to reclaim stale locks.

On Tue, Sep 11, 2018 at 8:52 AM Mark Nielsen notifications@github.com wrote:

Right, so it only works if the user just happens to hit the same web node and also has a stale lock on a dead process. Unsure if there is a clear win here.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/blackboard-open-source/moodle-local_redislock/pull/4#issuecomment-420323300, or mute the thread https://github.com/notifications/unsubscribe-auth/AAwpOiZyqsO2RqCFqoMpF9L0vHNqrJUMks5uZ9w_gaJpZM4VrR3F .

polothy commented 5 years ago

Going to close as I don't think we will take this. Thanks for the suggestion though!