wpsharks / comet-cache

An advanced WordPress® caching plugin inspired by simplicity.
https://cometcache.com
GNU General Public License v3.0
75 stars 17 forks source link

Fatal Error: Unable to obtain an exclusive lock #373

Closed Kalfer closed 9 years ago

Kalfer commented 10 years ago

The screen goes blank, both WP panel and on the site. I had to delete the plugin from ftp.

raamdev commented 10 years ago

@Kalfer Thanks for the report! Have you been able to test it on a clean WordPress installation, without any other plugins active? Or if you have a separate clean WordPress installation, you could try installing it there. I haven't been able to reproduce the problem you're describing, so I'm wondering if you have a plugin installed that's somehow conflicting with Quick Cache.

Kalfer commented 10 years ago

@Raam Dev , I have done a test on a clean installation of Wordpress without plugin activated, and the theme "Twenty Fourteen".

El 28/11/2014, a las 20:26, Raam Dev notifications@github.com escribió:

@Kalfer https://github.com/Kalfer Thanks for the report! Have you been able to test it on a clean WordPress installation, without any other plugins active? Or if you have a separate clean WordPress installation, you could try installing it there. I haven't been able to reproduce the problem you're describing, so I'm wondering if you have a plugin installed that's somehow conflicting with Quick Cache.

— Reply to this email directly or view it on GitHub https://github.com/websharks/quick-cache/issues/373#issuecomment-64923116.

raamdev commented 10 years ago

@Kalfer That's strange. I still haven't been able to reproduce this issue. Any chance I could get a temporary admin account on your test installation? You can use this link to submit the details privately: http://wsharks.com/1zG2BiF If it would be possible to get FTP access as well, that would definitely help troubleshooting this.

Kalfer commented 10 years ago

@raamdev I open support ticket (on WebSharks-inc) with the same subject: "Pro-V141127.rc”

El 30/11/2014, a las 4:15, Raam Dev notifications@github.com escribió:

@Kalfer https://github.com/Kalfer That's strange. I still haven't been able to reproduce this issue. Any chance I could get a temporary admin account on your test installation? You can use this link to submit the details privately: http://wsharks.com/1zG2BiF http://wsharks.com/1zG2BiF If it would be possible to get FTP access as well, that would definitely help troubleshooting this.

— Reply to this email directly or view it on GitHub https://github.com/websharks/quick-cache/issues/373#issuecomment-64974041.

raamdev commented 9 years ago

@Kalfer I've identified the issue. When I attempt to activate Quick Cache Pro v141127 (Release Candidate) on your site with WP_DEBUG set to TRUE, I get the following fatal exception:

[02-Dec-2014 00:26:54 UTC] PHP Warning:  fopen(/tmp/quick-cache.lock): failed to open stream: Permission denied in /home/cubierta/public_html/wp-content/plugins/quick-cache-pro/includes/share.php on line 1787
[02-Dec-2014 00:26:54 UTC] PHP Fatal error:  Uncaught exception 'Exception' with message 'Unable to obtain an exclusive lock.' in /home/cubierta/public_html/wp-content/plugins/quick-cache-pro/includes/share.php:1788
Stack trace:
#0 /home/cubierta/public_html/wp-content/plugins/quick-cache-pro/quick-cache-pro.inc.php(2853): quick_cache\share->cache_lock()
#1 /home/cubierta/public_html/wp-content/plugins/quick-cache-pro/quick-cache-pro.inc.php(471): quick_cache\plugin->add_advanced_cache()
#2 [internal function]: quick_cache\plugin->activate(false)
#3 /home/cubierta/public_html/wp-includes/plugin.php(505): call_user_func_array(Array, Array)
#4 /home/cubierta/public_html/wp-admin/includes/plugin.php(575): do_action('activate_quick-...', false)
#5 /home/cubierta/public_html/wp-admin/plugins.php(40): activate_plugin('quick-cache-pro...', 'http://cubierta...', false)
#6 {main}
  thrown in /home/cubierta/public_html/wp-content/plugins/quick-cache-pro/includes/share.php on line 1788

@jaswsinc This looks to be related to your work refactoring for cache locking (see https://github.com/websharks/quick-cache/commit/eafa08a4d90bfc65af785f596bdd2ed10a897e05). Is there something we can do to create a better fallback mechanism? I wouldn't want to push out something that might break on a large number of sites, resulting in "white screens of death" that are difficult for novice site owners to recover from.

@jaswsinc See https://websharks.zendesk.com/agent/tickets/4449 for test site details for testing this bug.

jaswrks commented 9 years ago

Quick Cache will first attempt to use a System V Semaphore to obtain an exclusive lock (compile PHP with --enable-sysvsem). Not all servers will have PHP compiled with support for this, but there is a large percentage that will, and it's generally more reliable in my experience. The functions that QC looks for are sem_get() and sem_acquire().

If a Semaphore is not possible, we use flock(), which is also a very good solution and works well. This maintains a mutex in the filesystem that is used to lock the cache directory. It requires a writable tmp directory though. That's not unlike WordPress itself; where WP needs a writable tmp directory also.

Finding a writable temporary directory on the server, follows this order of precedence, which can be found in the QC codebase — in the quick_cache\share class.

/**
 * Acquires system tmp directory path.
 *
 * @since 14xxxx Refactoring cache clear/purge routines.
 *
 * @return string System tmp directory path; else an empty string.
 */
public function get_tmp_dir()
{
    if(isset(static::$static[__FUNCTION__]))
        return static::$static[__FUNCTION__];

    static::$static[__FUNCTION__] = ''; // Initialize.
    $tmp_dir                      = &static::$static[__FUNCTION__];

    if(defined('WP_TEMP_DIR'))
        $possible_tmp_dirs[] = WP_TEMP_DIR;

    if($this->function_is_possible('sys_get_temp_dir'))
        $possible_tmp_dirs[] = sys_get_temp_dir();

    if($this->function_is_possible('ini_get'))
        $possible_tmp_dirs[] = ini_get('upload_tmp_dir');

    if(!empty($_SERVER['TEMP']))
        $possible_tmp_dirs[] = $_SERVER['TEMP'];

    if(!empty($_SERVER['TMPDIR']))
        $possible_tmp_dirs[] = $_SERVER['TMPDIR'];

    if(!empty($_SERVER['TMP']))
        $possible_tmp_dirs[] = $_SERVER['TMP'];

    if(stripos(PHP_OS, 'win') === 0)
        $possible_tmp_dirs[] = 'C:/Temp';

    if(stripos(PHP_OS, 'win') !== 0)
        $possible_tmp_dirs[] = '/tmp';

    if(defined('WP_CONTENT_DIR'))
        $possible_tmp_dirs[] = WP_CONTENT_DIR;

    if(!empty($possible_tmp_dirs)) foreach($possible_tmp_dirs as $_tmp_dir)
        if(($_tmp_dir = trim((string)$_tmp_dir)) && is_dir($_tmp_dir) && is_writable($_tmp_dir))
            return ($tmp_dir = $this->n_dir_seps($_tmp_dir));
    unset($_tmp_dir); // Housekeeping.

    return ($tmp_dir = ''); // Failed to locate.
}

In this report, I find that sem_get() is missing, so flock() is being used. The writable temporary directory defaulted to /tmp on this server, for lack of anything else configured for PHP or for WordPress itself.

Not a problem up until this point. I found that /tmp/quick-cache.lock exists, and that it's writable. However, when I ran the following test to determine more about the ownership of this file...

<?php var_dump(posix_getpwuid(fileowner(get_temp_dir().'/quick-cache.lock'))); ?>

I receive details that are NOT a match to the current user running PHP from this web space. That makes me think that this site owner has had more than one Quick Cache installation running on this server, and under more than one user account.

Since we have had to resort to using the system's /tmp directory on this installation, there is the potential for a conflict between multiple system users running multiple instances of Quick Cache; i.e. the /tmp/quick-cache.lock file is the same for all, but perhaps created by different users.

The solution I recommend is to add a unique hash on the end of the lock file (one that is WordPress and/or domain-specific); and the same for any Semaphores that we create too. This way multiple instances of QC (perhaps across multiple system users), will work without issue.


For reference. Taken from the quick_cache\share class.

/**
             * Get an exclusive lock on the cache directory.
             *
             * @since 140422 First documented version.
             *
             * @return array Lock type & resource handle needed to unlock later.
             *
             * @throws \exception If {@link \sem_get()} not available and there's
             *    no writable tmp directory for {@link \flock()} either.
             *
             * @throws \exception If unable to obtain an exclusive lock by any available means.
             *
             * @note This call is blocking; i.e. it will not return a lock until a lock becomes possible.
             *    In short, this will block the caller until such time as write access becomes possible.
             */
            public function cache_lock()
            {
                if($this->function_is_possible('sem_get'))
                    if(($resource = sem_get(1976, 1)) && sem_acquire($resource))
                        return array('type' => 'sem', 'resource' => $resource);

                // Use `flock()` as a decent fallback when `sem_get()` is not possible.

                if(!($tmp_dir = $this->get_tmp_dir()))
                    throw new \exception(__('No writable tmp directory.', $this->text_domain));

                $mutex = $tmp_dir.'/'.$this->slug.'.lock';
                if(!($resource = fopen($mutex, 'w')) || !flock($resource, LOCK_EX))
                    throw new \exception(__('Unable to obtain an exclusive lock.', $this->text_domain));

                return array('type' => 'flock', 'resource' => $resource);
            }
raamdev commented 9 years ago

@jaswsinc Thanks so much for looking into this and for your detailed reporting. :)

Using $_SERVER['HTTP_HOST'] should be fine here, correct?

$mutex = $tmp_dir.'/'.$this->slug.'-'.$_SERVER['HTTP_HOST'].'.lock';

and the same for any Semaphores that we create too.

Can you give me an example of how that would be done? Semaphores are new to me. I did some reading up about them, but I'm still not entirely clear how they differ from flock() or, for that matter, why they would be preferred over flock().

jaswrks commented 9 years ago

There's a bit of history behind this in QC. The original release of QC a few years ago, used exclusive cache locking like we just implemented again recently. Back then, we even gave site owners a choice of choosing between sem and flock from a UI option panel.

In the rewrite that we did last year, I tried to avoid the use of either of these, in favor of atomic file operations, because it was a confusing issue for many site owners to deal with, and of course it made the codebase more complex too.

Up until now, this worked well, but now we are finding that there are still a few cases where an exclusive lock is still a necessary thing to do, even when we use temp files and rename. So now, file locking has found its way back into Quick Cache once again.

sem vs. flock. There could be a debate about which one of these to use, and/or which one is better. It's not all black and white unfortunately. The differences are minor. Here's a quick summary.

A semaphore (i.e. a mutex) is more to-the-point in our use case here, and slightly faster than flock() is. The flock() alternative is not actually designed for the sort of use case that we have, but using a separate mutex file together with flock() we can still achieve the desired result in a reliable way.

Let me back up for just a moment though.

Why do we need an exclusive lock?

Multiple PHP processes operating on the same set of files, we need to be sure that writes in one process do not conflict with writes in another process. It's a common issue actually, and the concept of an exclusive lock is something that many software applications deal with.

Using a temp file followed by a rename when doing writes, has shown to be most effective at preventing race conditions, and is the fastest, easiest, and most efficient way to deal with such a scenario — in my experience anyway.

However, recently we have enhanced the features in Quick Cache that deal with cache clearing/purging, and since there are now so many of these operations that can occur while traffic is still hitting the front-end of a site; i.e. while we are purging/clearing a temporary copy of the cache directory; there is once again a need for an exclusive lock in a few cases. This is to avoid a race condition on very busy sites.

For example, if we are purging files in a temporary copy of the cache directory from the back-end, we need to lock the cache directory on the front-end while this occurs. It's not enough in that particular scenario to use the temp file, then rename methodology on its own. It needs to be coupled with an exclusive lock on the entire cache directory.


Ups, downs, and gray areas...

sem. The ups would include speed, performance, and that it's functionality is better suited to our use case; i.e. it is a system that is specifically designed for the sort of locking that we need. It is also (generally speaking) easier to work with across different types of filesystems. e.g. Hosting companies may choose to alter their implementation slightly to make Semaphores more reliable with a specific cloud-based architecture they use.

flock is always going to be a local mutex file, no matter the server architecture. In some cases this could make it a more reliable choice actually, but overall it is less flexible when it comes to the control that a hosting company may have over how locks takes place. The use of flock() is really intended for specific files, not to lock an entire directory or a multiprocess application itself. It is also slightly slower, but not a huge difference in my tests. Overall, it's a very good alternative to Semaphores, and since it is a simpler/cleaner way to lock a mutex file, there are cases when it may be more desirable, even though it is technically less flexible.


Other Thoughts...

We should probably consider making the lock functionality configurable, or provide a filter to control this. Some site owners may find it necessary to tweak this for one reason or another. i.e. alter the choice between sem and flock, as one technique might work better on some hosting platforms than it does on another.

It might even be a good idea to have a PHP Constant that could be defined to control this too, so that hosting companies could define that constant and force a particular value for all of their clients that use Quick Cache; as this is going to be something that may need to change based on the underlying architecture they use.


Unique Semaphore Key

@raamdev writes...

Can you give me an example of how that would be done?

The current implementation uses a hard-coded value of 1976 as the IPC key for the Semaphore. The ftok function can be used to accomplish what we need though, when it comes to the Semaphore. Something like this might work, though I have not tested this myself. I have only used hard-coded values in my prior testing with this PHP feature.

$ipc_key = ftok(ABSPATH.'wp-config.php', 'w');
// Using this instead of `1976` in the call to `sem_get`.
public function cache_lock()
            {
                if($this->function_is_possible('sem_get'))
                   if(($ipc_key = ftok(ABSPATH.'wp-config.php', 'w')))
                        if(($resource = sem_get($ipc_key, 1)) && sem_acquire($resource))
                            return array('type' => 'sem', 'resource' => $resource);

...

FLOCK Unique Key

It would be my opinion that the same technique would work best for the .lock file also, as opposed to using $_SERVER['HTTP_HOST'] which is not quite as reliable in my view.

$ipc_key = ftok(ABSPATH.'wp-config.php', 'w');
$mutex = $tmp_dir.'/'.$this->slug.'-'.$ipc_key.'.lock';
jaswrks commented 9 years ago

Another article that tries to go over this in some detail. https://locallost.net/?p=1091

jaswrks commented 9 years ago

@raamdev Regarding the FLOCK Unique Key...

On second thought, using ftok() for the .lock file method may not work, since the ftok() function (which is for the System V extension) may not even be available on servers that are missing the System V extension for PHP. Using fileinode() there might make more sense when it comes to the .lock file.

$inode_key = fileinode(ABSPATH.'wp-config.php');
$mutex = $tmp_dir.'/'.$this->slug.'-'.$inode_key.'.lock';
raamdev commented 9 years ago

I tested the new code using the unique IPC key during cache locking on @Kalfer's test site and can confirm that the problem no longer presents itself (i.e., Quick Cache Pro activates successfully without any error).

Working now on preparing this fix for release.

raamdev commented 9 years ago

Closing this issue. The bug reported by @Kalfer has been fixed and will go out as part of the next release.

@jaswsinc Thank you for your help with this!

@Kalfer Thanks so much for reporting the bug!