zendtech / ZendOptimizerPlus


Some questions about #126 "Cache invalidation for scripts in symlinked folders" #211

Closed devtoby closed 8 years ago

devtoby commented 9 years ago

I have some question about https://github.com/zendtech/ZendOptimizerPlus/issues/126

/somefolder
    /production -> 20130823_2207
    /20130823_2207 # new version
    /20130823_2115 # old version
    ...

@rlerdorf said the opcode cache uses the realpath of the files. After the project symlink is changed and a new request comes in, what tells PHP to keep using the old cached scripts? Does the opcode cache still hold entries for the old symlink target (production -> 20130823_2115)?

Sorry, I'm not good at English.

frickenate commented 9 years ago

I'm in the process of setting up a new deployment system for a project, and am furiously googling everything related to this stuff. The best source I've found is this detailed article about opcache:

Because of PHP's realpath cache, you may experience problems if you use symlinks to handle your document root for deployment. Turn opcache.use_cwd and opcache.revalidate_path to 1, but even with those settings, bad symlink resolutions may happen. This is because PHP answers OPcache realpath resolution requests with a wrong answer coming from its realpath_cache mechanism.

Supplemented by this realpath article:

The realpath cache is process bound, and not shared into shared memory

This means that any time a cache entry expires, changes, or you empty the cache manually, you have to do this for every process in your pool. This is usually why people fail at deploying code using opcode cache solutions. What people usually do when deploying is change a symlink from, say, /www/deploy-a to /www/deploy-b. What they usually forget is that opcode cache solutions (at least OPcache and APC) rely on PHP's internal realpath cache. So those opcode cache solutions won't notice the link change, and worse, they're going to start noticing it little by little, as the realpath cache entry for every path slowly expires. You know the result.

Unfortunately the final answer is that we are all screwed. It doesn't matter how you configure the opcache settings. It doesn't matter that you call opcache_reset() after switching the symlink. It doesn't matter that you try to call clearstatcache(true). None of it is enough. The addition of the realpath cache to PHP screwed us all. There are few possible workarounds aside from severely over-complicating deployments: instead of switching a symlink and clearing the opcache, set up multiple worker pools with load balancing and so on. Basically, opcache is not a viable investment for the 90% of projects that cannot spend a not-insignificant amount of time configuring multiple concurrent production versions of the codebase.

The only workaround that doesn't require a lot of work is to restart php-fpm (with nginx) or apache (with mod_php) after swapping the symlink. Of course, for a very busy production site this could essentially take the site down, with all the traffic slamming into a restarting web server. shrug

rlerdorf commented 9 years ago

@frickenate of course there is a better way. You don't want to flip a symlink mid-request regardless, since dependencies between files will break things. If you are using nginx, simply do:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

That will resolve the symlink and set the doc_root to the target of the symlink for the duration of the request. That ensures the docroot doesn't change mid-request even if the symlink is flipped. This makes opcache happy, and as a bonus you have now achieved correct atomic deploys, assuming all your includes are relative to the docroot, all without ever restarting anything or clearing any caches. People say this approach is complicated, but it really isn't, since there is literally nothing to do: you deploy to a directory, point the symlink at it, and you are done. Perhaps I used too many words to describe it in my writeup at https://codeascraft.com/2013/07/01/atomic-deploys-at-etsy/ since I also described flipping back and forth to preserve previously cached scripts and talked about how to automatically change include_path, neither of which may be needed in your particular case.

frickenate commented 9 years ago

I actually read that etsy post too. :) Unfortunately I have some framework requirements that tie me to Apache for the time being, and I would have to go through approval with the security team to get clearance for an unknown module to be compiled in. I don't understand the nuances of what is being said in the pull request about (in)compatibility with mod_rewrite, which scares me as we make heavy use of it as well.

I suspect I will first try the apache graceful approach to eliminate the most severe possible problems, and see how heavily our production load slams the web nodes when we do this. A more extensive solution will likely have to wait until some time down the road.

rlerdorf commented 9 years ago

There is no issue with mod_rewrite. Etsy uses a billion mod_rewrite rules. And technically you don't need to use mod_realdoc. You could do it in userspace PHP if you have a front controller that all requests go through; you can do the docroot rewrite there. It's not as slick, and it becomes PHP-specific. The slick part of having your web server do it is that it is a language-agnostic way to achieve atomic deploys in a deploy system that uses a symlinked docroot.
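
For illustration only, here is a minimal sketch of that userspace approach (the file name, bootstrap include, and PROJECT_ROOT constant are assumptions for this example; as noted further down the thread, the front controller itself is not protected by this trick):

<?php
// index.php - hypothetical front controller that every request goes through.
// __FILE__/__DIR__ are reported with symlinks already resolved, so capturing
// the value once pins the request to the deploy directory (e.g. /var/www/A)
// that the symlink pointed at when the request started.
define('PROJECT_ROOT', __DIR__);

// Optionally let the rest of the stack see the resolved docroot too.
$_SERVER['DOCUMENT_ROOT'] = PROJECT_ROOT;

// From here on, use only PROJECT_ROOT-based absolute paths for includes.
require PROJECT_ROOT . '/bootstrap.php'; // hypothetical include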

You can also handle it by turning off all caching. You can turn off the realpath cache (just set realpath_cache_size to 0) and make opcache stat on every request. This is what it sounds like you are asking for, but it obviously hurts performance to turn off the realpath cache and your deploy system still won't be atomic since you will have no protection from the symlink being swapped mid-request.
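
As a rough illustration of that "turn off all caching" option, the relevant php.ini settings might look like this (values are examples, not a recommendation):

realpath_cache_size = 0          ; disable PHP's realpath cache entirely
opcache.validate_timestamps = 1  ; have opcache stat cached scripts again
opcache.revalidate_freq = 0      ; ...on every single request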

jportoles commented 9 years ago

I hope you will excuse bumping this issue, but after reading @rlerdorf's essay and the information raised in #126, I'm still left with a couple of questions about the opcode/realpath cache, and the documentation on this seems very scarce besides these two resources.

So if I understand correctly, there are two main issues that are typically experienced with symlink deployments:

  1. The symlinks may update in the middle of ongoing requests, potentially causing those requests to include/require files of a different version, leaving the request in an "undefined" state (at best).
  2. Once a symlink update occurs, unless the realpath cache is disabled, it typically takes a while until the cached entry is refreshed. In the meantime, if the opcode cache is enabled, it caches the older version under the stale resolution, and that older version keeps being served indefinitely even if the realpath cache is cleared at a later point.

In order to address 1., you can either let the web server resolve the symlink, assign it to DOCUMENT_ROOT, and base all your includes/requires or autoload routines on DOCUMENT_ROOT, or you can handle it manually by defining the base root via __FILE__ at the beginning of the request and using it in all includes/requires/autoload routines, as per @TerryE's comment here.

However, in order to address 2., there is no choice but to either disable the realpath cache, disable the opcode cache, or have the web server resolve the symlinks before the requests hit the PHP interpreter, effectively bypassing the realpath cache altogether. Is this correct? And if yes, is there a way to work around it that doesn't involve getting rid of the realpath cache while keeping the opcode cache, particularly for nginx? Hitting the disk with every request doesn't sound like the best of ideas. Or should we perhaps just assume the hit is minimal, since caching at the OS level can still take care of it?

Just trying to understand so we don't shoot ourselves in the foot while trying to make our deployment strategy work.

Edit: updated my assumptions regarding the realpath cache <-> opcode cache relationship.

rlerdorf commented 9 years ago

@jportoles No, by addressing 1 you automatically resolve 2 as well, because you aren't actually using symlinks at the PHP level anymore. That is the whole point of that solution.

Let me try again... You have a symlink /var/www/htdocs which points to /var/www/A, which has your currently running files. You have configured your web server to resolve that symlink, or you do it in PHP in your front controller, so at the PHP level everything below your front controller just deals with /var/www/A. Nothing in PHP knows about /var/www/htdocs, including the realpath cache.

When you now deploy the new version of your site, you put the new files in /var/www/B and flip the symlink. Now the web server or your front controller will tell your scripts that the DOCUMENT_ROOT is /var/www/B and everything will use that. The realpath cache is not affected at all because /var/www/A and /var/www/B are the real paths. There is no way an entry for /var/www/A in the realpath cache will suddenly point to a file in /var/www/B.

The only time the realpath cache comes into play is if you don't resolve issue 1 and flip your document_root mid-request. Then you hit both problems, because the path /var/www/htdocs/file.php may point to either /var/www/A/file.php or /var/www/B/file.php depending on whether the realpath cache has updated yet or not.

Perhaps the misunderstanding is what the realpath cache actually does?

It caches the output of the UNIX realpath() call. This call takes a path and figures out the real path by resolving any symlinks in it. So /var/www/htdocs/file.php is resolved to /var/www/A/file.php in this example.

By explicitly setting the DOCUMENT_ROOT to /var/www/A at the start of the request, you are using absolute paths and /var/www/A/file.php will always be /var/www/A/file.php.
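
A trivial sketch of the resolution being described, using the example paths above (assuming /var/www/htdocs is a symlink to /var/www/A):

<?php
$requested = '/var/www/htdocs/file.php';
$resolved  = realpath($requested);   // "/var/www/A/file.php"

// PHP keeps this mapping in its per-process realpath cache, so after the
// symlink is flipped to /var/www/B the old answer can linger until the
// cache entry expires (realpath_cache_ttl).
var_dump($resolved);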

jportoles commented 9 years ago

Ok, that I fully understand, but that only takes care of includes/requires at a userspace level. What happens with the logic between the server software and the interpreter when a request arrives? That is to say, if I have an nginx host pointing to a symlink, /var/www/production pointing to /var/www/A, and I then update the symlink to /var/www/B, is it safe to assume that when a request hits the host, PHP will initially load the correct version and not be confused by a stale realpath cache?

rlerdorf commented 9 years ago

No, this isn't limited to includes/requires at the PHP level. In fact it has nothing to do with PHP. If you configure your web server to resolve the symlink by using "fastcgi_param DOCUMENT_ROOT $realpath_root;" in your nginx config, for example, then everything will be using the absolute path the symlink resolves to. We would be having this exact same discussion if you were running something written in any other language.

Think of it as being equivalent to shutting down your web server, editing the config file, and changing the document root from /var/www/A to /var/www/B. As far as PHP is concerned, that is what you are doing, with the convenient difference that you can achieve it without actually shutting down and losing any requests.

jportoles commented 9 years ago

Ok, if you instruct nginx to use $realpath_root then there's no question that it should work no matter the actual backend; I understand that much. I was thinking more about what happens when you don't do this, but rather handle it at the PHP level by defining a "ROOT_DIR" constant at the beginning of the request, as suggested here. It sounds like this would only work partially, since the server software wouldn't know about it.

The reason I bring that up is that while using $realpath_root in nginx vhosts would do the trick, we are concerned about the performance impact of it hitting realpath with every request, so I wanted to understand whether an alternative would be viable. Since you decided to implement a cache in mod_realdoc, I suppose it is important. But perhaps we are overthinking it?

rlerdorf commented 9 years ago

The performance impact should be pretty minimal. It is a single realpath call. But yes, when I implemented this for Apache I did put in a ttl so it wouldn't do it on every request; see https://github.com/etsy/mod_realdoc/blob/master/mod_realdoc.c#L148. I suggested this to the nginx folks, but they didn't seem very concerned about the performance implications, and they are probably right. These days most people who are concerned about performance at this level will have their docroot on tmpfs, which makes the issue go away entirely.

For the case Terry was talking about where you handle it at the PHP level in the front controller, you are right that other non-PHP requests to the same server wouldn't know about it. But it is pretty rare to have 2 kinds of dynamic requests hitting the same server. The only other kind of request is likely to be for static assets and they should be versioned and served out of a different directory entirely anyway.

frickenate commented 9 years ago

By the way, I discovered something when I implemented the mod_realdoc solution. There is a problem with using the PHP opcache together with web server realpath resolution. Each time you switch the symlink to a new path, PHP obviously adds new cache entries to the opcache. If you deploy to new directories based on versions (ex: timestamps), you will eventually run out of opcache's allocated memory. I expected the PHP opcache to restart/flush/reset when its memory allocation is filled - why else would opcache_get_status() include a stat for "oom_restarts" (presumably "out-of-memory restarts")?

However, this is unfortunately not the case. When opcache becomes full, it does not restart. Instead, the cache is simply full and all new attempts to cache files are ignored. You are effectively running your entire codebase without opcache at all (and arguably you're even worse off, with the overhead of every page hit attempting to compile and cache every script, only to fail to insert it into the cache).

I tested this multiple times to the same conclusion - when you run out of memory, opcache_get_status() clearly shows a) no reset of memory, b) no increase in oom_restarts, and most dangerously, c) "scripts" does not contain any entries for deployment versions after the cache became full.

Sadly I was forced to make a curl call to localhost on each deployed server to call opcache_reset(). It was either this or restarting the web server.

tl;dr: If you use web server realpath resolution, make sure you reset the opcache every time you change the symlink, or your opcache will eventually fill up and become effectively disabled.
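
For reference, the reset hook described above can be as small as the following sketch (the file name and the localhost-only guard are assumptions for illustration):

<?php
// opcache-reset.php - hypothetical endpoint hit via curl after a deploy.
// Only allow requests from the machine itself; opcache_reset() wipes the
// whole shared-memory cache for this PHP pool.
if ($_SERVER['REMOTE_ADDR'] !== '127.0.0.1') {
    http_response_code(403);
    exit('forbidden');
}

$ok = function_exists('opcache_reset') && opcache_reset();
echo $ok ? 'opcache reset' : 'opcache reset failed';

The deploy script would then call something like curl http://localhost/opcache-reset.php on each web node.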

jportoles commented 9 years ago

@rlerdorf actually for static assets and non-PHP requests it wouldn't be an issue, since those directives are passed only to the fastcgi/fpm proxy, which typically handles PHP files only. I was more concerned about the possibility that the "entry point" PHP file that nginx passes over could still be under an unresolved symlink that the interpreter could load incorrectly from the realpath and opcode caches, even if you took measures at a userspace level. But yeah, it sounds like we are better off just adopting those nginx directives and optimizing elsewhere afterwards if we need to.

@frickenate good tip, thanks for adding it. If a single opcache_reset call via curl takes care of it, I would say that's fine; in our case, for instance, we already have post-deploy hooks via curl to handle clearing APCu and the like, so it's not a big deal.

rlerdorf commented 9 years ago

@frickenate opcache is supposed to clear itself when it fills up. It certainly does for me. There is even a counter that counts how many times it has happened since you initially started it to give you an idea of whether you need to allocate some more memory for it. See https://github.com/zendtech/ZendOptimizerPlus/blob/8c3e56f83bf4fda5f6780618638c77ee43867a70/ZendAccelerator.c#L2099 You could try playing with opcache.max_wasted_percentage to see if that helps trigger it. The logic for triggering a restart is here: https://github.com/zendtech/ZendOptimizerPlus/blob/8c3e56f83bf4fda5f6780618638c77ee43867a70/ZendAccelerator.c#L187-L192

And yes, @jportoles, if you want to do it at the PHP level, that entry point is not going to be protected against atomicity issues. That's why I said every PHP file below the entry point earlier. So if you are going to push a change to that file, you have to make sure it is backward and forward compatible, which usually means a slow transition over a couple of pushes. For example, if you are changing the function signature of something it calls out to, you would need to make a new function with the new signature and push that. Then on the next push, change the front controller to call the new function. And then do a third push to remove the old version. Definitely annoying, which is also why such a front controller should be minimal and as self-contained as possible. If all it does is set the docroot and hand off to something else, it should work perfectly fine.
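
As a toy illustration of that three-push transition (the function names here are made up for this example):

<?php
// Push 1: ship the new signature alongside the old one; nothing calls it yet.
function render_page($template) { /* old signature, still in use */ }
function render_page_v2($template, array $context) { /* new signature */ }

// Push 2: update the (minimal) front controller to call render_page_v2().

// Push 3: once no running front controller can still call render_page(),
// remove it in a final push.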

frickenate commented 9 years ago

@rlerdorf Exactly, that logic only results in a restart if there is some wasted percentage in play. Filling the cache to full shows the following in the stats:

current_wasted_percentage = 0
wasted_memory = 0
cache_full = true
pending_restart = false
restart_in_progress = false

Have you actually seen the oom counter increase, or are you just assuming it works? Because for me, filling the cache resulted in 0 wasted memory even after days of running. If wasted memory is the only way to trigger the restart, how are you supposed to generate wasted memory without manually deleting keys?

jportoles commented 9 years ago

@rlerdorf makes sense, we arrived at the same conclusion. We'll experiment and see what works best for us. Thanks for replying and for the information.

rlerdorf commented 9 years ago

@frickenate Sure, I see the restarts often. By deploying using the A/B strategy I describe and always switching back and forth between two directories, versus always deploying to a new empty one, you end up being able to re-use much of your cache; but the files you do change end up triggering some deletes, this wasted memory grows, and that in turn triggers the reset on cache-full. Having said that, it should also reset on a straight cache-full. That part is a valid bug if it isn't happening.

peterjmit commented 8 years ago

@frickenate I see the exact same behaviour as you (/cc @rlerdorf)

I am using fastcgi_param DOCUMENT_ROOT $realpath_root; plus Capistrano-style deployments for atomic deploys.

The opcode cache fills to the point where I see cache_full = true but there is no wasted memory and no oom_restarts.


{
    "opcache_enabled": true,
    "cache_full": true,
    "restart_pending": false,
    "restart_in_progress": false,
    "memory_usage": {
        "used_memory": 134214488,
        "free_memory": 3240,
        "wasted_memory": 0,
        "current_wasted_percentage": 0
    },
    "interned_strings_usage": {
        "buffer_size": 16777216,
        "used_memory": 6323032,
        "free_memory": 10454184,
        "number_of_strings": 91487
    },
    "opcache_statistics": {
        "num_cached_scripts": 4803,
        "num_cached_keys": 4831,
        "max_cached_keys": 16229,
        "hits": 16734221,
        "start_time": 1442514038,
        "last_restart_time": 1442932349,
        "oom_restarts": 0,
        "hash_restarts": 0,
        "manual_restarts": 1,
        "misses": 98877557,
        "blacklist_misses": 0,
        "blacklist_miss_ratio": 0,
        "opcache_hit_rate": 14.474494977493
    }
}

opcache config

opcache.consistency_checks => 0 => 0
opcache.dups_fix => Off => Off
opcache.enable => On => On
opcache.enable_cli => Off => Off
opcache.enable_file_override => Off => Off
opcache.error_log => no value => no value
opcache.fast_shutdown => 0 => 0
opcache.file_update_protection => 2 => 2
opcache.force_restart_timeout => 180 => 180
opcache.inherited_hack => On => On
opcache.interned_strings_buffer => 16 => 16
opcache.load_comments => 1 => 1
opcache.log_verbosity_level => 1 => 1
opcache.max_accelerated_files => 15000 => 15000
opcache.max_file_size => 0 => 0
opcache.max_wasted_percentage => 5 => 5
opcache.memory_consumption => 128 => 128
opcache.optimization_level => 0xFFFFFFFF => 0xFFFFFFFF
opcache.preferred_memory_model => no value => no value
opcache.protect_memory => 0 => 0
opcache.restrict_api => no value => no value
opcache.revalidate_freq => 0 => 0
opcache.revalidate_path => Off => Off
opcache.save_comments => 1 => 1
opcache.use_cwd => On => On
opcache.validate_timestamps => Off => Off

rlerdorf commented 8 years ago

Right, since there is no wasted memory, meaning no stale entries, there isn't really anything opcache can do for you here. You simply need to increase your cache size so that you can fit two copies of your entire site, assuming you are flipping back and forth between two deploy directories.

Failing that, you will need to add a manual cache clear to your deploy mechanism, but that is rather inefficient.

peterjmit commented 8 years ago

@rlerdorf no, it is timestamped releases (capistrano). I suppose switching between directories like you mention would work.

This is a pretty sharp edge on opcache that I feel a lot of users won't be aware of - then again, the number of people doing atomic deploys in this fashion is not likely to be huge either.

I guess manually clearing opcache on deploy is the way we have to go.

rlerdorf commented 8 years ago

It makes way more sense to toggle between an A and a B, since you are likely to have quite a few unchanged files across deploys, which will allow you to re-use the cache entries instead of having to recompile every file on every deploy. This also means you will end up with stale entries for each modified file, and the cache will clean itself when it fills up. As it is now, your deploy system is rather inefficient, and it also doesn't give PHP any indication as to which entries are stale since you just keep adding new ones.

If you really want time stamped backups you can always leave those around and just rsync into the non-active A/B production directory.

peterjmit commented 8 years ago

@rlerdorf I am hoping this is just a gap in my understanding (so thank you in advance if you clear it up :smile:)

What about the case where, across three deploys, the first and third deploy change the same file and file stat / validate_timestamps is disabled?

For example:

  1. Release 1 modifies foo.php, which gets deployed to /var/www/a/foo.php
  2. Release 2 modifies bar.php, which gets deployed to /var/www/b/bar.php
  3. Release 3 modifies foo.php, which is deployed to /var/www/a/foo.php

It is my understanding here that after release 3 is deployed (unless the opcache is cleared) the version of foo.php in opcache will be from release 1 and be stale/out of date.

This scenario is roughly similar to not using $realpath_root and having to clear opcache on each deploy (i.e. non-atomic deploys).

To me it seems like the only real option here would be to have a script that did something along the lines of the following on each deploy (and called via wget or curl).

<?php
use Symfony\Component\Finder\Finder;

// Assumes Composer's autoloader for symfony/finder is available.
require __DIR__ . '/vendor/autoload.php';

// Collect every PHP file in the given release directory...
$finder = new Finder();
$finder->files()
    ->in('/var/www/releases/12344355')
    ->name('*.php');

// ...and force-invalidate each one's opcache entry by its resolved path.
$result = [];
foreach ($finder as $file) {
    $path = $file->getRealPath();
    $result[$path] = opcache_invalidate($path, true);
}

rlerdorf commented 8 years ago

On Oct 5, 2015, at 20:59, Peter Mitchell notifications@github.com wrote:

What about the case where in three deploys, the first and third deploy change the same file and file stat/validate timestamps is disabled.

Don't disable stats completely. Configure it to stat every couple of seconds. The performance impact is minimal, certainly less than needing to wipe out your cache on deploys.

-Rasmus
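
In php.ini terms that advice might look like the following sketch (the 2-second value is just one way to read "every couple of seconds"):

opcache.validate_timestamps = 1  ; stat cached scripts again...
opcache.revalidate_freq = 2      ; ...at most once every 2 seconds per script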

marcovtwout commented 8 years ago

@frickenate You said "It doesn't matter that you try to call clearstatcache(true)". Why not?

devtoby commented 8 years ago

It looks like it's not because of OPcache; PHP's realpath_cache is what caches the old path. Thanks very much.

marcovtwout commented 8 years ago

But clearstatcache(true) should clear that cache (first parameter = bool $clear_realpath_cache)

guiwoda commented 8 years ago

I'll leave my tests here for further reference:

We've switched off the realpath cache and will follow @rlerdorf's advice on the Apache module.