πŸ“˜ Nextcloud documentation
https://docs.nextcloud.com

OPcache recommendation wrong (opcache.enable_cli=1) #1439

Closed tessus closed 5 years ago

tessus commented 5 years ago

I've been meaning to open an issue for this for a while now, but never got around to it.

The recommendation to set opcache.enable_cli=1 is actually bad for performance. I am not sure who came up with this, but I reckon that whoever it was did not understand what it meant.

Enabling OPcache for CLI has the following effect: it optimizes the CLI script, which of course takes extra time on the first run. However, for a CLI script every single invocation is the first time. The cache only lives until the process terminates, so as soon as the script ends, the optimizations are gone. The only net effect is that the script takes longer to run, because the optimizations were done during runtime. You do not have to take my word for it; you can read up on this topic in several articles or contact the PHP developers. (Full disclosure: there are certain exceptions where performance can be improved for CLI scripts, but please note that those do not come into play for occ or any CLI script that is run by Nextcloud.)
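The advice above boils down to a small php.ini fragment (a sketch for a production setup; only the two settings shown are relevant here):

```ini
; Cache opcodes for web requests (php-fpm / mod_php) ...
opcache.enable=1
; ... but not for CLI: the CLI cache dies with the process, so enabling
; it only adds compile-time overhead to every occ / cron.php invocation.
opcache.enable_cli=0
```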

Not only is the documentation wrong here, but so is the check in Settings -> Admin -> Overview. It is rather annoying to get a warning every time, even though the settings are actually better for performance. But I guess that's a separate ticket, which I will happily open as soon as we have established that my concerns are well-founded.

skjnldsv commented 5 years ago

@nickvergessen @rullzer @MorrisJobke ?

MorrisJobke commented 5 years ago

@tessus Thanks for this. Makes total sense. We should change it.

MorrisJobke commented 5 years ago

Fix is in https://github.com/nextcloud/server/pull/15468

tessus commented 5 years ago

@MorrisJobke Thanks. I just saw that you have already fixed it in the server check. I would have opened another issue in the server repo for this. I originally opened this issue for the documentation: https://docs.nextcloud.com/server/15/admin_manual/installation/server_tuning.html#enable-php-opcache

But I am more than happy that it was fixed in the server. Cheers.

MichaIng commented 5 years ago

@tessus Many thanks for this, I was thinking the same for a while.

Another question, which I hope it's okay to ask here, is about the recommended OPcache size value:

MorrisJobke commented 5 years ago
  • At best the check should simply test if all Nextcloud scripts are indeed in OPcache or not, but not sure how/if that is easily/quickly possible.

I couldn't find that one when I worked on it.

  • So the default max size of 64M should be totally sufficient. The only real minimum recommendation to me should be 32M, everything above highly depends on possible other webserver usage and should not/cannot be estimated.

I actually changed it to 128 MB, because I had an instance with some apps enabled that were quite heavy and reached over 64 MB.

Does that make sense to you?

MichaIng commented 5 years ago

@MorrisJobke Hmm, okay in this case it makes sense. I never reached more than 33M AFAIK (monitored with https://github.com/rlerdorf/opcache-status/blob/master/opcache.php) with quite many apps enabled.

I am not sure how this OPcache size is handled. Usually a cache should not be much larger than the amount actually used, since a larger cache usually reduces performance for the individual request. So if there is any way to make the recommended cache size dependent on the actual number/size of the instance's PHP files, this would be beneficial. It is also a question how much performance decreases if single scripts are not cached because the max size is reached.

tessus commented 5 years ago

It is also a question how much performance decreases if single scripts are not cached because the max size is reached.

This is not how a cache works. In that case eviction occurs (by some algorithm like LRU, LFU, FIFO, ...) and the new script will be optimized and added to the cache. Thus the perf hit only amounts to removing data from memory. That's it.
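The eviction behaviour described here can be sketched with a toy LRU cache (Python, purely illustrative; OPcache's real strategy differs, e.g. it can restart the whole cache instead of evicting single scripts):

```python
from collections import OrderedDict

class ScriptCache:
    """Toy LRU cache: a miss on a full cache drops the least-recently-used
    entry and recompiles the new script -- nothing more. Illustrative only,
    not how OPcache is actually implemented."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.evictions = 0
        self.compiles = 0

    def get(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)       # cache hit: mark as recently used
            return self.entries[path]
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)     # evict least recently used
            self.evictions += 1
        self.compiles += 1                       # the "perf hit" of a miss
        self.entries[path] = f"opcodes({path})"  # stand-in for compiled opcodes
        return self.entries[path]

cache = ScriptCache(max_entries=2)
for path in ["a.php", "b.php", "c.php", "a.php"]:
    cache.get(path)
# c.php evicted a.php, then re-requesting a.php evicted b.php and recompiled
```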

MichaIng commented 5 years ago

@tessus Makes sense. However, I know from MariaDB/MySQL caches that larger sizes lead to decreased performance even when everything stays in cache, since a larger cache needs to be handled/searched through on each access. But as said, I am not sure whether this is true for OPcache as well, and how it compares to the performance decrease when data regularly needs to be removed because the size is too small.

Btw. couldn't some method from opcache.php be used to check the currently used OPcache size and file count, recommending the next power of two in the admin panel in case the max size is full or close to it? EDIT: https://www.php.net/manual/en/function.opcache-get-status.php


Perhaps I should open a new issue about this, also based on some new evaluation how much cache is used when installing large amounts of apps.
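The "next power of two" idea could look roughly like this (hypothetical helper name; the 80% threshold is an example, and a real check would read the numbers from opcache_get_status()):

```python
# Hypothetical admin-check sketch: given the current OPcache memory usage,
# recommend the next power of two in MB once usage crosses a threshold.

def recommended_size_mb(used_mb, max_mb, threshold=0.8):
    """Return the configured max size, or the next power of two above it
    when usage exceeds `threshold` of the configured maximum."""
    if used_mb < threshold * max_mb:
        return max_mb                 # plenty of headroom, keep current value
    size = 1
    while size <= max_mb:             # find the next power of two above max_mb
        size *= 2
    return size

assert recommended_size_mb(33, 64) == 64    # headroom left, keep 64M
assert recommended_size_mb(60, 64) == 128   # above 80%, suggest 128M
```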

tessus commented 5 years ago

However I know from MariaDB/MySQL caches that larger sizes lead to decreased performance as long as everything stays in cache, since a larger size needs to be handled/searched through for each access.

Hmm, I worked in DB2 development for many years, and data is always manipulated or read from bufferpools (except large objects). This means that data first has to be fetched into the bufferpool before it can be used. This also means the larger the bufferpool, the better for performance. I haven't looked at the source code for MySQL (I wasn't allowed to look at other code due to code contamination), but I doubt that database engines (for RDBMS) are fundamentally different. Therefore I do not understand why a larger cache would mean a performance hit. If this really is the case, something might be wrong with the engine code.

I don't see a bigger OPcache memory allocation as detrimental to performance. E.g. the max_accelerated_files parameter is actually a prime number (you can check the docs), which means a hash function is used to access data. We all should know what that means for big-O. ;-)

MichaIng commented 5 years ago

@tessus I tried to find the sources I read some time ago about this. Actually I believe this was indeed limited to the query_cache_size, which locks on every update: https://mariadb.com/kb/en/library/query-cache/#limiting-the-size-of-the-query-cache

But let's see how much OPcache is actually used by Nextcloud:

tessus commented 5 years ago

@MichaIng thx for the link.

Actually I believe this was indeed limited to the query_cache_size which locks on every update

I really would have to look at the source code, but something doesn't seem quite right here. Anyway, I really do not want to go off-topic, so you can always drop me an email to talk about database engine internals.

Not sure if all those scripts could be theoretically cached

Yes, if you tune the parameters accordingly. However, as you can see, you did not hit the limit of files. Your stats say 3764 scripts are in the cache. If you set the max_accelerated_files to 10000, the maximum number of scripts would be 16229.
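The 10000 β†’ 16229 jump comes from OPcache rounding opcache.max_accelerated_files up to the next entry of a fixed prime table; a sketch (the prime list is the one documented in the PHP manual for this setting, so verify against your PHP version):

```python
# Why requesting 10000 yields 16229: OPcache uses the first prime from a
# fixed table that is >= the configured opcache.max_accelerated_files.
PRIMES = [223, 463, 983, 1979, 3907, 7963, 16229, 32531, 65407, 130987]

def effective_max_files(requested):
    """Smallest prime in the table that is >= the requested value."""
    return next(p for p in PRIMES if p >= requested)

assert effective_max_files(10000) == 16229  # the number quoted above
```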

MichaIng commented 5 years ago

@tessus

Yes, if you tune the parameters accordingly.

Ah sorry, I wanted to ask whether theoretically all 36551 found .php files (~260M size) would be cached if the limits were raised. Or are there some that are never cached, e.g. because they are never called from the webserver? At least the ones used by occ and cron.php, I guess, and for config.php (plus other settings files) it would make sense to block them from caching.

However, from the actual OPcache usage one can conclude that max_accelerated_files=10000 and opcache.memory_consumption=128 are very failsafe, and that halved values would work for the vast majority or even all possible Nextcloud setups.

tessus commented 5 years ago

@MichaIng you are on the correct path already. Files which are not served by php-fpm or the PHP module will never be in the cache. (As mentioned before, even if you were to use OPcache for CLI scripts, the cache would only live for the lifetime of the process.)

You can always use blacklists for settings files. The suggested value of 1 for revalidate_freq renders blacklists pretty much useless. I actually use a value of 60 and I don't use a blacklist. I don't change the config file every few minutes, so I can wait 60s for the changes to be picked up. ;-)
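The setup tessus describes corresponds to roughly this php.ini fragment (his personal choice, not an official Nextcloud recommendation; the blacklist path is an example):

```ini
; Re-check file timestamps at most once per 60s instead of every second.
opcache.validate_timestamps=1
opcache.revalidate_freq=60
; Alternatively, with a low revalidate_freq, settings files could be
; excluded from caching instead (example path):
;opcache.blacklist_filename=/etc/php/opcache-blacklist.txt
```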

MorrisJobke commented 5 years ago

You can always use blacklists for settings files. The suggested value of 1 for revalidate_freq renders blacklists pretty much useless. I actually use a value of 60 and I don't use a blacklist. I don't change the config file every few minutes, so I can wait 60s for the changes to be picked up. ;-)

We have this in place to have somewhat okayish caching (for bigger installations this still caches all the files for one second). On the other side, we sometimes make changes to the config from the CLI, and since CLI and webserver use different caches, the admin needs to be aware of either a) waiting 60 seconds or b) purging the webserver's cache after one of these changes. Both options are unfortunately hard to teach to tens of thousands of administrators out there. :/ I know that it is not a nice and perfect solution, but it is at least a quite good one. Sadly we sometimes need to go a way of compromises. If we had full control, we would only invalidate the cache after an upgrade, app update or config change, but unfortunately that is not easy with CLI and webserver administration in one product (and we are not even talking about multiple web servers that then have similar things to respect).

MorrisJobke commented 5 years ago

Thanks nevertheless for the deep dive into the mechanics of caching and how to improve things here. We are happy to make this better, as long as the scenarios we serve remain fully supported.

MichaIng commented 5 years ago

Just found that the values recommended by Nextcloud for opcache.memory_consumption, opcache.interned_strings_buffer and opcache.max_accelerated_files have been the default values since PHP 7.0: https://php.net/manual/en/opcache.configuration.php Even though they are, as shown above, more than double what Nextcloud can actually use, there is probably not much point in recommending lower-than-default values.

However, I still think it would be beneficial to use opcache_get_status() to check the actual usage before recommending (max size + files) values. This would also cover cases where other websites with a large code base are active beside Nextcloud, so that the currently recommended values might be too low. So check the current file count and size usage, calculate a percentage, and if this is above e.g. 80%, recommend raising the related value. There is even an interned_strings_usage key in the function return to do the same for opcache.interned_strings_buffer.
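A sketch of that proposed check (the dict mimics the array returned by PHP's opcache_get_status(), simplified in that wasted memory is ignored; the 80% threshold is only the example value from above):

```python
def opcache_warnings(status, max_accelerated_files, threshold=0.8):
    """Return a list of settings that look close to their limit."""
    warnings = []
    mem = status["memory_usage"]
    total = mem["used_memory"] + mem["free_memory"]
    if mem["used_memory"] / total > threshold:
        warnings.append("raise opcache.memory_consumption")
    scripts = status["opcache_statistics"]["num_cached_scripts"]
    if scripts / max_accelerated_files > threshold:
        warnings.append("raise opcache.max_accelerated_files")
    return warnings

# Example mirroring numbers from this thread: 120M of 128M used,
# 3764 scripts cached with an effective limit of 16229 files.
status = {
    "memory_usage": {"used_memory": 120 * 2**20, "free_memory": 8 * 2**20},
    "opcache_statistics": {"num_cached_scripts": 3764},
}
```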

MorrisJobke commented 5 years ago

However, I still think it would be beneficial to use opcache_get_status() to check the actual usage before recommending (max size + files) values. This would also cover cases where other websites with a large code base are active beside Nextcloud, so that the currently recommended values might be too low. So check the current file count and size usage, calculate a percentage, and if this is above e.g. 80%, recommend raising the related value. There is even an interned_strings_usage key in the function return to do the same for opcache.interned_strings_buffer.

Mind to open a ticket in the server repo with this?

MichaIng commented 5 years ago

Makes sense, done: https://github.com/nextcloud/server/issues/15522

tessus commented 5 years ago

@MorrisJobke

I know that it is not a nice and perfect solution but it is at least a quite good one. Sadly we need to go sometimes a way of compromises

I agree that recommendations should be made which are best for nextcloud. And when it comes to performance I totally agree that these recommendations should be followed and even a warning given in the admin settings. I'm glad though that you decided not to emit a warning when revalidate_freq is not 1. This would certainly confuse (and most likely annoy) a lot of admins. IMO this parameter is a personal choice and nothing more. Such things should never be used in a sanity check. As I said, I'm glad that it isn't.

MorrisJobke commented 5 years ago

I'm glad though that you decided not to emit a warning when revalidate_freq is not 1. This would certainly confuse (and most likely annoy) a lot of admins. IMO this parameter is a personal choice and nothing more. Such things should never be used in a sanity check. As I said, I'm glad that it isn't.

Yep - was also my idea back then. If the admin knows what it does then we are fine with a different value.

J0WI commented 5 years ago

There is also a recommendation for apc.enable_cli = 1 at https://github.com/nextcloud/documentation/blob/dcd10b19cc705c6343e25b96d1adbbc49ac46a33/admin_manual/configuration_server/caching_configuration.rst#apcu

Should this be removed too?

https://www.php.net/manual/en/apc.configuration.php#ini.apc.enable-cli

tessus commented 5 years ago

This is a big YES - definitely !!!!

The following is stated in the docs for this parameter:

Mostly for testing and debugging. Setting this enables APC for the CLI version of PHP. Under normal circumstances, it is not ideal to create, populate and destroy the APC cache on every CLI request, but for various test scenarios it is useful to be able to enable APC for the CLI version of PHP easily.

MichaIng commented 5 years ago

~Interesting, AFAIK it is explicitly tested if enabled for CLI. If disabled for CLI (but enabled otherwise) the Nextcloud admin panel shows an error.~ EDIT: Ah now I remember, it was an info log entry, will try to replicate it. EDIT2:

Info cli Memcache \OC\Memcache\APCu not available for distributed cache

But indeed it does not make sense on first thought.

tessus commented 4 years ago

So, this article is wrong? https://pierre-schmitz.com/using-opcache-to-speed-up-your-cli-scripts/

No, this article is correct. It talks about opcache.file_cache, which writes the compiled data into a file, so it is still available after the script has ended. That parameter was never mentioned in the Nextcloud documentation, though, and by default it is not active.

One would have to run some perf tests to see if that would really help with speeding up the occ command.

-- regards Helmut K. C. Tessarek


MichaIng commented 4 years ago

Sounds promising, although one has to be aware that this means doubled RAM usage (if the cache is created in a tmpfs) and most likely doubled initial cache creation effort, since both the SHM and the file cache need to be created.

It probably makes more sense when used with opcache.file_cache_only: https://www.php.net/manual/de/opcache.configuration.php#ini.opcache.file-cache So no SHM cache is built up, only the file cache, which survives scripts and processes until a reboot or a cleanup job removes it. The only question is whether the SHM cache is faster or not (when the file cache is in a tmpfs).
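A sketch of that variant (the directory is an example; it must exist and be writable by the PHP user):

```ini
; Persist compiled scripts to disk so CLI runs (occ, cron.php) can
; reuse them across invocations.
opcache.file_cache=/var/tmp/php-opcache
; Optional: skip the SHM cache entirely and use only the file cache
; (the open question above: whether this is slower than SHM).
;opcache.file_cache_only=1
```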

spreeni151 commented 3 years ago

What about this warning? It's still in the official manual (https://github.com/nextcloud/documentation/blob/dcd10b19cc705c6343e25b96d1adbbc49ac46a33/admin_manual/configuration_server/caching_configuration.rst#memory-caching):

Warning: APCu is disabled by default on CLI which could cause issues with nextcloud's cron jobs. Please make sure you set the apc.enable_cli to 1 on your php.ini config file.

It sounds like users run into trouble with cron jobs if the option is off?

MichaIng commented 3 years ago

To evaluate this, it would be important to know how the APCu cache is actually used in Nextcloud. Some background jobs and CLI commands are quite time-consuming compared to the cache init and destruction, which happen in a fraction of a second, e.g. all the file scanning, preview generation, and database scan/migration tasks. If any data is written to this cache and then read repeatedly during the same job execution, data which would otherwise be written to and read from disk, then APCu would be useful; otherwise not.

So far I see the apps' info.xml and app icon SVG files inside, and some internal user-specific routes, so that doesn't look relevant for the mentioned CLI commands.

spreeni151 commented 3 years ago

Thank you @MichaIng, so I'd better leave this option off. It would be nice if the documentation were more precise on that point. But I understand that it is difficult to keep everything up to date...

jospoortvliet commented 3 years ago

We now have bumped into https://github.com/nextcloud/server/issues/25742 which turns this recommendation around - if you do NOT have an opcache for the CLI, your server is likely to crash 😒

So I'm guessing these PRs should be reverted?!?!

MichaIng commented 3 years ago

The issue is not OPcache but APCu, which is discussed above as well, but the warning and docs hint were never removed.

So nothing needs to be reverted, but the APCu warning should probably be made more prominent. AFAIK it was in the logs only and not part of the admin panel checks? I'll have a look.

EDIT: Nothing has changed about APCu:

tessus commented 3 years ago

So I'm guessing these PRs should be reverted?!?!

No. Please don't. Think about what you are saying here. I don't know how to say this without coming across as an a-hole, but use common sense. You cannot seriously think that not using an optional feature is the culprit. This is just crazy.

MichaIng commented 3 years ago

@tessus Calm down, it was just a misunderstanding about which cache type currently in fact needs to be enabled for CLI πŸ˜‰ ✌️: https://github.com/nextcloud/server/issues/25742

tessus commented 3 years ago

I came here from nextcloud/server#25770 and the reasoning is - for the lack of a better term - insane. I am perfectly calm.

This will become the laughingstock of the decade. A bunch of people just rewrote how caching should work, because they don't understand the topic.

Nobody in their right mind can tell me that this is a logical approach. The documentation should state that from now on Nextcloud users must use 2 separate php.ini files. It is still idiotic, but at least in that case people will just deactivate APCu on CLI completely.

Update: IMO the root cause must be fixed. This will be one of these things that will never work properly. Another example: File locking has been broken since the beginning. So what happened is that people just accepted the fact that file locking in Nextcloud only works when using Redis. Can someone explain to me why such an essential thing has been ignored for many, many years?

MichaIng commented 3 years ago

I'm also not happy with that PR, respectively with the underlying issue that parts of Nextcloud (for a long time) and now the core (not sure which part, but I could replicate the issue on a fresh instance) rely on APCu for CLI calls, instead of only using it when it's available and falling back to manual SHM, /tmp or whatever else to write volatile data to.

But let's not discuss this here, as it only raises confusion and a mix-up of OPcache and APCu/memcache, which, as you stated perfectly well over there, are two completely different things πŸ™‚.

tessus commented 3 years ago

The problem is that we not only need 2 php.ini files, but also 2 Nextcloud config.php files. Because you can't just deactivate APCu in the php.ini file when config.php has APCu configured.

Basically what's happening here is that if you want to use APCu at all, you have to take a performance hit and go against all advice from PHP developers and performance experts.

I still can't fathom that this doesn't strike any of the devs as strange.

jospoortvliet commented 3 years ago

Update: IMO the root cause must be fixed. This will be one of these things that will never work properly. Another example: File locking has been broken since the beginning. So what happened is that people just accepted the fact that file locking in Nextcloud only works when using Redis. Can someone explain to me why such an essential thing has been ignored for many, many years?

Well, I don't know about any bugs there, but the cause is pretty simple: if you want to support file locking in two ways (database and Redis), you have twice the chance for bugs, as you get two code paths. One will be the most used and tested, and the other will be less used, less tested, and thus less stable. Technically, the sane solution is perhaps to just drop support for running Nextcloud without Redis, but for some home users that's a huge pain, so we try to support no-Redis situations, as volunteers. But for customers, running Nextcloud without Redis is simply not supported, I think.

If you're so upset about it - take it as motivation to fix it... 'cuz like with all features only relevant for home users, either somebody volunteers their time, or it won't get fixed.

te-online commented 2 years ago

Sorry to jump in here, but it's an interesting discussion πŸ€“

@tessus Can you shed some light on why an admin would choose revalidate_freq=60? I'm trying to understand the benefits of this value. For me it does not seem to improve performance, and it has the added drawback that I have to wait 60 seconds after app updates, because the frontend (JS) and backend (PHP) can get out of sync.

tessus commented 2 years ago

@te-online This all depends on the app, how often the same file is accessed, and how often changes happen to the PHP source files. Do you change a PHP file every few seconds? If so, why?

Using a value of 60s is for production and not for a development system. When you develop an application, you use different settings anyway.

There is also the option of manually invalidating a cached script (via opcache_invalidate()).

During revalidation the following happens: a checksum is created for the file being accessed. Then this checksum is compared to the checksum in the cache. If they match, the cache is used; otherwise the cache entry is invalidated and the source file is compiled and put into the cache.

So, if you don't change the files every few seconds, this overhead can become noticeable. I say "can", because it also depends on your system's load and HW specs, e.g. whether your system is CPU-bound...

For prod systems a value of 60 should be fine. If that's not the case, there are still the options of not caching certain files or invalidating them manually.
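The revalidation throttling can be sketched like this (simplified; by default OPcache compares file timestamps when opcache.validate_timestamps=1):

```python
class CachedScript:
    """Toy model of one OPcache entry, showing how opcache.revalidate_freq
    throttles how often a cached script is checked for staleness."""

    def __init__(self, mtime, now):
        self.mtime = mtime        # file modification time at compile time
        self.last_check = now     # when we last validated the entry

    def needs_recompile(self, current_mtime, now, revalidate_freq=60):
        if now - self.last_check < revalidate_freq:
            return False          # within the window: trust the cache blindly
        self.last_check = now
        return current_mtime != self.mtime  # stale only if the file changed

entry = CachedScript(mtime=1000, now=0)
assert entry.needs_recompile(current_mtime=1000, now=30) is False  # throttled
assert entry.needs_recompile(current_mtime=2000, now=30) is False  # change not yet seen
assert entry.needs_recompile(current_mtime=2000, now=61) is True   # window passed, file changed
```

This is why a changed config.php can take up to revalidate_freq seconds to be picked up, as discussed above.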

te-online commented 2 years ago

@tessus Thank you so much for your detailed explanation. It makes more sense to me now.

I think that maybe something is missing from Nextcloud's invalidation, because technically it should be possible to flag the stale files after app updates, right? But that's a different issue, I guess... πŸ˜…

tessus commented 2 years ago

@te-online You are welcome.

it should be possible to flag the stale files after app updates, right?

Right, but in such a case waiting 60 seconds is also not really a problem. How often does that happen anyway? Or you could just restart the FPM process. That's what I do if there are many changes at once. I've set up a separate php-fpm master process for Nextcloud.