nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

[Bug]: opcache.max_accelerated_files never enough #33224

Closed: vasyugan closed this issue 1 year ago

vasyugan commented 2 years ago


Bug description

Since the upgrade to 24, Nextcloud is never happy with the value of opcache.max_accelerated_files and always wants me to increase it. I started with 10,000, then 20,000, then 40,000, then 80,000 and each time, after a short while, I saw the same message:

There are some warnings regarding your setup.

The PHP OPcache module is not properly configured. See the [documentation ↗](https://docs.nextcloud.com/server/24/go.php?to=admin-php-opcache) for more information.
    The maximum number of OPcache keys is nearly exceeded. To assure that all scripts can be hold in cache, it is recommended to apply opcache.max_accelerated_files to your PHP configuration with a value higher than 80000.

I suppose this will happen yet again even if I double the number to 160,000.
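
For context on the doubling: as far as the PHP manual describes `opcache.max_accelerated_files`, PHP does not use the configured value directly but rounds it up to the next entry in a fixed set of prime numbers. A minimal Python sketch of that rounding follows. `effective_max_keys` is a hypothetical helper name, and rejecting out-of-range values with an error is a simplification for this sketch, not PHP's behaviour.

```python
# Prime set as listed in the PHP manual for opcache.max_accelerated_files.
OPCACHE_PRIMES = (223, 463, 983, 1979, 3907, 7963, 16229, 32531,
                  65407, 130987, 262237, 524521, 1048793)


def effective_max_keys(configured: int) -> int:
    """Return the key table size used for a configured value (hypothetical helper)."""
    # The manual documents 200..1000000 as the accepted range; rejecting
    # anything else is a simplification made for this sketch.
    if not 200 <= configured <= 1_000_000:
        raise ValueError("value outside the documented 200..1000000 range")
    # First prime in the set that is greater than or equal to the configured value.
    return next(p for p in OPCACHE_PRIMES if p >= configured)


for value in (10_000, 20_000, 40_000, 80_000, 160_000):
    print(value, "->", effective_max_keys(value))
```

Under this reading, a configured value of 80,000 would already give room for 130,987 keys, so the warning would refer to well over 100,000 cached keys.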

Steps to reproduce

  1. Upgrade to Nextcloud 24
  2. Wait for the check in Settings/Overview to be completed
  3. Double the value of opcache.max_accelerated_files, reload php-fpm
  4. Wait a while and revisit the overview page.

Expected behavior

When the value is double the recommended minimum, Nextcloud should be fine with it.

Installation method

Manual installation

Operating system

Debian/Ubuntu

PHP engine version

PHP 7.4

Web server

Nginx

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

Updated to a major version (ex. 22.2.3 to 23.0.1)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Configuration report

{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "cron.lockfile.location": "\/var\/www\/chrooted\/***web\/data\/cloud.***.org\/",
        "dbtype": "mysql",
        "version": "24.0.2.1",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "default_phone_region": "DE",
        "dbtableprefix": "oc_",
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "default_locale": "de_DE",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "forcessl": true,
        "maxZipInputSize": 838860800,
        "defaultapp": "apporder",
        "allowZipDownload": true,
        "mail_smtpmode": "smtp",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "587",
        "mail_smtptimeout": 10,
        "mail_smtpauthtype": "PLAIN",
        "trusted_domains": [
            "cloud.***.org",

        ],
        "memcache.local": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "activity_use_cached_mountpoints": "true",
        "loglevel": 0,
        "secret": "***REMOVED SENSITIVE VALUE***",
        "forceSSLforSubdomains": false,
        "appstore.experimental.enabled": true,
        "trashbin_retention_obligation": "auto",
        "updater.release.channel": "stable",
        "overwrite.cli.url": "https:\/\/cloud.***.org",
        "default_language": "de",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpauth": 1,
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpsecure": "tls",
        "mysql.utf8mb4": true,
        "has_rebuilt_cache": true,
        "twofactor_enforced": "true",
        "twofactor_enforced_groups": [
            "admin"
        ],
        "twofactor_enforced_excluded_groups": [],
        "theme": "",
        "mail_sendmailmode": "smtp",
        "app_install_overwrite": {
            "1": "twofactor_gateway",
            "2": "apporder",
            "3": "integration_whiteboard"
        },
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***"
    }
}

List of activated Apps

Enabled:
  - accessibility: 1.10.0
  - activity: 2.16.0
  - admin_audit: 1.14.0
  - apporder: 0.15.0
  - bruteforcesettings: 2.4.0
  - calendar: 3.4.2
  - calendar_resource_management: 0.1.1-alpha.1
  - circles: 24.0.0
  - cloud_federation_api: 1.7.0
  - comments: 1.14.0
  - contacts: 4.1.1
  - contactsinteraction: 1.5.0
  - dashboard: 7.4.0
  - dav: 1.22.0
  - deck: 1.7.1
  - federatedfilesharing: 1.14.0
  - federation: 1.14.0
  - files: 1.19.0
  - files_external: 1.16.1
  - files_fulltextsearch: 24.0.1
  - files_lock: 24.0.1
  - files_markdown: 2.3.6
  - files_pdfviewer: 2.5.0
  - files_rightclick: 1.3.0
  - files_sharing: 1.16.2
  - files_trashbin: 1.14.0
  - files_versions: 1.17.0
  - files_videoplayer: 1.13.0
  - firstrunwizard: 2.13.0
  - forms: 2.5.1
  - fulltextsearch: 24.0.0
  - fulltextsearch_elasticsearch: 24.0.1
  - groupfolders: 12.0.1
  - impersonate: 1.11.0
  - integration_mastodon: 1.0.2
  - integration_reddit: 1.0.4
  - integration_twitter: 1.0.2
  - integration_whiteboard: 0.0.14
  - integration_zammad: 1.0.3
  - keeweb: 0.6.9
  - logreader: 2.9.0
  - lookup_server_connector: 1.12.0
  - mail: 1.13.6
  - maps: 0.1.10
  - news: 18.1.0
  - nextcloud_announcements: 1.13.0
  - notes: 4.4.0
  - notifications: 2.12.0
  - notify_push: 0.4.0
  - oauth2: 1.12.0
  - password_policy: 1.14.0
  - photos: 1.6.0
  - polls: 3.7.0
  - privacy: 1.8.0
  - provisioning_api: 1.14.0
  - qownnotesapi: 22.5.0
  - quota_warning: 1.14.0
  - recommendations: 1.3.0
  - richdocuments: 6.1.1
  - riotchat: 0.13.1
  - serverinfo: 1.14.0
  - settings: 1.6.0
  - sharebymail: 1.14.0
  - spreed: 14.0.3
  - support: 1.7.0
  - survey_client: 1.12.0
  - suspicious_login: 4.2.0
  - systemtags: 1.14.0
  - tasks: 0.14.4
  - text: 3.5.1
  - theming: 1.15.0
  - twofactor_backupcodes: 1.13.0
  - twofactor_email: 2.5.0
  - twofactor_gateway: 0.20.0
  - twofactor_nextcloud_notification: 3.4.0
  - twofactor_totp: 6.4.0
  - updatenotification: 1.14.0
  - user_retention: 1.7.0
  - user_status: 1.4.0
  - user_usage_report: 1.8.0
  - viewer: 1.8.0
  - weather_status: 1.4.0
  - workflowengine: 2.6.0
Disabled:
  - encryption: 1.0.0
  - integration_github: 1.0.2
  - integration_gitlab: 1.0.3
  - piwik: 0.10.0
  - twofactor_admin: 3.2.0
  - user_ldap

Nextcloud Signing status

No errors have been found.

Nextcloud Logs

The log file is 47 MB, so it is definitely too big to paste.

Additional info

No response

kesselb commented 2 years ago

cc @szaimen @MichaIng

MichaIng commented 2 years ago

Is Nextcloud the only application which runs on this PHP-FPM pool?

Can you use e.g. https://github.com/amnuts/opcache-gui to verify that all 80,000 stored keys are Nextcloud scripts? If so, do you use OPcache preloading or something like that?

My Nextcloud instance stores around 2,000 keys, so it is very unexpected to see such a large number.

kesselb commented 2 years ago

> Is Nextcloud the only application which runs on this PHP-FPM pool?

PHP-FPM process. Afaik opcache is shared between the pools when they belong to the same process.

vasyugan commented 2 years ago

> Is Nextcloud the only application which runs on this PHP-FPM pool?

There are two Nextcloud instances and one instance of the time management system Kimai.

opcache-gui doesn't distinguish between pools, unfortunately (unless I overlooked something), so I don't have that information.

MichaIng commented 2 years ago

> PHP-FPM process. Afaik opcache is shared between the pools when they belong to the same process.

Is it even possible for pools to belong to the same process? Every pool has its own process management/limits, runtime user and socket file defined in the pool config file. Or do you mean the FPM master process? It would somewhat contradict security/privacy ideas if users of one pool had full insight into and control over the cache of other pools 🤔.

> OPcache gui doesn't distinguish between pools, unfortunately (unless I overlooked something). So I don't have that information

Of course, the GUI only has insight into the single pool it is called from (if my assumption above is correct that pools do not share cache instances). If there were multiple pools, used by other webserver instances or vhosts, you'd need to place the GUI script into each of the related web roots and access them separately. However, if there is only a single file in /etc/php/x.y/fpm/pool.d/, like /etc/php/7.4/fpm/pool.d/www.conf (default on Debian Bullseye), you have only one pool.

Also, you can open the "Cached" tab of the GUI and browse/search/filter the cached files to see where they are coming from. Two Nextcloud instances shouldn't come anywhere close to 10,000 keys, so perhaps Kimai uses a large number of scripts?

kesselb commented 2 years ago

> the FPM master process?

☝️

vasyugan commented 1 year ago

> Also you can open the "Cached" tab of the GUI and browse/search/filter the cached files, to see where they are coming from.

I followed your advice: they come from different pools. I don't know if this means that something is misconfigured here.

HLFH commented 1 year ago

Maybe what needs to be done is to:

  1. Install opcache-gui and set up basic auth to monitor the OPcache safely
  2. Set up a separate nextcloud.conf pool file https://wiki.archlinux.org/title/Nextcloud#php-fpm. The file [is here](https://gist.githubusercontent.com/wolegis/0d9c83acd0c8bf83bcfb3983931bc364).

On my side, after I got this warning, I installed opcache-gui behind Nginx HTTP basic auth and then reset the OPcache (!), so I no longer see the warning. As soon as I get it again, I will see whether I can set up a separate nextcloud.conf pool without breaking the whole Nginx server and the other domains/sites running on it, something I have done in the past.

vasyugan commented 1 year ago

#34291 looks similar.

vasyugan commented 1 year ago

> Maybe what needs to be done is to:
>
> 1. Install [opcache-gui](https://github.com/amnuts/opcache-gui) and set up basic auth to monitor the OPcache safely
> 2. Set up a separate nextcloud.conf pool file https://wiki.archlinux.org/title/Nextcloud#php-fpm. The file [is here](https://gist.githubusercontent.com/wolegis/0d9c83acd0c8bf83bcfb3983931bc364).
>
> On my side, after I got this warning, I installed opcache-gui behind Nginx HTTP basic auth and then reset the OPcache (!), so I no longer see the warning. As soon as I get it again, I will see whether I can set up a separate nextcloud.conf pool without breaking the whole Nginx server and the other domains/sites running on it, something I have done in the past.

I have had separate PHP pools for ages already. opcache-gui doesn't seem to distinguish between pools.

MichaIng commented 1 year ago

It was a false assumption by me that dedicated FPM pools mean dedicated OPcache instances. OPcache is shared across the whole FPM master process.

szaimen commented 1 year ago

Should be resolved with 24.0.10 and 25.0.4

MichaIng commented 1 year ago

The recent patch was for opcache.interned_strings_buffer, not opcache.max_accelerated_files 😉. However, there was never anything wrong with opcache.max_accelerated_files: Nextcloud simply reports whether the cache is filled above 90% on this PHP server. With multiple Nextcloud instances or other PHP applications, it is quite possible that the default OPcache limits are exhausted.
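
The check described above can be sketched roughly as follows. This is an illustration in Python, not Nextcloud's actual code: the function name `keys_nearly_exhausted` is hypothetical, the dictionary mimics the `num_cached_keys` and `max_cached_keys` fields that PHP's `opcache_get_status()` reports under `opcache_statistics`, and the sample numbers are made up.

```python
def keys_nearly_exhausted(stats: dict, threshold: float = 0.9) -> bool:
    """True when the OPcache key table is filled above `threshold` (hypothetical helper)."""
    max_keys = stats["max_cached_keys"]
    if max_keys <= 0:
        return False  # nothing sensible to report
    return stats["num_cached_keys"] / max_keys > threshold


# Made-up numbers. The count covers the whole PHP-FPM master process,
# i.e. every application in every pool, not just one Nextcloud instance.
sample = {"num_cached_keys": 15_000, "max_cached_keys": 16_229}
print(keys_nearly_exhausted(sample))  # 15000 / 16229 is roughly 0.92
```

The point of the sketch is that the ratio says nothing about which application filled the cache; it only compares the server-wide totals.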

vasyugan commented 1 year ago

> The recent patch was for opcache.interned_strings_buffer, not opcache.max_accelerated_files 😉. However, there never was something wrong with opcache.max_accelerated_files: Nextcloud simply shows the fact whether it is filled above 90% on this PHP server. And with multiple Nextcloud instances or other PHP applications it is pretty much possible that the default OPcache limits are exhausted.

Are you implying that there is no solution to this and we just have to live with it?

The point is that all my Nextcloud instances constantly advise me to increase the value of `opcache.max_accelerated_files`. I doubled it several times until I arrived at something like 20,000, and at the same time my server became unstable. I don't know if that was really the cause, but it is at least possible. Each doubling seems to have worked for just a few days, and then Nextcloud began complaining again.

Is there a PHP setting that, e.g., regularly flushes the cache when a certain threshold value is reached? If so, it would be great if this could be documented.

MichaIng commented 1 year ago

> Are you implying that there is no solution to this and we just have to live with it?

I can only repeat myself: Nextcloud is just telling you a fact, so there is no issue to solve on Nextcloud's end. You can:

* Increase `opcache.max_accelerated_files` to assure all PHP scripts can be cached.

If you believe that Nextcloud is somehow calculating the usage wrongly (then there would be something to fix on Nextcloud's end), you can verify it with the opcache-gui mentioned and linked above. There you can also check which application the scripts are coming from. In my case, Nextcloud uses fewer than 2,000 scripts/keys in OPcache, so 20,000 should hardly be reachable with only two Nextcloud instances. I don't know Kimai; perhaps it consists of such a large number of scripts. Simply type /var/www/whatever into the filter (replace it with the path to the respective web application), and it immediately shows you the number of scripts cached from this path.

> If there is a PHP setting that e.g. regularly flushes the cache when a certain threshold value is reached?

That would act against the purpose of a cache: You want your scripts cached for your pages to load faster 🤔.

> my server at the same time became unstable

The only connection would be RAM usage: Check the amount of free RAM on your server, and check whether services have been OOM killed. If not, then the OPcache has no effect on your server stability. If RAM usage is at its limit, then indeed you need to find a good balance between e.g. OPcache size, number of PHP workers, number of webserver workers, number of database workers, database cache, etc, depending on where bottlenecks are. The OPcache memory usage can be indirectly limited with opcache.max_accelerated_files, but it makes more sense to limit it explicitly with opcache.memory_consumption, defining the exact maximum amount of memory used by the whole thing.

kesselb commented 1 year ago

I believe a pull request to add "upper" limits is something we could accept.

MichaIng commented 1 year ago

Then we'd be back at arbitrariness. Which upper limit shall we set? Values expected for a single Nextcloud instance? The PHP defaults are about twice as large as needed for a single Nextcloud instance, so then we could remove the check altogether. If we want to give admins something useful, we need to tell them when the cache is about to be exhausted, regardless of how many applications are running on the same PHP instance (and hence how large the cache might be).

When running several PHP applications on the same server, it is expected that those values need to be raised, as for every other cache/buffer, such as the database caches (about which we simply have no ability to gather information). Probably we need to make that clearer somewhere, not sure.

Performance is a major topic when people are discussing and comparing Nextcloud to competitors, as its web interface is quite large, with many frameworks loaded, including redundant ones for supporting legacy APIs, browsers etc., and hence is relatively slow, even for a PHP application. Especially for large instances with many users, we do ourselves a favour by not letting admins unknowingly sacrifice a major performance benefit for their users by not using the OPcache, or using it only partly.

But we could make the heading of the check's section softer, to make clear that those are hints one may have a look at, not generally mandatory to solve. In case of real security and other high severity issues, this can be made clear within the individual message.

vasyugan commented 6 months ago

> I can only repeat myself: Nextcloud is just telling you a fact, so there is no issue to solve on Nextcloud end. You can:
>
> * Increase `opcache.max_accelerated_files` to assure all PHP scripts can be cached.

Again, I did this several times over, always doubling the number. This gave me relief for only a few days, and this pattern didn't stop when I arrived at ridiculously high values. So there seems to be some kind of leak. Nextcloud's hunger seems insatiable.

MichaIng commented 6 months ago

Nextcloud has only a certain number of PHP scripts, so there is a hard limit on how far it can raise the number of accelerated/cached files. You can filter by path in opcache-gui to see the number of files/keys that each Nextcloud instance has stored in the cache, how many your time management system uses, and whether there are unexpected files in the cache. But I already said that.

As keys are stored as absolute system paths, you can filter indirectly by PHP pool this way. Not sure whether I mentioned it in this thread already, but you are right that OPcache does not distinguish between pools: the scripts of all pools are stored in a single cache instance, which is possible since cache keys contain the full file path.
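
Because keys are absolute paths, the per-application breakdown could also be computed outside the GUI. A minimal Python sketch, assuming the list of cached script paths has already been exported somehow (for example from the `scripts` section of `opcache_get_status()`); the function name `count_by_root` and all paths below are hypothetical placeholders:

```python
from collections import Counter


def count_by_root(script_paths, roots):
    """Count cached scripts per application root; unmatched paths go under 'other'."""
    counts = Counter()
    for path in script_paths:
        # Match on the root plus a trailing slash so that /var/www/cloud1
        # does not also swallow /var/www/cloud10.
        owner = next(
            (r for r in roots if path.startswith(r.rstrip("/") + "/")),
            "other",
        )
        counts[owner] += 1
    return counts


# Placeholder paths for illustration only.
cached = [
    "/var/www/cloud1/lib/base.php",
    "/var/www/cloud1/index.php",
    "/var/www/cloud2/index.php",
    "/var/www/kimai/public/index.php",
    "/usr/share/php/somelib.php",
]
roots = ["/var/www/cloud1", "/var/www/cloud2", "/var/www/kimai"]

for root, n in count_by_root(cached, roots).most_common():
    print(root, n)
```

A large count under `other`, or under a root that should be small, would point to the application that is actually filling the key table.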