nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
27.37k stars 4.07k forks source link

File upload freezes/delays/timeouts if there is a large count of image files. Due to Redis do all preview folder scanning. #7269

Open ArnisR opened 6 years ago

ArnisR commented 6 years ago

Steps to reproduce

  1. Enable previews and caching/locking (Redis). Put some 50k image files in different folders & different users.
  2. Create previews. (For each image file NC creates a folder in _appdata_xyzw/preview)
  3. Upload new files through NC web or any third-party WebDAV client.
  4. Upload process freezes shortly after beginning of upload. The more images (=preview folders) are total (across all users), the longer are freezes. Even errors due timeouts when preview folders counts reached 150k and/or several users are active.

Expected behaviour

There should be no such delays/all preview folder scanning/locking.

Actual behaviour

Every time when files are accessed (even at thumbnail size), Redis server starts a process and make a 7 commands for every folder under preview folder regardless of user or files in particular folder. When files are accessed through web only for viewing it is almost unnoticeable except if some preview/thumbnail isn't loaded until Redis is finished. But when new files are uploaded, Redis process freezes upload. I run Redis monitoring and save results to a file (redis-cli monitor > filename.log). At every access/upload Redis made 7 commands for each folder under preview like that:

1511216011.732781 [0 127.0.0.1:42534] "INCRBY" "97e0744ba0fb92ff41f34a80803a7a9a/lockfiles/592f193f4d372a748dc3d125a676d4f2" "1"
1511216011.732828 [0 127.0.0.1:42534] "EXPIRE" "97e0744ba0fb92ff41f34a80803a7a9a/lockfiles/592f193f4d372a748dc3d125a676d4f2" "3600"
1511216011.733305 [0 127.0.0.1:42534] "WATCH" "97e0744ba0fb92ff41f34a80803a7a9a/lockfiles/592f193f4d372a748dc3d125a676d4f2"
1511216011.733348 [0 127.0.0.1:42534] "GET" "97e0744ba0fb92ff41f34a80803a7a9a/lockfiles/592f193f4d372a748dc3d125a676d4f2"
1511216011.733393 [0 127.0.0.1:42534] "MULTI"
1511216011.733463 [0 127.0.0.1:42534] "DEL" "97e0744ba0fb92ff41f34a80803a7a9a/lockfiles/592f193f4d372a748dc3d125a676d4f2"
1511216011.733471 [0 127.0.0.1:42534] "EXEC"

If there is a 150k preview folders it takes up to 2 minutes on i3 CPU (more than a 1 M commands, 100 MB). These keys don't resides in Redis memory and every time files are accessed/uploaded, Redis starts the process again. Regardless of how many files/folders actual user have, Redis do it for all preview folders.

Server configuration detail

Operating system: Linux 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC 2017 x86_64

Webserver: Apache/2.4.18 (Ubuntu) (fpm-fcgi)

Database: mysql 10.0.31

PHP version: 7.0.22-0ubuntu0.16.04.1 Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, session, standard, cgi-fcgi, mysqlnd, PDO, xml, apcu, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, iconv, igbinary, imagick, imap, intl, json, ldap, exif, mcrypt, memcache, mysqli, pdo_mysql, pdo_sqlite, Phar, posix, pspell, readline, recode, redis, shmop, SimpleXML, soap, sockets, sqlite3, sysvmsg, sysvsem, sysvshm, tidy, tokenizer, wddx, xmlreader, xmlrpc, xmlwriter, xsl, zip, Zend OPcache

Nextcloud version: 12.0.3 - 12.0.3.3

Updated from an older Nextcloud/ownCloud or fresh install: fresh

Where did you install Nextcloud from:

Signing status Array
List of activated apps ``` Enabled: - activity: 2.5.2 - bruteforcesettings: 1.0.2 - comments: 1.2.0 - dav: 1.3.0 - federatedfilesharing: 1.2.0 - files: 1.7.2 - files_pdfviewer: 1.1.1 - files_sharing: 1.4.0 - files_texteditor: 2.4.1 - files_trashbin: 1.2.0 - files_versions: 1.5.0 - files_videoplayer: 1.1.0 - firstrunwizard: 2.1 - gallery: 17.0.0 - groupfolders: 1.1.0 - issuetemplate: 0.3.0 - logreader: 2.0.0 - lookup_server_connector: 1.0.0 - nextcloud_announcements: 1.1 - notifications: 2.0.0 - oauth2: 1.0.5 - password_policy: 1.2.2 - previewgenerator: 1.0.7 - provisioning_api: 1.2.0 - serverinfo: 1.2.0 - sharebymail: 1.2.0 - systemtags: 1.2.0 - theming: 1.3.0 - twofactor_backupcodes: 1.1.1 - updatenotification: 1.2.0 - workflowengine: 1.2.0 Disabled: - admin_audit - encryption - federation - files_external - registration - survey_client - user_external - user_ldap ```
Configuration (config/config.php) ``` { "instanceid": "ocvyoyj09sk8", "passwordsalt": "***REMOVED SENSITIVE VALUE***", "secret": "***REMOVED SENSITIVE VALUE***", "trusted_domains": [ "REMOVED" ], "datadirectory": "\/var\/www\/clients\/client0\/web2\/web\/data", "overwrite.cli.url": "https:\/\/REMOVED", "dbtype": "mysql", "version": "12.0.3.3", "installed": true, "memcache.local": "\\OC\\Memcache\\Redis", "filelocking.enabled": true, "memcache.locking": "\\OC\\Memcache\\Redis", "redis": { "host": "localhost", "port": 6379, "timeout": 0 }, "mail_from_address": "cloud", "mail_smtpmode": "php", "mail_smtpauthtype": "LOGIN", "mail_domain": "REMOVED", "skeletondirectory": "", "filesystem_check_changes": 1, "default_language": "lv", "loglevel": 1, "preview_max_x": 1500, "preview_max_y": 1000, "preview_max_scale_factor": 2, "maintenance": false, "dbname": "------", "dbhost": "127.0.0.1", "dbuser": "***REMOVED SENSITIVE VALUE***", "dbpassword": "***REMOVED SENSITIVE VALUE***", "trashbin_retention_obligation": "30, 30", } ```
MorrisJobke commented 6 years ago

@rullzer Is there a way to better handle large amounts of previews? Some background job thingy?

rullzer commented 6 years ago

I don't get why we would access the other folders at all to be honest... I'll see if I can reproduce this...

ArnisR commented 6 years ago

@rullzer Maybe it is because preview folder is common and accessible for all users? As I understand previous NC versions has individual preview location for each user. P.S. I can't guarantee that Redis checks/lock preview folders because keys are salted/encrypted and I don't see cleartext folder or file name. But Redis commandset count is comparable with all folder count and is less than total file count. So I hasn't other explanation.

rmufr commented 6 years ago

I reproduce the exact same issue as @ArnisR , on my home NC server [v12.0.5.3, stable channels updates since NC10], HW is a Raspberry PI3. Stack Nginx, MariaDB 10.1.23 , PHP7.0 Number of preview folders : ~20000. Almost all images/videos are under the same user. Upload / Sync freezes after the file is actually uploaded, but the client is not aware and runs into a time-out. After that the uploaded file stays locked, and this prevent further synchronizations from being successful.

I can see a php-fpm worker drawing 25% Cpu during several minutes after upload, or even just after reading a small image directory (~10 images) This is a major issue for scalabillity, and I see no reason for such a cpu drawin operation on current simple usage.

Note that when this issue happens, immediately following sync or uploads run smoothly. For example:

rmufr commented 6 years ago

This seems related/dup of #6459

rmufr commented 6 years ago

Hi @MorrisJobke , @rullzer, I cleaned all my previews to avoid clients time-outs, but this does not solve the issue. As previews gets regenerated, I can see this redis full scan of all previews on the critical path of uploads - and occurs also after each photo directory opening in the web-UI, after the page is displayed.

My humble opinion is that this is a bug - if of course you can reproduce it. This issues is a very annoying one as I use NC to mainly access my photos/videos collection.

The performance degradation is already noticeable on a RPI3 server : 15 seconds with ~2000 preview files (700 preview directories).

rmufr commented 6 years ago

Hi @MorrisJobke @rullzer @rayn0r, I have investigated a bit on my own, though I am not knowladgeable in php language. What I think the issue comes from is :

I traced php slow queries (NC 13.0.0), and found the scan triggered in this code: scan() /var/www/nextcloud/lib/private/Files/View.php:1327 within the block

// if the file is not in the cache or needs to be updated, trigger the scanner and reload the data
 if (!$data || $data['size'] === -1) { ...

Here's the stack dump when syncing a file from windows (client waits for the scan to finish) :

_script_filename = /var/www/nextcloud/remote.php [0x74617700] exec() /var/www/nextcloud/lib/private/LargeFileHelper.php:199 [0x74617650] exec() /var/www/nextcloud/lib/private/LargeFileHelper.php:191 [0x746175e0] getFileMtime() /var/www/nextcloud/lib/private/Files/Storage/Local.php:189 [0x74617560] filemtime() /var/www/nextcloud/lib/private/Files/Storage/Common.php:670 [0x746174d0] getMetaData() /var/www/nextcloud/lib/private/Files/Storage/Wrapper/Wrapper.php:581 [0x74617480] getMetaData() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:112 [0x74617410] getData() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:150 [0x746172f0] scanFile() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:435 [0x74617160] handleChildren() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:403 [0x74617060] scanChildren() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:337 [0x74616fa0] scan() /var/www/nextcloud/lib/private/Files/View.php:1327 [0x74616ed0] getCacheEntry() /var/www/nextcloud/lib/private/Files/View.php:1371 [0x74616dd0] getFileInfo() /var/www/nextcloud/lib/private/Files/Node/Root.php:198 [0x74616d50] get() /var/www/nextcloud/lib/private/Files/Node/Folder.php:133 [0x74616d00] get() /var/www/nextcloud/lib/private/Files/AppData/AppData.php:89 [0x74616c60] getAppDataFolder() /var/www/nextcloud/lib/private/Files/AppData/AppData.php:105 [0x74616bf0] getFolder() /var/www/nextcloud/lib/private/Preview/Watcher.php:61 [0x74616b70] postWrite() /var/www/nextcloud/lib/private/Preview/WatcherConnector.php:60 [0x74616b20] OC\Preview{closure}() /var/www/nextcloud/lib/private/Hooks/EmitterTrait.php:99 [0x74616ad0] call_user_funcarray() /var/www/nextcloud/lib/private/Hooks/EmitterTrait.php:99

Here's the stack dump when the scan occurs after opening a directory with previews within the Web-UI

_scriptfilename = /var/www/nextcloud/index.php [0x74614530] exec() /var/www/nextcloud/lib/private/LargeFileHelper.php:199 [0x74614480] exec() /var/www/nextcloud/lib/private/LargeFileHelper.php:191 [0x74614410] getFileMtime() /var/www/nextcloud/lib/private/Files/Storage/Local.php:189 [0x74614390] filemtime() /var/www/nextcloud/lib/private/Files/Storage/Common.php:670 [0x74614300] getMetaData() /var/www/nextcloud/lib/private/Files/Storage/Wrapper/Wrapper.php:581 [0x746142b0] getMetaData() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:112 [0x74614240] getData() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:150 [0x74614120] scanFile() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:435 [0x74613f90] handleChildren() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:403 [0x74613e90] scanChildren() /var/www/nextcloud/lib/private/Files/Cache/Scanner.php:337 [0x74613dd0] scan() /var/www/nextcloud/lib/private/Files/View.php:1327 [0x74613d00] getCacheEntry() /var/www/nextcloud/lib/private/Files/View.php:1371 [0x74613c00] getFileInfo() /var/www/nextcloud/lib/private/Files/Node/Root.php:198 [0x74613b80] get() /var/www/nextcloud/lib/private/Files/Node/Folder.php:133 [0x74613b30] get() /var/www/nextcloud/lib/private/Files/AppData/AppData.php:89 [0x74613a90] getAppDataFolder() /var/www/nextcloud/lib/private/Files/AppData/AppData.php:105 [0x74613a20] getFolder() /var/www/nextcloud/lib/private/Preview/Generator.php:403 [0x746139a0] getPreviewFolder() /var/www/nextcloud/lib/private/Preview/Generator.php:109 [0x74613880] getPreview() /var/www/nextcloud/lib/private/PreviewManager.php:202 [0x746137b0] getPreview() /var/www/nextcloud/core/Controller/PreviewController.php:114

rmufr commented 6 years ago

Hi dear Nextclouders In case you struggle to reproduce this issue, here's my config details :

Operating system: Raspbian stretch on RPI3 Web server: : Nginx 1.13.3 Database: MariaDB 10.1.23 PHP version: 7.0.27 Nextcloud version: 13.0.0 Updated from an older Nextcloud/ownCloud or fresh install: updated each stable version since NC10 Where did you install Nextcloud from: official website download link, followed C. Rieger installation guide Signing status: No errors have been found. List of activated apps: Enabled:

Nextcloud configuration: { "system": { "instanceid": "REMOVED SENSITIVE VALUE", "passwordsalt": "REMOVED SENSITIVE VALUE", "secret": "REMOVED SENSITIVE VALUE", "trusteddomains": [ "REMOVED SENSITIVE VALUE", "REMOVED SENSITIVE VALUE" ], "datadirectory": "REMOVED SENSITIVE VALUE", "overwrite.cli.url": "REMOVED SENSITIVE VALUE", "dbtype": "mysql", "version": "13.0.0.14", "dbname": "REMOVED SENSITIVE VALUE", "dbhost": "REMOVED SENSITIVE VALUE", "dbport": "", "dbtableprefix": "oc", "dbuser": "REMOVED SENSITIVE VALUE", "dbpassword": "REMOVED SENSITIVE VALUE", "logtimezone": "Europe\/Paris", "filesystem_check_changes": 1, "installed": true, "memcache.local": "\OC\Memcache\APCu", "filelocking.enabled": "true", "memcache.locking": "\OC\Memcache\Redis", "redis": { "host": "REMOVED SENSITIVE VALUE", "port": 0, "timeout": 0 }, "mail_from_address": "REMOVED SENSITIVE VALUE", "mail_smtpmode": "smtp", "loglevel": "0", "mail_smtpauthtype": "LOGIN", "mail_smtpauth": 1, "mail_smtpport": "465", "mail_smtpname": "REMOVED SENSITIVE VALUE", "mail_smtppassword": "REMOVED SENSITIVE VALUE", "mail_smtpsecure": "ssl", "appstore.experimental.enabled": false, "theme": "", "maintenance": false, "enable_previews": true, "enabledPreviewProviders": [ "OC\Preview\PNG", "OC\Preview\JPEG", "OC\Preview\GIF", "OC\Preview\BMP", "OC\Preview\XBitmap", "OC\Preview\MarkDown", "OC\Preview\MP3", "OC\Preview\TXT", "OC\Preview\Movie" ], "updater.release.channel": "stable", "mail_domain": "REMOVED SENSITIVE VALUE", "mail_smtphost": "REMOVED SENSITIVE VALUE", "trashbin_retention_obligation": "30, 60" } } Are you using external storage, if yes which one: no Are you using encryption: no Are you using an external user-backend, if yes which one: no

rmufr commented 6 years ago

Hi, I think we can avoid this issue by doing the following : 1. change the "filesystem_check_changes" parameter in the Nextcloud configuration to "0"

2. rescan the appdata folder

This seems to work for me , at the cost of manually rescanning the file storage whenever I make manual changes to it. Let me know.

hoellen commented 6 years ago

I have the same issues like @ArnisR and @rmufr. I have >23k folders in the preview directory and it is locking/freezing the upload process. ~Is there a need to do things in the preview folder during an upload?~

edit: I can confirm that after running occ files:scan-app-data the upload performance improved a lot (i had ~1k entries with size = -1 from preview path in oc_filecache which are now gone)

michag86 commented 6 years ago

Same for me. Problem fixed for the moment after running occ files:scan-app-data :

Scanning AppData for files

+---------+-------+--------------+
| Folders | Files | Elapsed time |
+---------+-------+--------------+
| 21081   | 60204 | 00:03:46     |
+---------+-------+--------------+

237 entries with size -1 with path appData* where removed from oc_filecache

MorrisJobke commented 6 years ago

@rullzer Any idea why the app data is not fully scanned?

BloodyIron commented 5 years ago

I was having this issue as of today, however the steps outlined by @rmufr for disabling filesystem check and re-scanning for app data worked.

However, the documents are backed by a NAS and being served through SMB. So I'm wary to keep that value off.

How often am I realistically going to have to do this, and what is the comprehensive solution to this?

This issue didn't come up until I started running pre-generation of images via cron.

Stefan300381 commented 4 years ago

I'm lucky I found this discussion here as I faced the same issue with NC17. occ files:scan-app-data followed by termination and upload re-try solved the issue

magikmw commented 4 years ago

@rmufr 's workaround has also worked for me - I can now upload files bigger than 3,5GB without issues, and the server stopped having issues with my biggest image folder (15GB in 6k files), which would cancel my desktop sync unless excluded from synchronization.

@BloodyIron It's been a while, but maybe an interim solution would be to rescan files on a timer like cron instead of on-demand. Imperfect, but maybe sufficient.

solracsf commented 3 years ago

This still an issue today (v21.0.3). And the "solution" (occ files:scan-app-data) stills the same.