nextcloud / fulltextsearch

🔍 Core of the full-text search framework for Nextcloud
GNU Affero General Public License v3.0

Force Quit on large (~600mb) tif file #384

Closed tacruc closed 5 years ago

tacruc commented 5 years ago

Hi, I got this Force Quit message:

Memory: 123 MB
┌─ Indexing  ────
│ Action: compareWithCurrentIndex
│ Provider: Files                Account: user
│ Document: -1
│ Info: httpd/unix-directory
│ Title:
Options: []
Memory: 126 MB
┌─ Indexing  ────
│ Action: indexDocument
│ Provider: Files                Account: user
│ Document: 4238682
│ Info: image/tiff
│ Title: 2017-2018 Nepal Indien/2017.11.27-12.01_Ladakh Leh/Rumbak Pass_PANO/prefix.tif
│ Content size: 0
│ Progress:  48001/60605
└──
┌─ Results ────
│ Result:      2/2
│ Index: bookmarks:1
│ Status: ok
│ Message: {"_index":"nextcloud","_type":"standard","_id":"bookmarks:1","_version":2,"result
│ ":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":20703,"_primary_term"
│ :1}
└──
┌─ Errors ────
│ Error:      0/0
│ Index: 
│ Exception: 
│ Message: 
│ 
│ 
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause 
Force Quit

Always on this large prefix.tif file. The password is the file name. I couldn't find any further related error messages in the logs.

ArtificialOwl commented 5 years ago

This is a huge file, even for a TIFF.

Do you have other issues with files as big as this one? What is your setup (are you using the files_fts_tesseract app)?

tacruc commented 5 years ago

Here are the server details. Yes, I'm using files_fts_tesseract. I haven't yet been able to test whether it fails for another TIFF of this size.

Server configuration detail

Operating system: Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64

Webserver: Apache/2.4.27 (Debian) (apache2handler)

Database: mysql 10.1.26

PHP version: 7.0.30-0+deb9u1

Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, session, standard, apache2handler, mysqlnd, PDO, xml, apcu, apc, bz2, calendar, ctype, curl, dom, enchant, mbstring, fileinfo, ftp, gd, gettext, gmp, gnupg, iconv, igbinary, imagick, imap, interbase, intl, json, ldap, exif, mcrypt, mongodb, mysqli, odbc, pdo_dblib, PDO_Firebird, pdo_mysql, PDO_ODBC, pdo_pgsql, pdo_sqlite, pgsql, Phar, posix, pspell, readline, recode, redis, shmop, SimpleXML, sockets, sqlite3, sysvmsg, sysvsem, sysvshm, tidy, tokenizer, wddx, xmlreader, xmlrpc, xmlwriter, xsl, zip, Zend OPcache

Nextcloud version: 14.0.0 - 14.0.0.19

Updated from an older Nextcloud/ownCloud or fresh install:

Where did you install Nextcloud from: unknown

Signing status
List of activated apps
```
Enabled:
 - accessibility: 1.0.1
 - activity: 2.7.0
 - admin_audit: 1.4.0
 - admin_notifications: 1.0.2
 - announcementcenter: 3.3.0
 - bookmarks: 0.13.0
 - bookmarks_fulltextsearch: 0.99.2
 - bruteforcesettings: 1.1.0
 - calendar: 1.6.2
 - camerarawpreviews: 0.5.6
 - caniupdate: 0.2.0
 - cloud_federation_api: 0.0.1
 - cms_pico: 0.9.7
 - comments: 1.4.0
 - contacts: 2.1.6
 - dav: 1.6.0
 - deck: 0.4.1
 - defaultlinkopen: 1.1.0
 - dicomviewer: 1.0.2
 - end_to_end_encryption: 1.0.5
 - external: 3.1.0
 - federatedfilesharing: 1.4.0
 - federation: 1.4.0
 - files: 1.9.0
 - files_accesscontrol: 1.4.0
 - files_automatedtagging: 1.4.0
 - files_downloadactivity: 1.3.0
 - files_external: 1.5.0
 - files_fulltextsearch: 0.99.4
 - files_fulltextsearch_tesseract: 0.99.1
 - files_markdown: 2.0.4
 - files_pdfviewer: 1.3.2
 - files_retention: 1.3.0
 - files_rightclick: 0.8.4
 - files_sharing: 1.6.2
 - files_texteditor: 2.6.0
 - files_trashbin: 1.4.1
 - files_versions: 1.7.1
 - files_videoplayer: 1.3.0
 - firstrunwizard: 2.3.0
 - fulltextsearch: 0.99.3
 - fulltextsearch_elasticsearch: 0.99.2
 - gallery: 18.1.0
 - gpxpod: 2.3.1
 - groupfolders: 1.3.3
 - impersonate: 1.1.0
 - issuetemplate: 0.4.0
 - logreader: 2.0.0
 - lookup_server_connector: 1.2.0
 - mail: 0.10.0
 - metadata: 0.7.0
 - mindmaps: 0.1.0
 - news: 13.0.0
 - nextcloud_announcements: 1.3.0
 - notes: 2.4.2
 - notifications: 2.2.1
 - oauth2: 1.2.1
 - ocsms: 1.13.1
 - onlyoffice: 2.0.4
 - password_policy: 1.4.0
 - phonetrack: 0.3.1
 - piwik: 0.4.1
 - polls: 0.8.3
 - previewgenerator: 1.1.0
 - provisioning_api: 1.4.0
 - qownnotesapi: 18.8.0
 - quota_warning: 1.3.0
 - radio: 0.6.3
 - ransomware_detection: 0.4.0
 - ransomware_protection: 1.2.0
 - recommendation_assistant: 0.2.2
 - sensorlogger: 0.0.7
 - serverinfo: 1.4.0
 - sharebymail: 1.4.0
 - socialsharing_diaspora: 1.0.3
 - socialsharing_email: 1.0.4
 - socialsharing_facebook: 1.0.3
 - socialsharing_googleplus: 1.0.3
 - socialsharing_twitter: 1.0.3
 - spreed: 4.0.0
 - support: 1.0.0
 - survey_client: 1.2.0
 - systemtags: 1.4.0
 - tasks: 0.9.7
 - telephoneprovider: 1.0.1
 - twofactor_backupcodes: 1.3.1
 - twofactor_gateway: 0.9.0
 - twofactor_totp: 1.5.0
 - twofactor_u2f: 1.6.1
 - twofactor_yubikey: 0.4.0
 - updatenotification: 1.4.1
 - user_usage_report: 1.1.0
 - workflowengine: 1.4.0
Disabled:
 - checksum
 - circles
 - drawio
 - drop_account
 - encryption
 - files_clipboard
 - files_frommail
 - files_opds
 - files_reader
 - gpxedit
 - gpxmotion
 - keeweb
 - mood
 - ocdownloader
 - ojsxc
 - passman
 - theming
 - user_external
 - user_ldap
 - weather
```
Configuration (config/config.php)
```
{
    "instanceid": "***REMOVED SENSITIVE VALUE***",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
        "***REMOVED SENSITIVE VALUE***",
        "***REMOVED SENSITIVE VALUE***",
        "***REMOVED SENSITIVE VALUE***"
    ],
    "datadirectory": "***REMOVED SENSITIVE VALUE***",
    "tempdirectory": "\/tmp\/",
    "filelocking.enabled": "true",
    "memcache.local": "\\OC\\Memcache\\Redis",
    "memcache.locking": "\\OC\\Memcache\\Redis",
    "redis": {
        "host": "***REMOVED SENSITIVE VALUE***",
        "port": 0,
        "timeout": 0
    },
    "dbtype": "mysql",
    "version": "14.0.0.19",
    "dbname": "***REMOVED SENSITIVE VALUE***",
    "dbhost": "***REMOVED SENSITIVE VALUE***",
    "dbtableprefix": "oc_",
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "installed": true,
    "forcessl": true,
    "forceSSLforSubdomains": true,
    "mail_from_address": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpmode": "smtp",
    "mail_domain": "***REMOVED SENSITIVE VALUE***",
    "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpauth": 1,
    "mail_smtpport": "587",
    "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
    "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpsecure": "tls",
    "mail_smtpauthtype": "PLAIN",
    "loglevel": 2,
    "maintenance": false,
    "enable_previews": true,
    "enabledPreviewProviders": [
        "OC\\Preview\\PNG",
        "OC\\Preview\\JPEG",
        "OC\\Preview\\GIF",
        "OC\\Preview\\BMP",
        "OC\\Preview\\XBitmap",
        "OC\\Preview\\MP3",
        "OC\\Preview\\TXT",
        "OC\\Preview\\MarkDown",
        "OC\\Preview\\TIFF",
        "OC\\Preview\\SVG",
        "OC\\Preview\\RAW",
        "OC\\Preview\\Epub",
        "OC\\Preview\\FB2",
        "OC\\Preview\\PDF",
        "OC\\Preview\\OpenDocument",
        "OC\\Preview\\StarOffice",
        "OC\\Preview\\MSOfficeDoc",
        "OC\\Preview\\MSOffice2003",
        "OC\\Preview\\MSOffice2007"
    ],
    "preview_max_x": 3840,
    "preview_max_y": 2160,
    "preview_max_scale_factor": 1,
    "appstore.experimental.enabled": true,
    "remember_login_cookie_lifetime": 1296000,
    "session_lifetime": 86400,
    "session_keepalive": true,
    "trashbin_retention_obligation": "auto",
    "knowledgebaseenabled": true,
    "logtimezone": "Europe\/Berlin",
    "cron_log": true,
    "log_rotate_size": 10857600,
    "theme": "",
    "updatechecker": false,
    "htaccess.RewriteBase": "\/",
    "check_for_working_wellknown_setup": true,
    "check_for_working_htaccess": true,
    "appstoreenabled": true,
    "updater.release.channel": "beta",
    "singleuser": false,
    "overwrite.cli.url": "***REMOVED SENSITIVE VALUE***"
}
```

Are you using external storage, if yes which one: local/smb/sftp/...

Are you using encryption:

Are you using an external user-backend, if yes which one: LDAP/ActiveDirectory/Webdav/...

Client configuration

Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0

Operating system:

Logs

Browser log ``` ```
Nextcloud log ``` ```

budachst commented 5 years ago

+1 here. Although the file that causes this issue is "only" approx. 100 MB in size, its contents seem to give tesseract trouble. In my case it's an admittedly huge texture file of a wooden surface. I ran this file through tesseract on the terminal, and it took tesseract a whopping 19 minutes to produce a 3k file of… well, probably blanks.

This initially caused the DB connection to be closed, which bubbled up as a DBA exception. After I raised the max_timeout for mysql from 600 to 1800, indexing stopped with "Force Quit" instead.

Also, I have limited the file size for indexing to 64 MB, so this file shouldn't have been considered for indexing anyway, but that's only a side note.

My NC is at 13.0.6 and FTS is current.

I restarted the index, and I had set the max file size to 20 MB, but now FTS has tesseract indexing the same file again:

[root@nextcloud Tex]# ps -ef | grep tesseract
nginx 23733 21692 0 22:00 pts/2 00:00:00 sh -c "tesseract" "/mnt/nextcloud/data/valentina.kohl/files/BRUNNER_NEUHEITEN/vonCRAFT/VALET/Tex/wood-075_teak-2_d_eiche-hell_XL.png" stdout -psm 3 -l deu+eng 2> /dev/null
nginx 23734 23733 99 22:00 pts/2 00:11:39 tesseract /mnt/nextcloud/data/valentina.kohl/files/BRUNNER_NEUHEITEN/vonCRAFT/VALET/Tex/wood-075_teak-2_d_eiche-hell_XL.png stdout -psm 3 -l deu+eng
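
Editor's note: an OCR run that takes many minutes on a single image, as described in this comment, is the kind of job a hard time limit can cut short. Below is a minimal Python sketch of that idea. It is not how files_fulltextsearch_tesseract invokes tesseract; `run_with_timeout` is a hypothetical helper, and the image path and 60-second limit in the usage comment are placeholders.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it runs longer than timeout_s."""
    try:
        done = subprocess.run(
            list(cmd),                  # argument list, no shell in between
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return done.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Harmless demo command so the sketch runs without tesseract installed.
    print(run_with_timeout([sys.executable, "-c", "print('hello')"], 10))

    # Hypothetical OCR call (placeholder path, arbitrary 60 s limit):
    # text = run_with_timeout(
    #     ["tesseract", "/path/to/image.png", "stdout", "-l", "deu+eng"], 60
    # )
    # if text is None: skip the file and carry on with the next one.
```

Passing the command as a list rather than through `sh -c` matters here: with a shell in between, the timeout would kill the shell, and the tesseract child could keep running.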

ArtificialOwl commented 5 years ago

@budachst thanks for opening a new issue; however, I cannot reproduce your issue.

@tacruc I would say that OCRing a 600 MB file is not a good idea to start with. I think even a huge (100+ MB) PDF file would result in Tika crashing. This is the main reason I implemented a limit on the file size.

I am currently trying to work with your 600 MB file, but mostly to see whether I can ignore the issue and carry on with the index.
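
Editor's note: "ignore the issue and carry on with the index" can be pictured as a loop that records a failing document and moves on. This is a minimal Python sketch of that pattern, not the fulltextsearch implementation; `index_all` and `fake_index_one` are hypothetical names. It only covers failures that surface as exceptions. A hung external process would additionally need a time limit.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def index_all(
    doc_ids: Iterable[str],
    index_one: Callable[[str], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Index each document; collect failures instead of aborting the whole run."""
    indexed: List[str] = []
    errors: Dict[str, str] = {}
    for doc_id in doc_ids:
        try:
            index_one(doc_id)
        except Exception as exc:
            # Remember what went wrong, then continue with the next document.
            errors[doc_id] = f"{type(exc).__name__}: {exc}"
            continue
        indexed.append(doc_id)
    return indexed, errors


def fake_index_one(doc_id: str) -> None:
    """Stand-in indexer for the demo: one document id always fails."""
    if doc_id == "4238682":
        raise RuntimeError("OCR failed")


if __name__ == "__main__":
    ok, failed = index_all(["1", "4238682", "2"], fake_index_one)
    print("indexed:", ok)
    print("errors:", failed)
```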

tacruc commented 5 years ago

@daita I agree that OCRing a 600 MB file is not reasonable. So one last stupid question: where can I configure the limit on the file size? I couldn't find the option.

Sorry, never mind, I just found it, but it is set to 50 MB. So why does the file even get indexed? Could it be related to #387?

budachst commented 5 years ago

I think you'd set that in the preferences -> fulltextsearch -> maximum file size, which is set to 20 in my case.
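
Editor's note: the expectation voiced in this thread is that a file larger than the configured maximum never reaches OCR at all. Here is a minimal Python sketch of such a pre-check. It is not the files_fulltextsearch code; `size_within_limit` and `should_index` are hypothetical names, and treating one MB as 1024 × 1024 bytes is an assumption.

```python
import os

BYTES_PER_MB = 1024 * 1024  # assumption: the setting counts binary megabytes


def size_within_limit(size_bytes: int, max_size_mb: int) -> bool:
    """True if a file of size_bytes does not exceed the configured maximum."""
    return size_bytes <= max_size_mb * BYTES_PER_MB


def should_index(path: str, max_size_mb: int) -> bool:
    """Check the size on disk before handing the file to any extractor."""
    return size_within_limit(os.path.getsize(path), max_size_mb)


if __name__ == "__main__":
    # A file of roughly 600 MB against a 50 MB limit is rejected.
    print(size_within_limit(600 * BYTES_PER_MB, 50))
```

Doing the check before content extraction is the point: a limit that is only applied after OCR has run would not prevent the long tesseract runs reported above.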

tacruc commented 5 years ago

The update to version 1.0.0 fixed the problem.