nextcloud / files_fulltextsearch

🔍 Index the content of your files
GNU Affero General Public License v3.0

Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found #66

Closed andrewborell closed 3 years ago

andrewborell commented 5 years ago

Steps to reproduce

Upload an image file (PNG, JPG, TIFF, etc.).

Expected behaviour

The image should be OCR'd and its text indexed.

Actual behaviour

Indexing throws the exception Class 'OCP\FullTextSearch\Model\IndexDocument' not found.

Server configuration detail

Operating system: Linux 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64

Webserver: Apache (apache2handler) 2.4

Database: mysql 10.2.23

PHP version: 7.2.17-0ubuntu0.18.04.1

Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, sodium, session, standard, apache2handler, mysqlnd, PDO, xml, apcu, apc, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, iconv, igbinary, imagick, intl, json, ldap, exif, mysqli, pdo_mysql, Phar, posix, readline, redis, shmop, SimpleXML, smbclient, sockets, sysvmsg, sysvsem, sysvshm, tokenizer, wddx, xmlreader, xmlwriter, xsl, zip, libsmbclient, Zend OPcache

Nextcloud version: 16.0.0 - 16.0.0.9

Updated from an older Nextcloud/ownCloud or fresh install:

Where did you install Nextcloud from: unknown

Signing status Array ( )
List of activated apps
```
Enabled:
  - accessibility: 1.2.0
  - activity: 2.9.1
  - apporder: 0.7.1
  - bruteforcesettings: 1.3.0
  - calendar: 1.7.0
  - cloud_federation_api: 0.2.0
  - comments: 1.6.0
  - dav: 1.9.2
  - drawio: 0.9.3
  - event_update_notification: 0.3.4
  - external: 3.3.0
  - federatedfilesharing: 1.6.0
  - files: 1.11.0
  - files_external: 1.7.0
  - files_fulltextsearch: 1.3.0
  - files_fulltextsearch_tesseract: 1.2.2
  - files_pdfviewer: 1.5.0
  - files_rightclick: 0.13.0
  - files_sharing: 1.8.0
  - files_texteditor: 2.8.0
  - files_trashbin: 1.6.0
  - files_versions: 1.9.0
  - files_videoplayer: 1.5.0
  - flowupload: 0.1.0
  - fulltextsearch: 1.3.1
  - fulltextsearch_elasticsearch: 1.3.0
  - groupfolders: 3.0.0
  - guests: 1.0.0
  - issuetemplate: 0.5.0
  - logreader: 2.1.0
  - lookup_server_connector: 1.4.0
  - metadata: 0.9.0
  - oauth2: 1.4.2
  - ojsxc: 3.4.3
  - onlyoffice: 2.1.10
  - password_policy: 1.6.0
  - passwords: 2019.4.2
  - provisioning_api: 1.6.0
  - rainloop: 6.0.2
  - sharebymail: 1.6.0
  - systemtags: 1.6.0
  - theming: 1.7.0
  - twofactor_backupcodes: 1.5.0
  - user_ldap: 1.6.0
  - viewer: 1.0.0
  - workflowengine: 1.6.0
Disabled:
  - admin_audit
  - audioplayer
  - contacts
  - dicomviewer
  - encryption
  - federation
  - firstrunwizard
  - gallery
  - mail
  - nextcloud_announcements
  - notifications
  - privacy
  - recommendations
  - serverinfo
  - socialsharing_email
  - support
  - survey_client
  - updatenotification
```
Configuration (config/config.php)
```
{
    "instanceid": "***REMOVED SENSITIVE VALUE***",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
        "DOMAIN.TLD",
    ],
    "datadirectory": "***REMOVED SENSITIVE VALUE***",
    "dbtype": "mysql",
    "version": "16.0.0.9",
    "overwrite.cli.url": "https:\/\/DOMAIN.TLD\/",
    "dbname": "***REMOVED SENSITIVE VALUE***",
    "dbhost": "***REMOVED SENSITIVE VALUE***",
    "dbport": "",
    "dbtableprefix": "oc_",
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "memcache.local": "\\OC\\Memcache\\APCu",
    "installed": true,
    "ldapIgnoreNamingRules": false,
    "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
    "maintenance": false,
    "mail_smtpmode": "smtp",
    "mail_sendmailmode": "smtp",
    "mail_from_address": "***REMOVED SENSITIVE VALUE***",
    "mail_domain": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpauthtype": "LOGIN",
    "mail_smtpauth": 1,
    "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpport": "25",
    "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
    "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
    "updater.release.channel": "stable",
    "theme": "",
    "loglevel": 0,
    "csrf.disabled": true,
    "debug": true,
    "mysql.utf8mb4": true,
    "app_install_overwrite": [
        "mail",
        "calendar",
        "timetracker",
        "twainwebscan"
    ],
    "activity_expire_days": 14,
    "auth.bruteforce.protection.enabled": "false",
    "blacklisted_files": [
        ".htaccess",
        "Thumbs.db",
        "thumbs.db"
    ],
    "cron_log": true,
    "enable_previews": true,
    "enabledPreviewProviders": [
        "OC\\Preview\\PNG",
        "OC\\Preview\\JPEG",
        "OC\\Preview\\GIF",
        "OC\\Preview\\BMP",
        "OC\\Preview\\XBitmap",
        "OC\\Preview\\Movie",
        "OC\\Preview\\PDF",
        "OC\\Preview\\MP3",
        "OC\\Preview\\TXT",
        "OC\\Preview\\MarkDown"
    ],
    "filesystem_check_changes": 0,
    "filelocking.enabled": "true",
    "htaccess.RewriteBase": "\/",
    "integrity.check.disabled": false,
    "knowledgebaseenabled": false,
    "logfile": "\/var\/nc_data\/nextcloud.log",
    "logtimezone": "America\/Chicago",
    "log_rotate_size": 104857600,
    "memcache.locking": "\\OC\\Memcache\\Redis",
    "overwriteprotocol": "https",
    "preview_max_x": 1024,
    "preview_max_y": 768,
    "preview_max_scale_factor": 1,
    "redis": {
        "host": "***REMOVED SENSITIVE VALUE***",
        "port": 0,
        "timeout": 0
    },
    "quota_include_external_storage": false,
    "share_folder": "\/Shares",
    "skeletondirectory": "",
    "trashbin_retention_obligation": "auto, 7"
}
```

Are you using external storage, if yes which one: SMB

Are you using encryption: false

Are you using an external user-backend, if yes which one: LDAP

LDAP configuration
```
background_sync_interval: 43200
background_sync_offset: 0
background_sync_prefix: s01
cleanUpJobOffset: 0
enabled: yes
installed_version: 1.6.0
s01_lastChange: 1556919017
s01has_memberof_filter_support: 1
s01home_folder_naming_rule:
s01last_jpegPhoto_lookup: 0
s01ldap_agent_password: ArEyoUkIdd1ngM3?==
s01ldap_attributes_for_group_search:
s01ldap_attributes_for_user_search:
s01ldap_backup_host:
s01ldap_backup_port:
s01ldap_base: CN=Users,DC=DOMAIN,DC=TLD
s01ldap_base_groups: CN=Users,DC=DOMAIN,DC=TLD
s01ldap_base_users: CN=Users,DC=DOMAIN,DC=TLD
s01ldap_cache_ttl: 600
s01ldap_configuration_active: 1
s01ldap_default_ppolicy_dn:
s01ldap_display_name: displayname
s01ldap_dn: cn=nextcloud,CN=Users,DC=DOMAIN,DC=TLD
s01ldap_dynamic_group_member_url:
s01ldap_email_attr: mail
s01ldap_experienced_admin: 0
s01ldap_expert_username_attr: sAMAccountName
s01ldap_expert_uuid_group_attr:
s01ldap_expert_uuid_user_attr:
s01ldap_gid_number: gidNumber
s01ldap_group_display_name: cn
s01ldap_group_filter:
s01ldap_group_filter_mode: 0
s01ldap_group_member_assoc_attribute: uniqueMember
s01ldap_groupfilter_groups:
s01ldap_groupfilter_objectclass:
s01ldap_host: ldap://HERA.DOMAIN.TLD
s01ldap_login_filter: (&(&(|(objectclass=person))(|(|(memberof=CN=ncusers,CN=Users,DC=DOMAIN,DC=TLD)(primaryGroupID=1377))))(|(samaccountname=%uid)(entryUUID=%uid)))
s01ldap_login_filter_mode: 1
s01ldap_loginfilter_attributes:
s01ldap_loginfilter_email: 0
s01ldap_loginfilter_username: 1
s01ldap_nested_groups: 0
s01ldap_override_main_server:
s01ldap_paging_size: 500
s01ldap_port: 389
s01ldap_quota_attr:
s01ldap_quota_def:
s01ldap_tls: 0
s01ldap_turn_off_cert_check: 1
s01ldap_turn_on_pwd_change: 1
s01ldap_user_avatar_rule: default
s01ldap_user_display_name_2:
s01ldap_user_filter_mode: 1
s01ldap_userfilter_groups: Domain Users
s01ldap_userfilter_objectclass: person
s01ldap_userlist_filter: (&(|(objectclass=person))(|(|(memberof=CN=ncusers,CN=Users,DC=DOMAIN,DC=TLD)(primaryGroupID=1377))))
s01use_memberof_to_detect_membership: 1
types: authentication
```

Client configuration

Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36

Operating system: Windows 10 ( Windows Any )

Logs

Web server error log: No Apache errors.
Nextcloud log ``` [ { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch_tesseract\/lib\/Service\/TesseractService.php", "line":118, "function":"extractContentUsingTesseractOCR", "class":"OCA\\Files_FullTextSearch_Tesseract\\Service\\TesseractService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch_tesseract\/lib\/AppInfo\/Application.php", "line":75, "function":"onFileIndexing", "class":"OCA\\Files_FullTextSearch_Tesseract\\Service\\TesseractService", "type":"->", "args":[ { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/event-dispatcher\/EventDispatcher.php", "line":212, "function":"OCA\\Files_FullTextSearch_Tesseract\\AppInfo\\{closure}", "class":"OCA\\Files_FullTextSearch_Tesseract\\AppInfo\\Application", "type":"->", "args":[ { }, "\\OCA\\Files_FullTextSearch::onFileIndexing", { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/event-dispatcher\/EventDispatcher.php", "line":44, "function":"doDispatch", "class":"Symfony\\Component\\EventDispatcher\\EventDispatcher", "type":"->", "args":[ [ { } ], "\\OCA\\Files_FullTextSearch::onFileIndexing", { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/ExtensionService.php", "line":115, "function":"dispatch", 
"class":"Symfony\\Component\\EventDispatcher\\EventDispatcher", "type":"->", "args":[ "\\OCA\\Files_FullTextSearch::onFileIndexing", { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/ExtensionService.php", "line":83, "function":"dispatch", "class":"OCA\\Files_FullTextSearch\\Service\\ExtensionService", "type":"->", "args":[ "\\OCA\\Files_FullTextSearch::onFileIndexing", { "file":{ }, "document":{ "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" } } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", "line":683, "function":"fileIndexing", "class":"OCA\\Files_FullTextSearch\\Service\\ExtensionService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ 
], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", "line":627, "function":"updateContentFromFile", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", "line":545, "function":"updateFilesDocumentFromFile", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", 
"line":590, "function":"generateDocumentFromIndex", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Provider\/FilesProvider.php", "line":286, "function":"updateDocument", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/fulltextsearch\/lib\/Service\/IndexService.php", "line":414, "function":"updateDocument", "class":"OCA\\Files_FullTextSearch\\Provider\\FilesProvider", "type":"->", "args":[ { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/fulltextsearch\/lib\/Command\/Live.php", "line":291, "function":"updateDocument", "class":"OCA\\FullTextSearch\\Service\\IndexService", "type":"->", "args":[ { }, { }, { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] 
}, { "file":"\/var\/www\/html\/nextcloud\/apps\/fulltextsearch\/lib\/Command\/Live.php", "line":258, "function":"liveCycle", "class":"OCA\\FullTextSearch\\Command\\Live", "type":"->", "args":[ ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Command\/Command.php", "line":255, "function":"execute", "class":"OCA\\FullTextSearch\\Command\\Live", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/core\/Command\/Base.php", "line":166, "function":"run", "class":"Symfony\\Component\\Console\\Command\\Command", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Application.php", "line":901, "function":"run", "class":"OC\\Core\\Command\\Base", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Application.php", "line":262, "function":"doRunCommand", "class":"Symfony\\Component\\Console\\Application", "type":"->", "args":[ { }, { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Application.php", "line":145, "function":"doRun", "class":"Symfony\\Component\\Console\\Application", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/lib\/private\/Console\/Application.php", "line":213, "function":"run", "class":"Symfony\\Component\\Console\\Application", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/console.php", "line":97, "function":"run", "class":"OC\\Console\\Application", "type":"->", "args":[ ] }, { "file":"\/var\/www\/html\/nextcloud\/occ", "line":11, "args":[ "\/var\/www\/html\/nextcloud\/console.php" ], "function":"require_once" } ] ```
Browser log: No console errors.

dbo@hera:/var/www/html/nextcloud# ghostscript -v

GPL Ghostscript 9.26 (2018-11-20)
Copyright (C) 2018 Artifex Software, Inc.  All rights reserved.

dbo@hera:/var/www/html/nextcloud# convert -version

Version: ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png tiff wmf x xml zlib

dbo@hera:/var/www/html/nextcloud# tesseract --version

tesseract 4.0.0-beta.1
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11

dbo@hera:/var/www/html/nextcloud# curl -XGET 'localhost:9200'

{
  "name" : "hera-1",
  "cluster_name" : "nc-search",
  "cluster_uuid" : "f2Jll5r4QniNod7xa6aG4g",
  "version" : {
    "number" : "6.7.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "56c6e48",
    "build_date" : "2019-04-29T09:05:50.290371Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

dbo@hera:/var/www/html/nextcloud# sudo -u www-data php /var/www/html/nextcloud/occ fulltextsearch:check

Full text search 1.3.1

- Search Platform:
Elasticsearch 1.3.0
{
    "elastic_host": [
        "http:\/\/localhost:9200"
    ],
    "elastic_index": "nextcloud"
}

- Content Providers:
Files 1.3.0
{
    "files_local": "1",
    "files_external": "1",
    "files_group_folders": "0",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "10",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0"
}

dbo@hera:/var/www/html/nextcloud# sudo -u www-data php /var/www/html/nextcloud/occ fulltextsearch:test

.Testing your current setup:
Creating mocked content provider. ok
Testing mocked provider: get indexable documents. (2 items) ok
Loading search platform. (Elasticsearch) ok
Testing search platform. ok
Locking process ok
Removing test. ok
Pausing 3 seconds 1 2 3 ok
Initializing index mapping. ok
Indexing generated documents. ok
Pausing 3 seconds 1 2 3 ok
Retreiving content from a big index (license). (size: 32386) ok
Comparing document with source. ok
Searching basic keywords:
 - 'test' (result: 1, expected: ["simple"]) ok
 - 'document is a simple test' (result: 2, expected: ["simple","license"]) ok
 - '"document is a test"' (result: 0, expected: []) ok
 - '"document is a simple test"' (result: 1, expected: ["simple"]) ok
 - 'document is a simple -test' (result: 1, expected: ["license"]) ok
 - 'document is a simple +test' (result: 1, expected: ["simple"]) ok
 - '-document is a simple test' (result: 0, expected: []) ok
Updating documents access. Force Quit

dbo@hera:/var/www/html/nextcloud# sudo -u www-data php ./occ fulltextsearch:document:provider dbo files 4540 --content

Document:
{
    "id": "4540",
    "providerId": "files",
    "access": {
        "ownerId": "dbo",
        "viewerId": "",
        "users": [],
        "groups": [],
        "circles": [],
        "links": []
    },
    "modifiedTime": 1557022926,
    "title": "JH.jpg",
    "link": "",
    "index": {
        "ownerId": "dbo",
        "providerId": "files",
        "source": "files_local",
        "documentId": "4540",
        "lastIndex": 0,
        "errors": [
            {
                "message": "Error while getting file content",
                "exception": "Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found",
                "severity": 3
            }
        ],
        "errorCount": 1,
        "status": 28,
        "options": []
    },
    "source": "files_local",
    "info": {
        "share_names": {
            "dbo": "JH.jpg"
        }
    },
    "hash": "",
    "contentSize": 0,
    "tags": [],
    "metatags": [
        "files_local"
    ],
    "subtags": [],
    "more": [],
    "excerpts": [],
    "score": ""
}
Content:

1 Part(s)
'comments'    (size: 0)
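The failure is recorded in the "errors" array of the "index" object in the dump above. To spot it without reading the whole document, the "exception" fields can be pulled out of the command's output. This is a minimal POSIX shell sketch, assuming the pretty-printed layout shown above (one "exception": ... field per line); the function name index_exceptions is hypothetical and not part of occ or the fulltextsearch apps.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   occ fulltextsearch:document:provider <user> files <id> --content
# on stdin and print the value of every "exception" field, one per line.
# Assumes the pretty-printed JSON layout (one field per line, optional
# trailing comma); values are printed still JSON-escaped.
index_exceptions() {
  sed -n 's/^[[:space:]]*"exception": "\(.*\)",\{0,1\}$/\1/p'
}
```

Usage: sudo -u www-data php ./occ fulltextsearch:document:provider dbo files 4540 --content | index_exceptions. For the document above this prints Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found, with the backslashes still doubled because sed does not decode the JSON escaping. Empty output means no exception was recorded.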
andrewborell commented 5 years ago

Scanned PDFs are not OCR'd or indexed either.

andrewborell commented 5 years ago

I had a moment to come back to this issue. I ran tesseract directly on a file from the CLI as the www-data user, and it did not go well. This command may help others debug Tesseract issues:

sudo -u www-data tesseract hyde.jpg output --oem 1 -l eng

dbo@hera:/var/www/html/nextcloud/data/dbo/files# sudo -u www-data tesseract hyde.jpg output --oem 1 -l eng
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
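The error means tesseract looked for eng.traineddata in /usr/local/share/tessdata and did not find it there. A quick way to check which directory actually holds the language data is to test a list of candidates. This is a minimal POSIX shell sketch; the helper name find_tessdata is hypothetical, and the candidate directories you pass in are assumptions about your system.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory (from the candidates given)
# that contains <lang>.traineddata. Returns 1 if none does, 2 on bad usage.
# Usage: find_tessdata LANG DIR...
find_tessdata() {
  [ "$#" -ge 1 ] || return 2
  lang=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$lang.traineddata" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

For example, dir=$(find_tessdata eng /usr/local/share/tessdata /usr/share/tesseract-ocr/4.00/tessdata) && sudo -u www-data tesseract hyde.jpg output --oem 1 -l eng --tessdata-dir "$dir". The second path is only an assumed location for a distribution-packaged Tesseract 4; check where your package actually installs its data. Passing --tessdata-dir sidesteps the question of exactly what TESSDATA_PREFIX should point at, which has differed between Tesseract versions.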

andrewborell commented 5 years ago

I'm afraid I'll have to throw in the towel for tonight. Following up on my previous message: Tesseract and Leptonica had been compiled from source, which led to a few issues. I removed both completely (in reverse order), rebooted, checked after logging in that nothing was left over, and then reinstalled them with the package manager. The tesseract command now works perfectly, even as the www-data user. When I test through Nextcloud, however, I still get the exception for image files and for PDFs containing scanned text as images. Other file types are indexed fine when uploaded.

dbo@hera:/var/www/html/nextcloud/data/dbo/files# sudo -u www-data tesseract /var/www/html/nextcloud/data/dbo/files/hyde.jpg output --oem 1 -l eng

Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica

The command above creates output.txt containing a perfect English transcription of a passage from Dr Jekyll and Mr Hyde.

dbo@hera:/var/www/html/nextcloud/data/dbo/files# sudo -u www-data php /var/www/html/nextcloud/occ fulltextsearch:document:provider dbo files 5043 --content

Document:
{
    "id": "5043",
    "providerId": "files",
    "access": {
        "ownerId": "dbo",
        "viewerId": "",
        "users": [],
        "groups": [],
        "circles": [],
        "links": []
    },
    "modifiedTime": 1557289243,
    "title": "hyde.jpg",
    "link": "",
    "index": {
        "ownerId": "dbo",
        "providerId": "files",
        "source": "files_local",
        "documentId": "5043",
        "lastIndex": 0,
        "errors": [
            {
                "message": "Error while getting file content",
                "exception": "Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found",
                "severity": 3
            }
        ],
        "errorCount": 1,
        "status": 28,
        "options": []
    },
    "source": "files_local",
    "info": {
        "share_names": {
            "dbo": "hyde.jpg"
        }
    },
    "hash": "",
    "contentSize": 0,
    "tags": [],
    "metatags": [
        "files_local"
    ],
    "subtags": [],
    "more": [],
    "excerpts": [],
    "score": ""
}
Content:

1 Part(s)
'comments'    (size: 0)
andrewborell commented 5 years ago

A few days ago I saw an update for the Tesseract app and applied it. I'm not sure whether it was genuinely a new release or whether my Nextcloud instance had simply not been showing it. After applying the update, everything seemed to start working.
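Since the exception went away only after the Tesseract app was updated, it can help to record the installed app versions before and after such an update. This is a minimal POSIX shell sketch, assuming the "  - app_id: version" layout that occ app:list prints (as in the app list earlier in this report); the helper name app_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `occ app:list` output on stdin and print the
# version listed for the app ID given as $1. Prints nothing if the app is
# absent or listed without a version (as disabled apps may be).
app_version() {
  awk -v app="$1" '$1 == "-" && $2 == app ":" { print $3; exit }'
}
```

For example, sudo -u www-data php /var/www/html/nextcloud/occ app:list | app_version files_fulltextsearch_tesseract would report 1.2.2 for the app list in this report. On Nextcloud versions that ship the occ app:update command, running it with the app ID should fetch a newer release if one is available; compare the version again afterwards.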