andrewborell closed this issue 3 years ago.
PDFs containing scanned text are not OCR'd or indexed either.
I had a moment to come back to this issue. I tried running tesseract from the CLI directly on a file as the www-data user, and it didn't look good. This command may be helpful for others debugging Tesseract issues:
```
sudo -u www-data tesseract hyde.jpg output --oem 1 -l eng
```

```
dbo@hera:/var/www/html/nextcloud/data/dbo/files# sudo -u www-data tesseract hyde.jpg output --oem 1 -l eng
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
```
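The error shows tesseract looking for `eng.traineddata` under `/usr/local/share/tessdata`, the default data path for a build installed under the `/usr/local` prefix, and not finding it. A small helper like the one below can show which candidate directory actually holds the language file. It is a sketch: the function name `find_tessdata` is hypothetical, and the directories in the usage line are assumptions about common locations, not paths confirmed in this thread.

```shell
#!/usr/bin/env bash
# find_tessdata LANG DIR...
# Hypothetical helper: prints the first DIR that contains LANG.traineddata
# and returns 0; returns 1 if none of the candidate directories has it.
find_tessdata() {
  local lang="$1" dir
  shift
  for dir in "$@"; do
    if [ -f "$dir/$lang.traineddata" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

For example: `find_tessdata eng /usr/local/share/tessdata /usr/share/tesseract-ocr/4.00/tessdata` (the second path is where Ubuntu 18.04's `tesseract-ocr` package is assumed to install its data). If the file turns up elsewhere, a one-off run such as `sudo -u www-data env TESSDATA_PREFIX=/path/to/dir tesseract hyde.jpg output --oem 1 -l eng` can confirm the diagnosis. Whether `TESSDATA_PREFIX` should name the `tessdata` directory itself or its parent has varied between Tesseract releases, so it may be worth trying both.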
I'm afraid I'll have to throw in the towel for tonight. As mentioned in my previous message, I had compiled Tesseract and Leptonica from source, which caused a few issues. I removed both completely (in reverse order), rebooted, confirmed after logging in that no remnants were left, and then reinstalled them with the package manager. Tesseract itself now works perfectly, even when run as the www-data user. However, when I test with Nextcloud I still get the exception for image files and for PDFs that contain scanned text as images. Other file types work fine when uploaded.
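To double-check that a source build left nothing behind before switching to packages, one option is to search the install prefix for Tesseract and Leptonica artifacts, since a stale copy under `/usr/local` can shadow the packaged binary or its data path. The helper below is a hypothetical sketch: the name `find_leftovers` and the file-name patterns are assumptions based on typical build output, not an exhaustive uninstall check, and the package names in the trailing comments assume Ubuntu.

```shell
#!/usr/bin/env bash
# find_leftovers [PREFIX]
# Hypothetical helper: lists paths under PREFIX (default /usr/local) whose
# names look like Tesseract or Leptonica build artifacts, sorted.
find_leftovers() {
  local prefix="${1:-/usr/local}"
  find "$prefix" \( -name 'libtesseract*' -o -name 'liblept*' \
    -o -name 'tesseract' -o -name 'leptonica' -o -name 'tessdata' \
    -o -name 'tesseract.pc' -o -name 'lept.pc' \) \
    -print 2>/dev/null | sort
}

# Once the listing is empty, refresh the linker cache and install the
# packaged versions (Ubuntu package names assumed):
#   sudo ldconfig
#   sudo apt install tesseract-ocr tesseract-ocr-eng
```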
```
dbo@hera:/var/www/html/nextcloud/data/dbo/files# sudo -u www-data tesseract /var/www/html/nextcloud/data/dbo/files/hyde.jpg output --oem 1 -l eng
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
```
The command above creates output.txt containing a perfect English transcription of a passage from Dr Jekyll and Mr Hyde.
```
dbo@hera:/var/www/html/nextcloud/data/dbo/files# sudo -u www-data php /var/www/html/nextcloud/occ fulltextsearch:document:provider dbo files 5043 --content
Document:
{
    "id": "5043",
    "providerId": "files",
    "access": {
        "ownerId": "dbo",
        "viewerId": "",
        "users": [],
        "groups": [],
        "circles": [],
        "links": []
    },
    "modifiedTime": 1557289243,
    "title": "hyde.jpg",
    "link": "",
    "index": {
        "ownerId": "dbo",
        "providerId": "files",
        "source": "files_local",
        "documentId": "5043",
        "lastIndex": 0,
        "errors": [
            {
                "message": "Error while getting file content",
                "exception": "Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found",
                "severity": 3
            }
        ],
        "errorCount": 1,
        "status": 28,
        "options": []
    },
    "source": "files_local",
    "info": {
        "share_names": {
            "dbo": "hyde.jpg"
        }
    },
    "hash": "",
    "contentSize": 0,
    "tags": [],
    "metatags": [
        "files_local"
    ],
    "subtags": [],
    "more": [],
    "excerpts": [],
    "score": ""
}
Content:
1 Part(s)
'comments' (size: 0)
```
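The only useful signal in that dump is the `exception` field under `index.errors`. When checking many documents, a small filter that pulls just those messages out of the `occ fulltextsearch:document:provider` output (or out of `nextcloud.log`, where the same field appears without a space after the colon) saves scrolling. This is a hypothetical grep/sed sketch that assumes each message is a single-line JSON string with no embedded double quotes; a JSON-aware tool would be more robust.

```shell
#!/usr/bin/env bash
# index_errors
# Hypothetical filter: reads text on stdin and prints the value of every
# "exception" JSON field, one per line. Optional spaces after the colon are
# allowed so it matches both pretty-printed occ output and compact log lines.
index_errors() {
  grep -o '"exception": *"[^"]*"' | sed 's/^"exception": *"//; s/"$//'
}
```

For example: `sudo -u www-data php /var/www/html/nextcloud/occ fulltextsearch:document:provider dbo files 5043 --content | index_errors`. Backslashes in the class name are printed still JSON-escaped (`OCP\\FullTextSearch\\...`).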
A few days ago I saw an update for the Tesseract app and applied it. I'm not sure whether it was genuinely a new release of the app or whether my Nextcloud had simply not been displaying it. After applying the update, everything seemed to start working.
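Since the fix turned out to be an app update, it may help others to check which version of the Tesseract app is installed and to trigger the update from the command line. The sketch below parses the `Enabled:` list printed by `occ app:list` (the same format as the app list in this report). The helper name `app_version` is hypothetical, and the commands in the comments assume an occ that provides `app:update`, as recent Nextcloud releases do.

```shell
#!/usr/bin/env bash
# app_version APP_ID
# Hypothetical helper: reads `occ app:list` output on stdin and prints the
# version listed for APP_ID (lines look like "  - files_fulltextsearch: 1.3.0").
app_version() {
  local app="$1"
  sed -n "s/^ *- ${app}: \(.*\)\$/\1/p"
}

# Example workflow (run from the Nextcloud root):
#   sudo -u www-data php occ app:list | app_version files_fulltextsearch_tesseract
#   sudo -u www-data php occ app:update files_fulltextsearch_tesseract
# A long-running `occ fulltextsearch:live` process (the stack trace in this
# report shows one) keeps already-loaded code in memory, so it likely needs
# restarting after the update.
```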
Steps to reproduce
Upload an image file (PNG, JPG, TIFF, etc.).
Expected behaviour
The file should be OCR'd and its text indexed.
Actual behaviour
An exception is thrown: `Class 'OCP\FullTextSearch\Model\IndexDocument' not found`
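A `Class ... not found` error for an `OCP\` class often points to an app calling a public server API that the installed server does not provide, for instance an app release written against a different Nextcloud major version. Because Nextcloud keeps its public `OCP` namespace under `lib/public/`, a rough first check is whether the corresponding file exists. The helper below is a hypothetical sketch of that check; it assumes the one-class-per-file layout and will not detect classes provided some other way.

```shell
#!/usr/bin/env bash
# ocp_class_path NEXTCLOUD_ROOT CLASS
# Hypothetical helper: maps an OCP\ class name to the file where Nextcloud's
# lib/public/ layout would keep it, e.g.
#   OCP\Foo\Bar -> NEXTCLOUD_ROOT/lib/public/Foo/Bar.php
ocp_class_path() {
  local root="$1" class="$2" rel
  # Strip the leading "OCP\" and turn the remaining backslashes into slashes.
  rel=$(printf '%s' "$class" | sed 's/^OCP\\//; s/\\/\//g')
  printf '%s/lib/public/%s.php\n' "$root" "$rel"
}
```

For example, `ocp_class_path /var/www/html/nextcloud 'OCP\FullTextSearch\Model\IndexDocument'` prints a path to test with `[ -f ... ]`. If that file is missing on 16.0.0, the installed `files_fulltextsearch_tesseract` 1.2.2 was probably written against an older API, which would be consistent with an app update resolving the issue.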
Server configuration detail
Operating system: Linux 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64
Webserver: Apache (apache2handler) 2.4
Database: mysql 10.2.23
PHP version: 7.2.17-0ubuntu0.18.04.1

Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, sodium, session, standard, apache2handler, mysqlnd, PDO, xml, apcu, apc, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, iconv, igbinary, imagick, intl, json, ldap, exif, mysqli, pdo_mysql, Phar, posix, readline, redis, shmop, SimpleXML, smbclient, sockets, sysvmsg, sysvsem, sysvshm, tokenizer, wddx, xmlreader, xmlwriter, xsl, zip, libsmbclient, Zend OPcache
Nextcloud version: 16.0.0 - 16.0.0.9
Updated from an older Nextcloud/ownCloud or fresh install:
Where did you install Nextcloud from: unknown
Signing status

```
Array ( )
```

List of activated apps

```
Enabled:
  - accessibility: 1.2.0
  - activity: 2.9.1
  - apporder: 0.7.1
  - bruteforcesettings: 1.3.0
  - calendar: 1.7.0
  - cloud_federation_api: 0.2.0
  - comments: 1.6.0
  - dav: 1.9.2
  - drawio: 0.9.3
  - event_update_notification: 0.3.4
  - external: 3.3.0
  - federatedfilesharing: 1.6.0
  - files: 1.11.0
  - files_external: 1.7.0
  - files_fulltextsearch: 1.3.0
  - files_fulltextsearch_tesseract: 1.2.2
  - files_pdfviewer: 1.5.0
  - files_rightclick: 0.13.0
  - files_sharing: 1.8.0
  - files_texteditor: 2.8.0
  - files_trashbin: 1.6.0
  - files_versions: 1.9.0
  - files_videoplayer: 1.5.0
  - flowupload: 0.1.0
  - fulltextsearch: 1.3.1
  - fulltextsearch_elasticsearch: 1.3.0
  - groupfolders: 3.0.0
  - guests: 1.0.0
  - issuetemplate: 0.5.0
  - logreader: 2.1.0
  - lookup_server_connector: 1.4.0
  - metadata: 0.9.0
  - oauth2: 1.4.2
  - ojsxc: 3.4.3
  - onlyoffice: 2.1.10
  - password_policy: 1.6.0
  - passwords: 2019.4.2
  - provisioning_api: 1.6.0
  - rainloop: 6.0.2
  - sharebymail: 1.6.0
  - systemtags: 1.6.0
  - theming: 1.7.0
  - twofactor_backupcodes: 1.5.0
  - user_ldap: 1.6.0
  - viewer: 1.0.0
  - workflowengine: 1.6.0
Disabled:
  - admin_audit
  - audioplayer
  - contacts
  - dicomviewer
  - encryption
  - federation
  - firstrunwizard
  - gallery
  - mail
  - nextcloud_announcements
  - notifications
  - privacy
  - recommendations
  - serverinfo
  - socialsharing_email
  - support
  - survey_client
  - updatenotification
```

Configuration (config/config.php)
```
{
    "instanceid": "***REMOVED SENSITIVE VALUE***",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
        "DOMAIN.TLD",
    ],
    "datadirectory": "***REMOVED SENSITIVE VALUE***",
    "dbtype": "mysql",
    "version": "16.0.0.9",
    "overwrite.cli.url": "https:\/\/DOMAIN.TLD\/",
    "dbname": "***REMOVED SENSITIVE VALUE***",
    "dbhost": "***REMOVED SENSITIVE VALUE***",
    "dbport": "",
    "dbtableprefix": "oc_",
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "memcache.local": "\\OC\\Memcache\\APCu",
    "installed": true,
    "ldapIgnoreNamingRules": false,
    "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
    "maintenance": false,
    "mail_smtpmode": "smtp",
    "mail_sendmailmode": "smtp",
    "mail_from_address": "***REMOVED SENSITIVE VALUE***",
    "mail_domain": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpauthtype": "LOGIN",
    "mail_smtpauth": 1,
    "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpport": "25",
    "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
    "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
    "updater.release.channel": "stable",
    "theme": "",
    "loglevel": 0,
    "csrf.disabled": true,
    "debug": true,
    "mysql.utf8mb4": true,
    "app_install_overwrite": [
        "mail",
        "calendar",
        "timetracker",
        "twainwebscan"
    ],
    "activity_expire_days": 14,
    "auth.bruteforce.protection.enabled": "false",
    "blacklisted_files": [
        ".htaccess",
        "Thumbs.db",
        "thumbs.db"
    ],
    "cron_log": true,
    "enable_previews": true,
    "enabledPreviewProviders": [
        "OC\\Preview\\PNG",
        "OC\\Preview\\JPEG",
        "OC\\Preview\\GIF",
        "OC\\Preview\\BMP",
        "OC\\Preview\\XBitmap",
        "OC\\Preview\\Movie",
        "OC\\Preview\\PDF",
        "OC\\Preview\\MP3",
        "OC\\Preview\\TXT",
        "OC\\Preview\\MarkDown"
    ],
    "filesystem_check_changes": 0,
    "filelocking.enabled": "true",
    "htaccess.RewriteBase": "\/",
    "integrity.check.disabled": false,
    "knowledgebaseenabled": false,
    "logfile": "\/var\/nc_data\/nextcloud.log",
    "logtimezone": "America\/Chicago",
    "log_rotate_size": 104857600,
    "memcache.locking": "\\OC\\Memcache\\Redis",
    "overwriteprotocol": "https",
    "preview_max_x": 1024,
    "preview_max_y": 768,
    "preview_max_scale_factor": 1,
    "redis": {
        "host": "***REMOVED SENSITIVE VALUE***",
        "port": 0,
        "timeout": 0
    },
    "quota_include_external_storage": false,
    "share_folder": "\/Shares",
    "skeletondirectory": "",
    "trashbin_retention_obligation": "auto, 7"
}
```

Are you using external storage, if yes which one: SMB
Are you using encryption: false
Are you using an external user-backend, if yes which one: LDAP
LDAP configuration (delete this part if not used)
```
background_sync_interval: 43200
background_sync_offset: 0
background_sync_prefix: s01
cleanUpJobOffset: 0
enabled: yes
installed_version: 1.6.0
s01_lastChange: 1556919017
s01has_memberof_filter_support: 1
s01home_folder_naming_rule: 
s01last_jpegPhoto_lookup: 0
s01ldap_agent_password: ArEyoUkIdd1ngM3?==
s01ldap_attributes_for_group_search: 
s01ldap_attributes_for_user_search: 
s01ldap_backup_host: 
s01ldap_backup_port: 
s01ldap_base: CN=Users,DC=DOMAIN,DC=TLD
s01ldap_base_groups: CN=Users,DC=DOMAIN,DC=TLD
s01ldap_base_users: CN=Users,DC=DOMAIN,DC=TLD
s01ldap_cache_ttl: 600
s01ldap_configuration_active: 1
s01ldap_default_ppolicy_dn: 
s01ldap_display_name: displayname
s01ldap_dn: cn=nextcloud,CN=Users,DC=DOMAIN,DC=TLD
s01ldap_dynamic_group_member_url: 
s01ldap_email_attr: mail
s01ldap_experienced_admin: 0
s01ldap_expert_username_attr: sAMAccountName
s01ldap_expert_uuid_group_attr: 
s01ldap_expert_uuid_user_attr: 
s01ldap_gid_number: gidNumber
s01ldap_group_display_name: cn
s01ldap_group_filter: 
s01ldap_group_filter_mode: 0
s01ldap_group_member_assoc_attribute: uniqueMember
s01ldap_groupfilter_groups: 
s01ldap_groupfilter_objectclass: 
s01ldap_host: ldap://HERA.DOMAIN.TLD
s01ldap_login_filter: (&(&(|(objectclass=person))(|(|(memberof=CN=ncusers,CN=Users,DC=DOMAIN,DC=TLD)(primaryGroupID=1377))))(|(samaccountname=%uid)(entryUUID=%uid)))
s01ldap_login_filter_mode: 1
s01ldap_loginfilter_attributes: 
s01ldap_loginfilter_email: 0
s01ldap_loginfilter_username: 1
s01ldap_nested_groups: 0
s01ldap_override_main_server: 
s01ldap_paging_size: 500
s01ldap_port: 389
s01ldap_quota_attr: 
s01ldap_quota_def: 
s01ldap_tls: 0
s01ldap_turn_off_cert_check: 1
s01ldap_turn_on_pwd_change: 1
s01ldap_user_avatar_rule: default
s01ldap_user_display_name_2: 
s01ldap_user_filter_mode: 1
s01ldap_userfilter_groups: Domain Users
s01ldap_userfilter_objectclass: person
s01ldap_userlist_filter: (&(|(objectclass=person))(|(|(memberof=CN=ncusers,CN=Users,DC=DOMAIN,DC=TLD)(primaryGroupID=1377))))
s01use_memberof_to_detect_membership: 1
types: authentication
```

Client configuration
Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36
Operating system: Windows 10 ( Windows Any )
Logs
Web server error log
```
No apache errors.
```

Nextcloud log
```
[ { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch_tesseract\/lib\/Service\/TesseractService.php", "line":118, "function":"extractContentUsingTesseractOCR", "class":"OCA\\Files_FullTextSearch_Tesseract\\Service\\TesseractService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch_tesseract\/lib\/AppInfo\/Application.php", "line":75, "function":"onFileIndexing", "class":"OCA\\Files_FullTextSearch_Tesseract\\Service\\TesseractService", "type":"->", "args":[ { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/event-dispatcher\/EventDispatcher.php", "line":212, "function":"OCA\\Files_FullTextSearch_Tesseract\\AppInfo\\{closure}", "class":"OCA\\Files_FullTextSearch_Tesseract\\AppInfo\\Application", "type":"->", "args":[ { }, "\\OCA\\Files_FullTextSearch::onFileIndexing", { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/event-dispatcher\/EventDispatcher.php", "line":44, "function":"doDispatch", "class":"Symfony\\Component\\EventDispatcher\\EventDispatcher", "type":"->", "args":[ [ { } ], "\\OCA\\Files_FullTextSearch::onFileIndexing", { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/ExtensionService.php", "line":115, "function":"dispatch",
"class":"Symfony\\Component\\EventDispatcher\\EventDispatcher", "type":"->", "args":[ "\\OCA\\Files_FullTextSearch::onFileIndexing", { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/ExtensionService.php", "line":83, "function":"dispatch", "class":"OCA\\Files_FullTextSearch\\Service\\ExtensionService", "type":"->", "args":[ "\\OCA\\Files_FullTextSearch::onFileIndexing", { "file":{ }, "document":{ "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" } } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", "line":683, "function":"fileIndexing", "class":"OCA\\Files_FullTextSearch\\Service\\ExtensionService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ 
], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", "line":627, "function":"updateContentFromFile", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", "line":545, "function":"updateFilesDocumentFromFile", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "id":"4540", "providerId":"files", "access":{ "ownerId":"dbo", "viewerId":"", "users":[ ], "groups":[ ], "circles":[ ], "links":[ ] }, "modifiedTime":1557022926, "title":"JH.jpg", "link":"", "index":{ "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] }, "source":"files_local", "info":{ "share_names":{ "dbo":"JH.jpg" } }, "hash":"", "contentSize":0, "tags":[ ], "metatags":[ ], "subtags":[ ], "more":[ ], "excerpts":[ ], "score":"" }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Service\/FilesService.php", 
"line":590, "function":"generateDocumentFromIndex", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/files_fulltextsearch\/lib\/Provider\/FilesProvider.php", "line":286, "function":"updateDocument", "class":"OCA\\Files_FullTextSearch\\Service\\FilesService", "type":"->", "args":[ { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/fulltextsearch\/lib\/Service\/IndexService.php", "line":414, "function":"updateDocument", "class":"OCA\\Files_FullTextSearch\\Provider\\FilesProvider", "type":"->", "args":[ { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] }, { "file":"\/var\/www\/html\/nextcloud\/apps\/fulltextsearch\/lib\/Command\/Live.php", "line":291, "function":"updateDocument", "class":"OCA\\FullTextSearch\\Service\\IndexService", "type":"->", "args":[ { }, { }, { "ownerId":"dbo", "providerId":"files", "source":"files_local", "documentId":"4540", "lastIndex":0, "errors":[ { "message":"Error while getting file content", "exception":"Class 'OCP\\FullTextSearch\\Model\\IndexDocument' not found", "severity":3 } ], "errorCount":1, "status":28, "options":[ ] } ] 
}, { "file":"\/var\/www\/html\/nextcloud\/apps\/fulltextsearch\/lib\/Command\/Live.php", "line":258, "function":"liveCycle", "class":"OCA\\FullTextSearch\\Command\\Live", "type":"->", "args":[ ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Command\/Command.php", "line":255, "function":"execute", "class":"OCA\\FullTextSearch\\Command\\Live", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/core\/Command\/Base.php", "line":166, "function":"run", "class":"Symfony\\Component\\Console\\Command\\Command", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Application.php", "line":901, "function":"run", "class":"OC\\Core\\Command\\Base", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Application.php", "line":262, "function":"doRunCommand", "class":"Symfony\\Component\\Console\\Application", "type":"->", "args":[ { }, { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/3rdparty\/symfony\/console\/Application.php", "line":145, "function":"doRun", "class":"Symfony\\Component\\Console\\Application", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/lib\/private\/Console\/Application.php", "line":213, "function":"run", "class":"Symfony\\Component\\Console\\Application", "type":"->", "args":[ { }, { } ] }, { "file":"\/var\/www\/html\/nextcloud\/console.php", "line":97, "function":"run", "class":"OC\\Console\\Application", "type":"->", "args":[ ] }, { "file":"\/var\/www\/html\/nextcloud\/occ", "line":11, "args":[ "\/var\/www\/html\/nextcloud\/console.php" ], "function":"require_once" } ]
```

Browser log
No console errors.

```
dbo@hera:/var/www/html/nextcloud# ghostscript -v
dbo@hera:/var/www/html/nextcloud# convert -version
dbo@hera:/var/www/html/nextcloud# tesseract --version
dbo@hera:/var/www/html/nextcloud# curl -XGET 'localhost:9200'
dbo@hera:/var/www/html/nextcloud# sudo -u www-data php /var/www/html/nextcloud/occ
dbo@hera:/var/www/html/nextcloud# sudo -u www-data php /var/www/html/nextcloud/occ
dbo@hera:/var/www/html/nextcloud# sudo -u www-data php ./occ
```
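The version checks above can be complemented by a quick preflight that lists which of the required tools are missing from the `PATH` entirely. The helper name `missing_tools` is hypothetical; running it in a shell as the same user the web server and `occ` use (here `www-data`) checks the `PATH` that actually matters for indexing.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...
# Hypothetical helper: prints each TOOL that cannot be found on PATH,
# one per line; prints nothing when everything is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools ghostscript convert tesseract curl` prints nothing when all four are found.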