nextcloud / fulltextsearch

🔍 Core of the full-text search framework for Nextcloud
https://apps.nextcloud.com/apps/fulltextsearch
GNU Affero General Public License v3.0

Fulltextsearch not indexing content on shared folders #420

Closed nicolacontu closed 5 years ago

nicolacontu commented 5 years ago

Hello, I am on Nextcloud 14.0.4 and have installed the Fulltextsearch apps:

Full text search 1.1.0
Full text search - Elasticsearch Platform 1.0.2
Full text search - Files 1.1.1
Full text search - Files - Tesseract OCR 1.0.0

Our instance is connected to Active Directory (AD), and users are added via an AD group. We also have shared folders that are tied to internal groups and shared by the admin with other users, with read/write permissions.

I followed the installation procedure and successfully indexed files for all users. The problem is that it is not indexing the content of files inside shared folders. We are using OnlyOffice to modify documents.

Fulltextsearch can find files by name, but it can search content only for files saved in the root folder (even if they were created with OnlyOffice), not for files in shared folders.

I see this while running the index command :

│ Error: 1/1
│ Index: files:4879
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]

Is this a limitation of this app? Is there any trick to make it work? I think this is related to https://github.com/nextcloud/fulltextsearch/issues/365

I tried running the index with errors/reset, and this is what I see for one file:

[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.10/bin/php ./occ fulltextsearch:document:platform files 178729
{
    "document": {
        "id": "178729",
        "providerId": "files",
        "access": {
            "ownerId": "464213F7-9C18-4BAD-A08E-330049EB1755",
            "viewerId": null,
            "users": [],
            "groups": [],
            "circles": [],
            "links": []
        },
        "modifiedTime": 0,
        "title": "Notes\/ Billing.docx",
        "link": "",
        "index": null,
        "source": "files_local",
        "info": [],
        "hash": "0eeef817b6fcba2c825029c44eed7c5f",
        "contentSize": 0,
        "tags": [],
        "metatags": [
            "files_local"
        ],
        "subtags": [],
        "more": [],
        "excerpts": [],
        "score": null
    }
}

Thanks a lot

ArtificialOwl commented 5 years ago

Not sure whether this issue is still affecting you, but could you run ./occ fulltextsearch:test first?

budachst commented 5 years ago

We are also running NC 14.0.4 with the latest FTS available for that version, and we don't experience these issues. That is, all content inside shared folders is indexed and retrievable via FTS.

The error you noted also showed up occasionally while we indexed our 7 TB of data, and I assumed that there were some attachments that either couldn't be sent to the index or couldn't be parsed by ES/Tika. Other than that, our index works very well in NC 14.0.4.

nicolacontu commented 5 years ago

I ran the test first:

[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.23/bin/php ./occ fulltextsearch:test

.Testing your current setup:
Creating mocked content provider. ok
Testing mocked provider: get indexable documents. (2 items) ok
Loading search platform. (Elasticsearch) ok
Testing search platform. ok
Locking process ok
Removing test. ok
Pausing 3 seconds 1 2 3 ok
Initializing index mapping. ok
Indexing generated documents. ok
Pausing 3 seconds 1 2 3 ok
Retreiving content from a big index (license). (size: 32386) ok
Comparing document with source. ok
Searching basic keywords:

Then, when running an index, I get a lot of these:

┌─ Errors ────
│ Error: 141/141
│ Index: files:4145
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]

The files were uploaded by the admin user (NOT LDAP), and we are logging in as LDAP users.

Is there anything else I can do?

budachst commented 5 years ago

That would be the only difference, as our data is uploaded only by our LDAP users. However, if the files are shared via a groupfolder, that shouldn't matter, afaik.

Did you try uploading some data as a regular user to check whether it gets properly indexed?

nicolacontu commented 5 years ago

I tried deleting all existing documents, uploading part of them with my LDAP user via the Nextcloud client, sharing the folder, dropping the index, and re-indexing. Same result.

How did you upload files?

I also upgraded NC to 15.0.2. Same issue.

budachst commented 5 years ago

We uploaded our data via the web and the sync clients. However, that is not relevant. Please perform a full file scan using occ, then remove the index and reindex afterwards. We need to make sure that there's no stale data hanging around anywhere in the database or the index.

ghost commented 5 years ago

Hi @daita ,

same issue here:

NC 15.0.2, fulltextsearch 1.2.3. A document uploaded to a folder shared by another user appears in the console output of fulltextsearch:live and is found by its name when searching, but none of its content can be found.

fulltextsearch:live gives the same error as for @nicolacontu:

┌─ Errors ────
│ Error:     14/14
│ Index: files:11491
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]
│
│
└──

fulltextsearch:test gives no errors, everything okay.

For me, the error from fulltextsearch:live coincides with full-text indexing not working for files in shared folders.

Group folders, by the way, work without any problem (but they do not fit my scenario).
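To see which documents are affected, the error blocks printed by fulltextsearch:live and fulltextsearch:index can be collected and parsed. The following is only a minimal sketch in Python, assuming the console output has been saved as text in the box-drawing layout shown above; the function name parse_error_block is hypothetical and not part of the app.

```python
import re

def parse_error_block(text):
    """Extract 'Key: value' fields from one '┌─ Errors ────' block."""
    fields = {}
    for line in text.splitlines():
        # Field lines look like "│ Index: files:11491"; borders and bare "│" lines are skipped.
        match = re.match(r"^│\s*(\w+):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields

# Sample copied from the error block quoted in this comment.
sample = r"""┌─ Errors ────
│ Error:     14/14
│ Index: files:11491
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]
│
│
└──"""

block = parse_error_block(sample)
# "files:11491" is the provider id followed by the document id.
provider, document_id = block["Index"].split(":", 1)
print(provider, document_id)
```

The document id obtained this way is the fileId that the fulltextsearch:document:platform command expects.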

Thanks for Your work !

nicolacontu commented 5 years ago

So, I made some progress with files:scan --all.

I did this:

./occ fulltextsearch:reset
./occ files:scan --all
./occ fulltextsearch:reset
./occ fulltextsearch:index

Now I can search the content of PDF files and the names of documents. I still can't index the content of docx files.

I also tested with setting onlyoffice on the production server but no luck.
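The reset, scan and reindex sequence from this comment can be scripted so that it always runs in the same order. This is a rough sketch in Python, not an official tool: the function name reindex_commands is made up for illustration, and the occ path is an assumption that depends on the installation (in this thread occ is run as the web server user with a specific PHP binary).

```python
import shlex

def reindex_commands(occ="./occ"):
    """Return the occ invocations from the comment above, in order."""
    steps = [
        "fulltextsearch:reset",
        "files:scan --all",
        "fulltextsearch:reset",
        "fulltextsearch:index",
    ]
    # Each entry is an argv list, suitable for subprocess.run(argv, check=True).
    return [[occ] + shlex.split(step) for step in steps]

for argv in reindex_commands():
    print(" ".join(argv))
```

Building the argument lists separately from running them makes it easy to review the sequence before executing anything against a production index.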

ArtificialOwl commented 5 years ago

Shared folder is a groupfolder, or a simple folder from a user shared to a group ?

nicolacontu commented 5 years ago

In my case it is a simple folder created by a user and shared to a group.

ghost commented 5 years ago

Shared Folder means a simple Folder, shared by a user.

Example for my constellation (same as @nicolacontu ):

USER01 has TESTFOLDER in his cloud root
USER01 shares TESTFOLDER with group TESTGROUP
USER02 and USER03 are members of TESTGROUP
USER02 uploads FILE.pdf to TESTFOLDER, shared from USER01 (fulltextsearch:live is running the whole time)
Neither USER02, USER03 nor USER01 is able to find words written in FILE.pdf; only a search for "FILE" shows FILE.pdf as a result

ArtificialOwl commented 5 years ago

Thanks for this detailed setup

ghost commented 5 years ago

Thanks for helping us get fulltextsearch fully working again!

ArtificialOwl commented 5 years ago

If you have the ID of a file that is not searchable by a member of a group, can you please check the result of:

./occ fulltextsearch:document:platform files fileId
./occ fulltextsearch:document:provider ownerId files fileId

(replace ownerId and fileId with the owner of the file and the file ID)

You should see something like this, with the name of the group you're sharing to:

maxence@stealth:~/sites/nc15/nextcloud$ ./occ full:doc:plat files 223
{
    "document": {
        "id": "223",
        "providerId": "files",
        "access": {
            "ownerId": "cult",
            "viewerId": "",
            "users": [
                "test1"
            ],
            "groups": [
                "group1"
            ],
            "circles": [],
            "links": []
        },
        "modifiedTime": 0,
        "title": "test1\/testfile1.txt",
        "link": "",
        "index": null,
        "source": "files_local",
        "info": [],
        "hash": "6f734362029ff5d281c76d3208e007b3",
        "contentSize": 7,
        "tags": [],
        "metatags": [
            "files_local"
        ],
        "subtags": [],
        "more": [],
        "excerpts": [],
        "score": ""
    }
}
maxence@stealth:~/sites/nc15/nextcloud$ ./occ full:doc:prov cult files 223
Document: 
{
    "id": "223",
    "providerId": "files",
    "access": {
        "ownerId": "cult",
        "viewerId": "",
        "users": [
            "test1"
        ],
        "groups": [
            "group1"
        ],
        "circles": [],
        "links": []
    },
    "modifiedTime": 1548846459,
    "title": "test1\/testfile1.txt",
    "link": "",
    "index": {
        "ownerId": "cult",
        "providerId": "files",
        "source": "files_local",
        "documentId": "223",
        "lastIndex": 0,
        "errors": [],
        "errorCount": 0,
        "status": 12,
        "options": {
            "_files_local": "1"
        }
    },
    "source": "files_local",
    "info": {
        "share_names": {
            "cult": "test1\/testfile1.txt"
        }
    },
    "hash": "",
    "contentSize": 12,
    "tags": [],
    "metatags": [
        "files_local"
    ],
    "subtags": [],
    "more": [],
    "excerpts": [],
    "score": ""
}
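
Checking the "groups" entry by eye gets tedious across many files, so the same check can be done on the JSON itself. The sketch below is only an illustration in Python: the helper names shared_groups and is_shared_with are hypothetical, the first sample is trimmed from the platform output above, and the second sample is an invented record with an empty groups list. Note that the provider command prints a "Document:" line before the JSON, which would have to be stripped before parsing.

```python
import json

def shared_groups(payload):
    """Return access.groups from either command's JSON output."""
    # document:platform wraps the record in "document"; document:provider prints it bare.
    document = payload.get("document", payload)
    return document.get("access", {}).get("groups", [])

def is_shared_with(payload, group):
    """True if the indexed access rights list the given group."""
    return group in shared_groups(payload)

# Trimmed from the platform output for file 223 shown above.
platform_output = json.loads("""
{"document": {"id": "223", "providerId": "files",
  "access": {"ownerId": "cult", "viewerId": "", "users": ["test1"],
             "groups": ["group1"], "circles": [], "links": []}}}
""")

# Hypothetical record whose share is missing from the index.
unshared_output = json.loads("""
{"id": "0", "providerId": "files",
 "access": {"ownerId": "someone", "viewerId": "", "users": [],
            "groups": [], "circles": [], "links": []}}
""")

print(is_shared_with(platform_output, "group1"))
print(is_shared_with(unshared_output, "group1"))
```

If the group is missing from both outputs, the share itself is not known to the index, which points at access rights rather than content extraction.
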
nicolacontu commented 5 years ago

Here you go

[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.23/bin/php ./occ fulltextsearch:document:platform files 612493
{
    "document": {
        "id": "612493",
        "providerId": "files",
        "access": {
            "ownerId": "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB",
            "viewerId": "",
            "users": [],
            "groups": [],
            "circles": [],
            "links": []
        },
        "modifiedTime": 0,
        "title": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
        "link": "",
        "index": null,
        "source": "files_local",
        "info": [],
        "hash": "0d12fa83f4239cc22b5045140cbe9c00",
        "contentSize": 0,
        "tags": [],
        "metatags": [
            "files_local"
        ],
        "subtags": [],
        "more": [],
        "excerpts": [],
        "score": ""
    }
}
[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.23/bin/php ./occ fulltextsearch:document:provider F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB files 612493
Document:
{
    "id": "612493",
    "providerId": "files",
    "access": {
        "ownerId": "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB",
        "viewerId": "",
        "users": [],
        "groups": [],
        "circles": [],
        "links": []
    },
    "modifiedTime": 1545916804,
    "title": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
    "link": "",
    "index": {
        "ownerId": "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB",
        "providerId": "files",
        "source": "files_local",
        "documentId": "612493",
        "lastIndex": 0,
        "errors": [],
        "errorCount": 0,
        "status": 12,
        "options": {
            "_files_office": "1",
            "_files_local": "1"
        }
    },
    "source": "files_local",
    "info": {
        "share_names": {
            "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB": "CMD System Admin\/CMD Architecure\/CMD Server list.docx"
        }
    },
    "hash": "",
    "contentSize": 20656,
    "tags": [],
    "metatags": [
        "files_local"
    ],
    "subtags": [],
    "more": [],
    "excerpts": [],
    "score": ""
}
ArtificialOwl commented 5 years ago

Well, it looks like I found the issue. It seems that access rights are not updated while :live is running. If you edit the file, the rights will be updated and the search should work.

Thanks for your report

nicolacontu commented 5 years ago

I had been experimenting and the folder was no longer shared. I shared it again, and running the command now shows:

[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.23/bin/php ./occ fulltextsearch:document:provider F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB files 612493
Document:
{
    "id": "612493",
    "providerId": "files",
    "access": {
        "ownerId": "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB",
        "viewerId": "",
        "users": [],
        "groups": [
            "Software Group"
        ],
        "circles": [],
        "links": []
    },
    "modifiedTime": 1545916804,
    "title": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
    "link": "",
    "index": {
        "ownerId": "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB",
        "providerId": "files",
        "source": "files_local",
        "documentId": "612493",
        "lastIndex": 0,
        "errors": [],
        "errorCount": 0,
        "status": 12,
        "options": {
            "_files_office": "1",
            "_files_local": "1"
        }
    },
    "source": "files_local",
    "info": {
        "share_names": {
            "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "1BD2CE06-910B-48D6-9C7A-CB3A5D98361A": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "1C027758-C373-4F0A-AF12-2E3FA74F4C05": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "1E283938-18B1-4721-902C-4961D8E13475": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "2CF009AA-2902-4737-8CBA-F137D37B742D": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "3CDA6FD2-5D23-4F2D-84FD-3243E2B6DA55": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "4A0FEC39-B8EE-443A-86CB-F85073CBA554": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "4C4FBBF9-F908-4C07-BE80-4D17DF5E453A": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "5C4E7D66-31F4-4C9D-A915-09FFA4C99D63": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "5E562688-6FF2-4B6C-AD80-0F2A8E9A5FA7": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "5FCD0527-8B40-445B-BF5B-3AB1F8A678D3": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "6BAC0E39-7D2D-4077-A6BB-2A517FB4E47D": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "6DD9C1CF-3744-4651-B1A8-C120D19848E6": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "8B681FF3-0B30-431D-A386-50F160358C1C": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "8F5F3492-8678-4E77-A31B-94CDDB0D5BE0": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "9AEC46D2-B484-4AEF-91EA-45CF61DEDBA0": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "9BC9124C-CD93-4DB8-BE5E-59A36AC8FFE0": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "9CEAEA00-38AA-4EAC-9FEB-9C3641EDEA28": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "11B320A3-C3BE-4384-A2B5-0BABEE815643": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "25A06897-88E2-4D4D-B04C-28140576DD4C": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "27E3E755-D444-4DFB-A311-3BFFC5F19038": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "28CF7A33-62D6-476C-858E-5111B0E75F0D": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "30FAAF1E-6BB4-48EB-B441-4AB6B9679BA0": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "31DFDB6C-5370-4B0A-99AC-1165BE1B00C4": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "032FCCD1-451D-4843-9174-9B1D33D6C5A7": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "45AC94B5-DCD5-46C6-A1D2-70569986D47D": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "45B4FF3A-1EDA-4E12-9F0D-0798071D80E8": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "53B2FCDA-58C3-4005-8EBC-C48D2A79A71E": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "55C6B367-381E-4D1B-B08D-7320D0CD785D": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "58F047B1-0B8E-4D9C-904E-F72ACDF4DF25": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "086D7FD1-6627-4D14-873B-579D6AF13ABB": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "95C4F715-C9C4-4A58-9292-E7D0EEE491FB": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "98BC4A51-AFB2-47EC-8FE6-D6B11CC06419": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "382F9BB6-DBC4-4CBC-B190-57C26BBB4E79": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "886E15E2-D29B-4FD5-A90D-82E10EC6C014": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "929A5508-919A-477A-B759-7CF6D30A61FD": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "8234EBBD-DDF8-431C-8F13-2DB9E969588F": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "464213F7-9C18-4BAD-A08E-330049EB1755": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "3500729C-C444-487C-BB68-5F26C9B8A61A": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "79684947-2816-403A-A712-1DAD4A9808B6": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "A4AA57D1-4A1C-429F-894E-2C52D2253250": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "A84A82D7-7205-475E-845A-2150B1BB8ACA": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "A87DC5F6-64BE-4558-B5A8-BC5A76702FF7": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "A100C9B7-4614-4B60-AA8E-452E01DFFB65": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "B0C94B48-06ED-477B-90C1-C146C71D27C9": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "B1A889F5-6D8D-41B0-AE65-B4BF1326563E": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "B4DCF5D9-11BB-4F4F-8ACD-438DF3F2135E": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "B1408E65-5AC2-4963-A457-0E7D5BE69F41": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "C63D34CE-2828-43F6-A356-13DFA012D8D2": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "CC63F425-E02E-49EF-B11C-3CB9C451A3D6": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "CD4181FA-D62B-4823-A4F5-DFDBEB203DAE": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "CE2FB7FA-E266-4D37-83C7-AC715CA81BB4": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "D02418A3-1D7B-48B6-9644-B87A8131C6CD": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "D02621BC-CB65-453C-957E-05231823874C": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "D67A4CBA-3605-42B5-B93F-0BC8A3ABE6E6": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "DECCC0E0-E34D-4EB7-A074-04E1CF74B464": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "EB6E4E46-E9DF-4CCE-9D1B-F923A81B05C0": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "EB8AA270-EC56-4A3E-B662-A617DF0A87A9": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "ED55B15A-5074-41B4-A946-B998A377929F": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "F1D7D555-605B-48FC-8BC1-E4986EAE995C": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "F8E9579B-5A85-4396-A6E7-26A6868F048B": "CMD System Admin\/CMD Architecure\/CMD Server list.docx",
            "F63D03D8-D2D0-40A7-A4EA-0C2095C482AE": "CMD System Admin\/CMD Architecure\/CMD Server list.docx"
        }
    },
    "hash": "",
    "contentSize": 20656,
    "tags": [],
    "metatags": [
        "files_local"
    ],
    "subtags": [],
    "more": [],
    "excerpts": [],
    "score": ""
}

ghost commented 5 years ago

Not sure if it should be fixed with:

fulltextsearch 1.2.4
fulltextsearch - Elasticsearch Platform 1.2.3
fulltextsearch - Files 1.2.4

but running

php -f occ fulltextsearch:stop
php -f occ fulltextsearch:reset
php -f occ fulltextsearch:index

still gives the same error

┌─ Errors ────
│ Error:     14/14
│ Index: files:10693
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]
│
│
└──

and documents in shared folders are not full-text indexed :-(

ArtificialOwl commented 5 years ago

Can you please run:

php -f occ fulltextsearch:stop
php -f occ fulltextsearch:reset
php -f occ fulltextsearch:test
php -f occ fulltextsearch:index

instead, and create a new issue if you still have a problem, as this is not related to the current report

ghost commented 5 years ago

Hi @daita,

with the additional :test I get the same result as before.

If You want, I can create a new issue, but IMHO it's still the same problem I explained in this issue yesterday...

nicolacontu commented 5 years ago

I'm sorry @daita but even with :live and then editing one file, I'm not able to look for the content of that file. I don't think that is the issue.

ArtificialOwl commented 5 years ago

did you upgrade to today's release, and restart the :live command after the upgrade ?

ghost commented 5 years ago

Hi @daita, for me: yes, I did so:

php -f occ fulltextsearch:stop
php -f occ fulltextsearch:reset
php -f occ fulltextsearch:test
php -f occ fulltextsearch:index

Running php -f occ fulltextsearch:live afterwards does not full-text index new documents in folders shared by a user either.

nicolacontu commented 5 years ago

Yes, even with the newest version it is not working. I ran the same commands as @team-a2, plus a files:scan --all before test, index and live.

Not sure what's wrong

ArtificialOwl commented 5 years ago

do you have the error 'field [content] not present as part of path [attachment.content]' on all your files ? If not:

ghost commented 5 years ago

Hi @daita,

(Curiously, if I drop a PDF into a folder shared FROM another user, the file is not full-text indexed even for the owner of the shared folder, with the same error.)

ArtificialOwl commented 5 years ago

Is your shared folder on a specific filesystem, or is it on local storage?

ghost commented 5 years ago

everything local

ArtificialOwl commented 5 years ago

also, if you drop a text file instead, is it working fine ?

ArtificialOwl commented 5 years ago

Would it be possible to send one of the failing PDFs by mail to maxence@nextcloud.com?

nicolacontu commented 5 years ago

[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.23/bin/php ./occ fulltextsearch:live

Memory: 37 MB
┌─ Indexing ────
│ Action: waiting
│ Provider: Files Account: F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB
│ Document: 612431
│ Info: application/vnd.openxmlformats-officedocument.wordprocessingml.document
│ Title: CMD System Admin/Info About This Folder.docx
│ Content size: 15868
└──
┌─ Results ────
│ Result: 2/2
│ Index: files:612431
│ Status: ok
│ Message: {"_index":"nextcloud","_type":"standard","_id":"files:612431","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":123,"_primary_term":1}
│
└──
┌─ Errors ────
│ Error: 62/62
│ Index: files:612431
│ Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
│ Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]
│

I edited a docx with OnlyOffice. OnlyOffice points to an external server; is that a problem?

I also tried with an unshared text document in the root. Same exception.

ArtificialOwl commented 5 years ago

@team-a2 : OK, I was able to reproduce the issue: dropping a file into a shared folder does not index its content, whether PDF or text. (No need to send any file.)

ghost commented 5 years ago

okay. TXT doesn't work for me either.

ArtificialOwl commented 5 years ago

@team-a2 : can you try this patch ?

ArtificialOwl commented 5 years ago

@nicolacontu - your issue is different:

maxence@nextcloud.com

nicolacontu commented 5 years ago

Yes, all office files. Sent by email.

[root@STAGING-CMD2 bin]# ./elasticsearch-plugin list
ingest-attachment
WARNING: plugin [ingest-attachment] was built for Elasticsearch version 6.3.2 but version 6.4.2 is required

ArtificialOwl commented 5 years ago

Oh, please upgrade your ingest-attachment plugin. Running:

./elasticsearch-plugin remove ingest-attachment
./elasticsearch-plugin install ingest-attachment

would do the trick
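
A plugin built for a different Elasticsearch version is easy to miss, since the warning only shows up when listing plugins. The sketch below, in Python, parses the warning line quoted in this thread and reports the two versions. It is an illustration only: the function name plugin_mismatch is made up, and the regular expression assumes the warning keeps exactly the wording shown above.

```python
import re

# Matches the warning text quoted above; other Elasticsearch versions may word it differently.
WARNING_RE = re.compile(
    r"plugin \[(?P<plugin>[^\]]+)\] was built for Elasticsearch version "
    r"(?P<built>[\d.]+) but version (?P<required>[\d.]+) is required"
)

def plugin_mismatch(line):
    """Return (plugin, built_for, required) or None if the line is not a mismatch warning."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    return match.group("plugin"), match.group("built"), match.group("required")

warning = (
    "WARNING: plugin [ingest-attachment] was built for Elasticsearch "
    "version 6.3.2 but version 6.4.2 is required"
)
print(plugin_mismatch(warning))
```

Running such a check after every Elasticsearch upgrade would flag the situation before indexing starts to fail.
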

nicolacontu commented 5 years ago

OK, so now I'm able to get the content of root files. But I discovered one of the problems:

[root@cmd-dev1 nextcloud]# sudo -u apache /usr/local/php7.1.23/bin/php ./occ full:doc:plat files 612431 --content
{
    "document": {
        "id": "612431",
        "providerId": "files",
        "access": {
            "ownerId": "F69ACDAE-DDEC-41A3-AFEA-C813B29A0DBB",
            "viewerId": "",
            "users": [],
            "groups": [
                "Software Group"
            ],
            "circles": [],
            "links": []
        },
        "modifiedTime": 0,
        "title": "CMD System Admin\/Info About This Folder.docx",
        "link": "",
        "index": null,
        "source": "files_local",
        "info": [],
        "hash": "53e3af217a46033cb1c0b91a3c39a93d",
        "contentSize": 947,
        "tags": [],
        "metatags": [
            "files_local"
        ],
        "subtags": [],
        "more": [],
        "excerpts": [],
        "score": ""
    },
    "content": "CMD SysAdmin folder\nThis file will help you understanding where you can find everything on this folder and how it is organized.\nAdmin support tasks \nInto this folder you can find info about admin supp"

The content seems truncated. Is there any limit on the content size? Is that limit imposed by Elasticsearch?

nicolacontu commented 5 years ago

The other question is: is there a way to get the content indexed for all files at once? For some of them, I need to edit them before I am able to search their content.

ArtificialOwl commented 5 years ago

The command only displays the first n characters of the content. You can see the size of the content stored in Elasticsearch in the "contentSize": 947 field.

I will release a new version and see if I can find a way to reindex missing content.
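
To tell a shortened preview apart from content that is genuinely missing, the printed "content" can be compared with "contentSize" from the same output. This is a small sketch in Python under a loud assumption: that contentSize counts bytes of the stored text, which this thread does not confirm. The function name preview_is_truncated is hypothetical, and the preview string below is abbreviated from the output quoted earlier.

```python
def preview_is_truncated(payload):
    """Compare the printed content preview with the stored contentSize."""
    stored_size = payload["document"]["contentSize"]
    # Assumption: contentSize is a byte count of the stored text.
    shown_size = len(payload.get("content", "").encode("utf-8"))
    return shown_size < stored_size

# Values from the output quoted in this thread; the preview text is abbreviated here.
payload = {
    "document": {"id": "612431", "contentSize": 947},
    "content": "CMD SysAdmin folder\nThis file will help you understanding where you can find everything",
}
print(preview_is_truncated(payload))
```

A contentSize of 0, as in the first output posted in this issue, would instead indicate that no content was stored at all.
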

nicolacontu commented 5 years ago

Thanks a lot for your help

ghost commented 5 years ago

Hi @daita,

back from the weekend ;-)

Thanks for fulltextsearch-files 1.2.5 - everything seems to work now. No more errors for me!

Great work!