nextcloud / fulltextsearch_elasticsearch

🔍 Use Elasticsearch to index the content of your Nextcloud
GNU Affero General Public License v3.0

No search results or next result page left blank #299

Open · ShinjiLE opened this issue 12 months ago

ShinjiLE commented 12 months ago

I have experienced this problem with big PDFs. If the search finds one of these documents, the result page shows no results from the first occurrence of such an item onward. The log shows:

{"reqId":"qVkvl0XZoLw5yADO2rEE","level":2,"time":"2023-09-08T12:04:09+00:00","remoteAddr":"xxx","user":"xxx","app":"no app in context","method":"GET","url":"/apps/fulltextsearch/v1/search?request=%7B%22providers%22%3A%22all%22%2C%22options%22%3A%7B%22files_local%22%3A%220%22%2C%22files_external%22%3A%220%22%2C%22files_group_folders%22%3A%220%22%2C%22files_extension%22%3A%22%22%7D%2C%22search%22%3A%22in%3Acontent%20welton%22%2C%22page%22%3A1%7D","message":"500 - {\"request\":{\"providers\":[\"all\"],\"author\":\"xxx\",\"search\":\"welton\",\"empty_search\":false,\"page\":1,\"size\":10,\"parts\":[\"comments\",\"ocr\"],\"queries\":[],\"options\":{\"files_local\":\"0\",\"files_external\":\"0\",\"files_group_folders\":\"0\",\"files_extension\":\"\",\"in\":[\"content\"]},\"metatags\":[],\"subtags\":[],\"tags\":[]},\"version\":\"27.0.1\",\"status\":-1,\"exception\":\"Elastic\\\\Elasticsearch\\\\Exception\\\\ClientResponseException\",\"message\":\"400 Bad Request: {\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"illegal_argument_exception\\\",\\\"reason\\\":\\\"The length [5818326] of field [content] in doc[51499]\\/index[nextcloud] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them.\\\"}],\\\"type\\\":\\\"search_phase_execution_exception\\\",\\\"reason\\\":\\\"all shards failed\\\",\\\"phase\\\":\\\"query\\\",\\\"grouped\\\":true,\\\"failed_shards\\\":[{\\\"shard\\\":0,\\\"index\\\":\\\"nextcloud\\\",\\\"node\\\":\\\"mJUnzuXLS0uQ9WUS--TZVw\\\",\\\"reason\\\":{\\\"type\\\":\\\"illegal_argument_exception\\\",\\\"reason\\\":\\\"The length [5818326] of field [content] in doc[51499]\\/index[nextcloud] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them.\\\"}}],\\\"caused_by\\\":{\\\"type\\\":\\\"illegal_argument_exception\\\",\\\"reason\\\":\\\"The length [5818326] of field [content] in doc[51499]\\/index[nextcloud] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them.\\\",\\\"caused_by\\\":{\\\"type\\\":\\\"illegal_argument_exception\\\",\\\"reason\\\":\\\"The length [5818326] of field [content] in doc[51499]\\/index[nextcloud] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them.\\\"}}},\\\"status\\\":400}\"}","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0","version":"27.1.0.4","data":[]}

I solved this problem by adding 'max_analyzed_offset' => '999999', in SearchMappingService.php, inside private function generateSearchHighlighting(ISearchRequest $request): array, directly above the 'fields' => $fields, line.
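For reference, roughly what that change amounts to on the Elasticsearch side: the 400 above goes away when the query-time max_analyzed_offset highlight parameter is set below the index limit. A minimal hand-run query you could use to confirm this against the index and field named in the log (illustration only, not the exact request body the app builds):

# Illustration only: send the same query-time highlight parameter directly to
# Elasticsearch. The index "nextcloud", field "content" and term "welton" are
# taken from the log above; host/port as in the rest of this thread.
curl -XPOST 'http://127.0.0.1:9200/nextcloud/_search?pretty' \
  -H 'Content-Type: application/json' -d '{
    "query": { "match": { "content": "welton" } },
    "highlight": {
      "max_analyzed_offset": 999999,
      "fields": { "content": {} }
    }
  }'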

it25fg commented 12 months ago

Good to know somebody ~solved it~ worked around the problem. I did it the other way round: I increased the limit in the index configuration:

curl -XPUT http://127.0.0.1:9200/${INDEX}/_settings -H 'Content-Type: application/json' -d '{"index":{"highlight.max_analyzed_offset":1000000000}}'

This has the benefit of not being a source modification: Nextcloud's self-diagnostics do not complain about an integrity check failure, and the next update won't overwrite the change.
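If you go this route, you can sanity-check what the index actually uses before and after (same ${INDEX} placeholder and host as in the command above):

# Show the current (or default) value of the highlight limit for the index.
curl -XGET "http://127.0.0.1:9200/${INDEX}/_settings/index.highlight.max_analyzed_offset?include_defaults=true&pretty"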

But in the long run, neither is a real solution. I remember a time (maybe before Elasticsearch, I don't really remember) when there was a setting in the Fulltextsearch app to skip indexing documents above a given size. Maybe that setting is considered outdated, given the sheer size of today's documents. But the proper fix would be for the maximum size of documents being indexed and the size limit applied when searching them to correspond somehow.

jakeh999 commented 11 months ago

I'm getting 0 search results in version 27.0.4, but no errors whatsoever in the browser console or Nextcloud log. I've tried tweaking both settings, by editing SearchMappingService.php as well as highlight.max_analyzed_offset in the Elasticsearch configuration, but to no avail. Reverting the plugin to version 27.0.2 brings the search results back. I wish I could be of more help, but there seems to be something more going on with this regression.

jakeh999 commented 10 months ago

Update: in addition to increasing max_analyzed_offset, resetting and then reindexing ultimately fixed it for me. Perhaps something was corrupted in the index that didn't cause an issue in 27.0.2 but did in 27.0.4.
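For anyone landing here, a sketch of that reset-and-reindex sequence, assuming the standard occ commands of the Fulltextsearch app (run from the Nextcloud root; the "www-data" user and occ location are assumptions, adjust to your setup):

# Wipe the full text search index, then rebuild it from scratch.
sudo -u www-data php occ fulltextsearch:reset
sudo -u www-data php occ fulltextsearch:index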