uhh-lt / dats

Discourse Analysis Tool Suite
Apache License 2.0

highlight search query not working in huge projects (> 65536) #444

Closed bigabig closed 1 month ago

bigabig commented 1 month ago

If I search in a huge project, I encounter the following problem:

elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to create query: The number of terms [111938] used in the Terms Query request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the [index.max_terms_count] index level setting.')

Further, the highlight setting is always set to true, even if no search query is provided. That should be changed.

To fix this, we have to prune the provided sdoc_ids to the index.max_terms_count limit, and we should only add this terms / sdoc_id filter if a filter is actually in use. Right now, the terms query is used even if I do not specify any filter. This makes everything unnecessarily slower: with no filter we are searching through all documents in the project anyway, so there is no need to add this extra condition (filtering by ALL sdoc ids in the project!).
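
A rough sketch of what I have in mind, not the actual DATS code. The function name `build_search_body` and the field names `content` and `sdoc_id` are assumptions for illustration; only the Elasticsearch query DSL keys (`bool`, `match`, `match_all`, `terms`, `highlight`) and the 65536 limit from the error message are taken as given.

```python
from typing import Any, Dict, List, Optional

# Limit reported in the error message above (index.max_terms_count).
MAX_TERMS_COUNT = 65536


def build_search_body(
    query: Optional[str],
    sdoc_ids: Optional[List[int]] = None,
    max_terms: int = MAX_TERMS_COUNT,
) -> Dict[str, Any]:
    """Build a search request body (hypothetical helper).

    - sdoc_ids is None  -> no filter is active, so no terms query is added.
    - sdoc_ids is given -> the list is pruned to at most max_terms entries.
    - highlight is only requested when a search query is provided.
    """
    bool_query: Dict[str, Any] = {}

    if query:
        # "content" is an assumed field name.
        bool_query["must"] = [{"match": {"content": query}}]
    else:
        bool_query["must"] = [{"match_all": {}}]

    if sdoc_ids is not None:
        # Prune so the terms query never exceeds the allowed maximum.
        # "sdoc_id" is an assumed field name.
        pruned = sdoc_ids[:max_terms]
        bool_query["filter"] = [{"terms": {"sdoc_id": pruned}}]

    body: Dict[str, Any] = {"query": {"bool": bool_query}}

    if query:
        body["highlight"] = {"fields": {"content": {}}}

    return body
```

Note that pruning silently drops every sdoc id beyond the limit, so results from those documents would be missing. The alternative named in the error message is raising the index.max_terms_count index setting.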