kitodo / kitodo-presentation

Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
https://kitodo.org
GNU General Public License v3.0

[FUND] Autocompletion breaks after first token #825

Open michaelkubina opened 2 years ago

michaelkubina commented 2 years ago

Description

The autocomplete feature (suggest component) stops updating after the first word/token has been typed: the suggestions stay the same no matter what is entered afterwards. This severely limits its benefit, because we only ever see the top 10 entries matching the first word instead of suggestions for the whole search string.

Reproduction

Steps to reproduce the behaviour:

  1. Type a search string into the search field (metadata search/simple search).
  2. As long as the string is still the first word, the autocompletion updates...
  3. ...but as soon as we enter a whitespace or delimiter, the autocompletion stops updating.
  4. We now see the same top 10 suggestions all the time, even though we keep typing our search string.

[Screenshot: first_word second_word_wrong]

Expected Behavior

The autocomplete feature should not halt after the first word, but instead update the suggestions according to the string entered.

[Screenshot: second_word more_words_wrong]

Solution

The issue is "easy" to fix, as we just need to change two things, the schema.xml and how we query the /suggest query handler (SearchSuggest.php):

schema.xml

We need to change the analyzer chain for the autocomplete field.

Currently we have the following, which tries to match multiple tokens (at query time) against a single-token-string field (KeywordTokenizerFactory holds the whole string as a single token):

<analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
</analyzer>
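To make the mismatch concrete, here is a small Python sketch that only imitates the two tokenizers. It is not Solr code; the helper names and the sample title are made up for illustration, and the prefix check is a simplified assumption about how a suggestion lookup behaves.

```python
def index_tokens(value: str) -> list[str]:
    """Imitates KeywordTokenizer + LowerCaseFilter: the whole value is one token."""
    return [value.lower()]


def query_tokens(value: str) -> list[str]:
    """Imitates WhitespaceTokenizer + LowerCaseFilter: one token per word."""
    return value.lower().split()


# Hypothetical indexed title, used only for illustration.
indexed = index_tokens("Faust, eine Tragödie")

for typed in ("Fau", "Faust, ei"):
    tokens = query_tokens(typed)
    # Simplified assumption: a query token can only lead to the entry
    # if it is a prefix of the single indexed token.
    matches = [any(term.startswith(tok) for term in indexed) for tok in tokens]
    print(typed, tokens, matches)
```

For the second input, the token `ei` is not a prefix of the single indexed token, which is in line with the suggestions no longer changing once a second word is typed.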

Instead, we need to match a single-token string against another single-token string. Ideally we also remove all punctuation, because otherwise our users would have to type it exactly as indexed, which will likely cause a lot of irritation. So the best option would be the following analyzer, which normalizes both query and index to a single lowercase token without punctuation:

<analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- remove all punctuations -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
</analyzer>
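The effect of this chain can be approximated outside of Solr with a few lines of Python. This is only a sketch under assumptions: it models the second alternative of the pattern (runs of characters that are not letters, marks, numbers or surrogates become one space), followed by lowercasing and trimming. The first alternative, which targets `word:` sequences, is deliberately left out, and the function names and sample strings are hypothetical.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """True for letters, marks, numbers and surrogates (Unicode L, M, N, Cs)."""
    category = unicodedata.category(ch)
    return category[0] in "LMN" or category == "Cs"


def normalize(value: str) -> str:
    """Approximates PatternReplaceFilter (second alternative only),
    LowerCaseFilter and TrimFilter applied to a single keyword token."""
    out = []
    previous_was_space = False
    for ch in value:
        if is_word_char(ch):
            out.append(ch)
            previous_was_space = False
        elif not previous_was_space:
            # A whole run of non-word characters collapses into one space.
            out.append(" ")
            previous_was_space = True
    return "".join(out).lower().strip()


# Hypothetical index value and user input, normalized the same way.
indexed = normalize("Faust, eine Tragödie!")
typed = normalize("Faust,  EINE tr")

print(repr(indexed))
print(repr(typed))
print(indexed.startswith(typed))
```

Because both sides end up as one normalized string, a plain prefix comparison keeps working after the first word, regardless of the punctuation or casing the user types.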

SearchSuggest.php

Then we need to realize that the /suggest query handler (which is actually a spellchecker component, using this template: https://cwiki.apache.org/confluence/display/solr/Suggester/) must be queried through the spellcheck.q parameter and not the q parameter.

This happens in this line in the SearchSuggest.php: https://github.com/kitodo/kitodo-presentation/blob/e2d2d862a67b0b2ab73c7f93939b23375d43c0d3/Classes/Eid/SearchSuggest.php#L56

As I am not that familiar with Solarium (yet!), I don't know if it's enough to change the parameter from "q" to "spellcheck.q" or if we need to create the query in another way (because we must make use of the spellcheck component and activate it as well!). I can't test it in our dev system at this point either, but from a Solr point of view this solution works, as demonstrated in the screenshots below:

SOLR production-server with old schema.xml queried via q

Does not work! [Screenshot: solr_prod_q]

SOLR production-server with old schema.xml queried via spellcheck.q and active spellchecker component

Does not work! [Screenshot: solr_prod_spellcheck_q]

SOLR development-server with new (proposed) schema.xml queried via q

Does not work! [Screenshot: solr_dev_q]

SOLR development-server with new (proposed) schema.xml queried via spellcheck.q and active spellchecker component

WORKS! [Screenshot: solr_dev_spellcheck_q]
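A request of the kind described above (spellcheck.q instead of q, with the spellcheck component switched on) could be assembled as in the following Python sketch. The host, port and core name are placeholders and not taken from any real setup; only the parameter names `spellcheck`, `spellcheck.q` and `wt` are standard Solr request parameters. How this translates to Solarium in SearchSuggest.php remains open.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to the actual Solr installation.
SOLR_SUGGEST_URL = "http://localhost:8983/solr/example_core/suggest"


def build_suggest_url(user_input: str) -> str:
    """Builds a /suggest request that passes the whole input via spellcheck.q."""
    params = {
        "spellcheck": "true",        # activate the spellcheck component
        "spellcheck.q": user_input,  # the complete search string, not split up
        "wt": "json",                # ask for a JSON response
    }
    return SOLR_SUGGEST_URL + "?" + urlencode(params)


print(build_suggest_url("faust eine tr"))
```

The point of the sketch is only that the complete search string travels in `spellcheck.q`, so that the analyzer of the autocomplete field sees it as one value.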

michaelkubina commented 2 years ago

This issue is linked to the SOLR-Improvements issue, where the suggester-component/autocomplete feature is discussed as well:

#454

sebastian-meyer commented 1 month ago

PR #1289 reworked the suggester based on the Solr suggester component. @michaelkubina, could you please have a look if this fixes your issue as well?