vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Support retrieving matched elements in an array of strings #26636

Open danitico opened 1 year ago

danitico commented 1 year ago

Is your feature request related to a problem? Please describe.
When we search an array of strings (for example, a list of paragraphs) with the summary set to dynamic, the response contains the whole list of paragraphs, and the matched ones carry <hi> tags that highlight the queried words. If the list of paragraphs is small, receiving everything and then "cleaning" the JSON response is not a problem.

However, if we have a huge list of paragraphs, the previous "solution" won't scale.
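To make the client-side "cleaning" concrete, here is a minimal Python sketch of it. It assumes the response follows the default JSON result layout (hits under root.children, each with a fields object), that the array field is named paragraphs (a hypothetical name), and that matched elements can be recognised by the presence of a <hi> tag. It is only an illustration of the workaround, not something Vespa provides.

```python
from typing import Any, Dict

HI_TAG = "<hi>"  # highlight marker emitted by dynamic summaries


def keep_highlighted(response: Dict[str, Any], field: str = "paragraphs") -> Dict[str, Any]:
    """Drop array elements without a <hi> tag from every hit (modifies the dict in place).

    `field` is a hypothetical field name; adjust it to the schema in use.
    """
    for hit in response.get("root", {}).get("children", []):
        fields = hit.get("fields", {})
        values = fields.get(field)
        if isinstance(values, list):
            # Keep only the elements that the dynamic summary highlighted.
            fields[field] = [v for v in values if isinstance(v, str) and HI_TAG in v]
    return response


if __name__ == "__main__":
    sample = {
        "root": {
            "children": [
                {"fields": {"paragraphs": ["The <hi>sun</hi> is a star.", "Unrelated text."]}}
            ]
        }
    }
    print(keep_highlighted(sample)["root"]["children"][0]["fields"]["paragraphs"])
```

The whole array still travels over the network before it is filtered, which is why this approach stops being practical for large lists.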

Describe the solution you'd like
We would like a new type of summary that returns only the paragraphs that matched the query. It would also be useful to combine this with the dynamic summary feature, so that an array of strings gets the same features as a string field.

Describe alternatives you've considered
The key question for this feature is what Vespa counts as a match. As @jobergum says, Vespa has several ways to match. The current implementation covers arrays of structs and maps, where Vespa looks for exact matches of values using the sameElement operator.

I would recommend taking the match property defined on the field into account when deciding which paragraphs are unimportant and can be dropped from the summary.

Additional context

jobergum commented 1 year ago

Considering standard match:text for the array, is it enough that at least one of the query terms searching the array matches? E.g., query='what is the sun made of' would match many paragraphs as the query contains frequent words.
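A rough Python sketch of the "at least one query term matches" rule shows why frequent words matter. The tokenizer here is a plain lowercase word split, which is an assumption for illustration and not how Vespa's linguistics processing works; the sample paragraphs are made up.

```python
import re
from typing import List


def tokenize(text: str) -> set:
    # Simplified stand-in for real text matching: lowercase word tokens only.
    return set(re.findall(r"\w+", text.lower()))


def matching_elements(query: str, elements: List[str]) -> List[int]:
    """Return indexes of elements that share at least one term with the query."""
    terms = tokenize(query)
    return [i for i, element in enumerate(elements) if terms & tokenize(element)]


if __name__ == "__main__":
    paragraphs = [
        "The sun is mostly hydrogen and helium.",
        "Paris is the capital of France.",
        "Bananas grow in tropical climates.",
    ]
    print(matching_elements("what is the sun made of", paragraphs))
```

Under this rule the second paragraph is kept even though it has nothing to do with the sun, because it shares "is", "the" and "of" with the query.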

danitico commented 1 year ago

@jobergum Yeah! That's it