vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Stemmed terms option in vespa semantic rules language #20388

Closed: romanhwix closed this issue 2 years ago

romanhwix commented 2 years ago

Is your feature request related to a problem? Please describe.
It is tedious and error-prone to explicitly list synonyms for every possible conjugation, declension, and inflection of a word or compound word.

Describe the solution you'd like
I would like an option in the Vespa semantic rules language that automatically infers the stemmed form of compound words. For instance, "criminal boys" becomes "crimin boy".
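To make the requested behaviour concrete, here is a minimal sketch of what stemming a phrase word by word could look like. The suffix rules and the names `toy_stem` and `stem_phrase` are hypothetical and chosen only to reproduce the example in this request; this is not the stemmer Vespa uses.

```python
def toy_stem(word: str) -> str:
    """Strip a couple of common English suffixes (illustration only)."""
    word = word.lower()
    # Drop a plural "s", but leave short words and words ending in "ss" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    # Drop an adjectival "al" from longer words ("criminal" -> "crimin").
    if word.endswith("al") and len(word) > 5:
        word = word[:-2]
    return word


def stem_phrase(phrase: str) -> str:
    """Stem every whitespace-separated word in a phrase."""
    return " ".join(toy_stem(w) for w in phrase.split())


print(stem_phrase("criminal boys"))
```

With stemming applied to both the rule condition and the query, a single rule written for "criminal boys" could also match inflected variants such as "criminals boy" without listing each one as a synonym.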

I want to use this together with issue https://github.com/vespa-engine/vespa/issues/20386. It would allow synonym enrichment to be implemented in the shortest and cleanest way.

bratseth commented 2 years ago

Support proper stemmed matching in rule bases: https://github.com/vespa-engine/vespa/pull/20741

bratseth commented 2 years ago

Documentation: https://github.com/vespa-engine/documentation/pull/1748

bratseth commented 2 years ago

This will be available in the 7.526 release. Sorry for taking so long!

bratseth commented 2 years ago

This is on by default. To match by stem in a language other than English, set @language(code) at the top of the rule base file, where code is the ISO 639-1 two-letter language code.
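As a rough sketch of the directive described above, a rule base file for French might start like the fragment below. Only the `@language` directive comes from this thread; the rule itself, including the terms and the `->` replacement, is a made-up illustration, so check the linked documentation for the exact syntax.

```
@language(fr)

# Hypothetical rule: with stemmed matching, inflected forms of "voiture"
# in the query are expected to match this condition as well.
voiture -> automobile;
```

Leaving the directive out keeps the default, which is matching by English stems.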