well-typed / full-text-search

An in-memory full text search engine library. It lets you run full-text queries on a collection of your documents.

Interaction between expandTransformedQueryTerm and stemming #8

Open adamgundry opened 2 years ago

adamgundry commented 2 years ago

At the moment, client code specifies how to normalise/stem a term in the query via transformQueryTerm. When running a query, expandTransformedQueryTerm produces the list of distinct transformations of a term (across all fields), and these are then all looked up in the index, irrespective of which field's transformation produced them.

A consequence of this is that if any field is stemmed, the query will return documents that match stemmed terms from the query, even if the documents mention the term only in non-stemmed fields. For example, suppose our documents are users, who have a name and a biography, and we stem the biography but not the name. Now a query like "Peters" will match a user whose name is "Peter", which might be undesirable.
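To make the example concrete, here is a minimal toy model of the behaviour described above. It is only a sketch and does not use the library's actual API: the types Field and User, the functions toyStem, transformTerm, indexTerms, expandTerm and matches, and the "strip a trailing s" stemmer are all hypothetical stand-ins chosen for illustration.

```haskell
import Data.Char (toLower)
import Data.List (nub)

-- Hypothetical fields for the "users" example: only Biography is stemmed.
data Field = Name | Biography
  deriving (Eq, Show)

-- Hypothetical document type.
data User = User { userName :: String, userBio :: String }
  deriving (Eq, Show)

-- Toy stemmer (an assumption, not a real stemming algorithm):
-- drops a single trailing 's' from words longer than one character.
toyStem :: String -> String
toyStem w = case reverse w of
  's' : rest@(_ : _) -> reverse rest
  _                  -> w

-- Per-field normalisation, playing the role that transformQueryTerm
-- plays in the issue: names are case-folded, biographies are also stemmed.
transformTerm :: Field -> String -> String
transformTerm Name      = map toLower
transformTerm Biography = toyStem . map toLower

-- Terms stored for a document. The field each term came from is
-- forgotten, mirroring a lookup that ignores the originating field.
indexTerms :: User -> [String]
indexTerms u = nub $
     map (transformTerm Name)      (words (userName u))
  ++ map (transformTerm Biography) (words (userBio u))

-- Distinct transformations of a query term across all fields,
-- in the spirit of expandTransformedQueryTerm.
expandTerm :: String -> [String]
expandTerm t = nub [transformTerm f t | f <- [Name, Biography]]

-- A document matches if any expanded form occurs among its terms.
matches :: User -> String -> Bool
matches u q = any (`elem` indexTerms u) (expandTerm q)

-- Example user: "Peter" appears only in the non-stemmed name field.
peter :: User
peter = User { userName = "Peter", userBio = "Enjoys hiking" }
```

In this model, expandTerm "Peters" yields both "peters" (from the name transformation) and "peter" (from the biography transformation), so matches peter "Peters" evaluates to True even though "Peter" occurs only in the unstemmed name field.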

See also the TODO in query. I don't have a clear picture of how to resolve this, other than by simply not stemming at all in indexes where this issue might be relevant.