Fixed relevance calculation for non-existent terms

loupe-php / loupe

A full text search engine with tokenization, stemming, typo tolerance, filters and geo support based on only PHP and SQLite.

MIT License


Fixed relevance calculation for non-existent terms #74

Closed Toflar closed 6 months ago

Toflar commented 6 months ago

It's important that we pad the term matches to the total number of tokens searched, so that terms that do not exist anywhere in our index are handled with a TF-IDF of 1. Otherwise, if you search for "foobar" and not a single document in your index contains "foobar", the term would not be considered in the cosine similarity at all, giving you completely wrong results.
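
To illustrate why the padding matters, here is a minimal sketch in Python (not Loupe's actual PHP code). The helper names `build_vectors` and `cosine_similarity`, the `pad_missing` flag, and all weights are hypothetical and chosen only for this example. It assumes the query vector holds one weight per searched token and the document vector holds the document's TF-IDF for that token, or 0 if the document does not contain it.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_vectors(query_terms, index_weights, doc_weights, pad_missing):
    """Build query and document vectors (hypothetical helper).

    index_weights: weight of each term that exists somewhere in the index.
    doc_weights:   TF-IDF of each term inside one particular document.
    pad_missing:   if True, terms unknown to the whole index still get a
                   dimension, with weight 1 in the query and 0 in the document.
    """
    query_vec, doc_vec = [], []
    for term in query_terms:
        if term in index_weights:
            query_vec.append(index_weights[term])
            doc_vec.append(doc_weights.get(term, 0.0))
        elif pad_missing:
            query_vec.append(1.0)  # assumed TF-IDF of 1 for the unknown term
            doc_vec.append(0.0)    # no document can contain it
    return query_vec, doc_vec


# Made-up numbers: "baz" is indexed, "foobar" appears in no document at all.
index_weights = {"baz": 1.5}
doc_weights = {"baz": 0.75}
query = ["foobar", "baz"]

unpadded = cosine_similarity(
    *build_vectors(query, index_weights, doc_weights, pad_missing=False)
)
padded = cosine_similarity(
    *build_vectors(query, index_weights, doc_weights, pad_missing=True)
)

print(f"without padding: {unpadded:.3f}")
print(f"with padding:    {padded:.3f}")
```

Under these assumptions, the unpadded vectors have a single dimension, so the document looks like a perfect match even though it only contains one of the two searched terms. With padding, the missing "foobar" dimension lowers the similarity, which is the behavior the fix is after.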