mongoid / mongoid_fulltext

An n-gram-based full-text search implementation for the Mongoid ODM.
MIT License

Full search term match scoring #46

roykolak closed 3 years ago

roykolak commented 3 years ago

Here's a situation that we are experiencing after integration...

We have indexed roughly 3 million companies by name and domain.

Searching for "Tesla" produces results for lots of companies with "Tes" as the first word, but does not include the company "Tesla" even though that full word ngram can be found in our search index. We assumed that because the full search term could be matched to an ngram, that result would be scored the highest.

Does that sound right/wrong? Is there a configuration option that we missed?

roykolak commented 3 years ago

Relevant quote from the scoring section...

If an entire word in your query matches an entire word that's indexed and you have the index_full_words option turned on (it's turned on by default), you can expect a score of at least 2 for the match.

dblock commented 3 years ago

I wonder whether turning off apply_prefix_scoring_to_all_words would solve this?

roykolak commented 3 years ago

After playing with config options, we discovered that increasing the max_results (or I should say the "candidate set") produced the correct items we expected to see. However, this resulted in a slower query, as expected. 😭
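The effect described above can be reproduced with a toy model. The sketch below is not mongoid_fulltext's implementation: the helper names (`ngrams_for`, `build_index`, `search`), the flat in-memory index, and the count-based scoring are all assumptions made for illustration. It only shows how capping the candidate set before scoring can leave the exact match out entirely when many other entries share a prefix ngram.

```ruby
# Toy model only: NOT mongoid_fulltext's actual algorithm or data layout.
NGRAM_WIDTH = 3

# Split a lowercased word into overlapping n-grams, plus the full word itself.
def ngrams_for(word)
  w = word.downcase
  grams = (0..w.length - NGRAM_WIDTH).map { |i| w[i, NGRAM_WIDTH] }
  (grams << w).uniq
end

# Build a flat list of [ngram, name] entries, kept in insertion order.
def build_index(names)
  names.flat_map { |name| ngrams_for(name).map { |gram| [gram, name] } }
end

# Fetch at most `cap` entries matching any query n-gram (the "candidate set"),
# then rank each name by how many of its entries made it into that set.
def search(index, query, cap:)
  wanted = ngrams_for(query)
  candidates = index.select { |gram, _| wanted.include?(gram) }.first(cap)
  scores = Hash.new(0)
  candidates.each { |_, name| scores[name] += 1 }
  scores.sort_by { |name, score| [-score, name] }.map(&:first)
end

# Fifty hypothetical names sharing the "tes" prefix, then the exact match.
names = (1..50).map { |i| "Tes#{i}" } + ["Tesla"]
index = build_index(names)

puts search(index, "Tesla", cap: 10).include?("Tesla")  # false
puts search(index, "Tesla", cap: 100).first             # Tesla
```

With the small cap, the candidate set fills up with "tes" entries before any of Tesla's entries are reached, so it never gets scored. With the larger cap every matching entry is considered and the exact match ranks first, at the cost of scoring more candidates.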

Our setup is one search index for two fields, name and domain, with around 75,000,000 ngrams. We are experimenting with creating two separate indexes in the hope that (1) we will not need such a large candidate set and (2) queries will be faster because each index holds roughly half as many ngrams.
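A minimal sketch of what the two-index setup might look like. The `Company` model, the index names, and the numeric values are hypothetical. The option names (`index_name`, `max_candidate_set_size`, and the `index` and `max_results` search options) are given as the mongoid_fulltext README describes them, so check them against the version you have installed.

```ruby
require 'mongoid'
require 'mongoid_fulltext'

# Hypothetical model; field and index names are assumptions for illustration.
class Company
  include Mongoid::Document
  include Mongoid::FullTextSearch

  field :name
  field :domain

  # One index per field instead of a single combined index.
  fulltext_search_in :name,   index_name: 'company_name_index',   max_candidate_set_size: 1000
  fulltext_search_in :domain, index_name: 'company_domain_index', max_candidate_set_size: 1000
end

# Needs a live MongoDB connection. With more than one index on a model,
# the search call names the index it should use.
# Company.fulltext_search('Tesla', index: 'company_name_index', max_results: 10)
```

Whether this removes the need for a large candidate set depends on how many entries share the query's ngrams within each index, so it is worth measuring on real data.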

dblock commented 3 years ago

I really think you should go for ElasticSearch :)