jeancroy / FuzzySearch

:mag: Fast autocomplete suggestion engine using approximate string matching
MIT License

how to honor the matches in short entries greater than long entries? #40

Open halukkaramete opened 2 years ago

halukkaramete commented 2 years ago

Is there a built-in option that ranks entries by the density of matching characters (I mean the matches highlighted in red), rather than letting lengthier entries win? I think that would improve relevancy automatically.

Example screenshot:

Here, I searched for Hajj (in a 1.8 MB file of 13,000 items), and the entries that should be at the very top (shown below) ended up around 100th place in the suggested items.

They are pretty short, bingo-like matches, yet plenty of long ones preceded them.

How can I easily raise them to the top, or at least near the top?

Here are the winners...

Screen Shot 2022-04-11 at 11 45 09 AM

Here are the poor little ones being crushed by the winners:

Screen Shot 2022-04-11 at 11 45 26 AM

Clearly, the red density on those short ones is noticeably higher.

Especially this guy:

Screen Shot 2022-04-11 at 11 59 08 AM

What do you think, Jean?

I have a feeling this has to do with your compare method.
That may be where the solution lies, but if so, how do I write that comparison?

I use

bonus_match_start: 0.6, 
highlight_bridge_gap: 0 

I would have to see it in practice, but what I'm asking for could make a tremendous difference in quality, especially when main topics and sub-topics are searched, as in my case.

jeancroy commented 2 years ago

you can use the sorter option and combine the length and the score https://github.com/jeancroy/FuzzySearch/blob/master/src/init.js#L52
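One possible reading of "combine the length and the score" is to fold both into a single ranking value. The sketch below is only an illustration: it assumes each result passed to the sorter looks like `{ item: <string>, score: <number> }` (as the sorter later in this thread does), and the logarithmic length penalty and the function names are hypothetical choices, not something the library prescribes.

```javascript
// Assumption: each result has a string `item` and a numeric `score`.
// Hypothetical penalty: divide the score by log2 of the item length,
// so long entries lose some weight without being pushed out entirely.
function lengthAdjustedScore(result) {
  return result.score / Math.log2(2 + result.item.length);
}

// Comparator suitable for Array.prototype.sort: highest adjusted score first.
function sortByLengthAdjustedScore(a, b) {
  return lengthAdjustedScore(b) - lengthAdjustedScore(a);
}

// Usage with mock results (not real FuzzySearch output):
const mockResults = [
  { item: "A long entry that mentions Hajj somewhere in the middle", score: 6 },
  { item: "Hajj", score: 5 },
];
mockResults.sort(sortByLengthAdjustedScore);
```

With this penalty a short entry can overtake a longer one that scored slightly higher. How strong the penalty should be depends on the data, so the formula would need tuning.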

jeancroy commented 2 years ago

I'll be honest, it looks like you want to recommend short paragraphs given a theme. This library was more about finding a needle in a haystack.

Right now, machine learning as a service is mature enough that it may interest you. See, for example, https://docs.microsoft.com/en-us/azure/cognitive-services/language-service/question-answering/overview

halukkaramete commented 2 years ago

For those who do not know how to sort by size using the "sorter" option:

Add this to your options when setting up your FuzzySearch object:

sorter: myFunction,

and then provide this function somewhere on your page:

function myFunction(a, b) {

    // Sort by score, highest first.
    // When two items have equal scores, the shorter one ranks above the longer one.
    // Without a custom sorter, ties are ordered alphabetically (the default).

    var d = b.score - a.score;
    if (d !== 0) return d;

    // Scores are tied: compare item lengths, shortest first.
    var ak = a.item.length, bk = b.item.length;
    return ak > bk ? 1 : (ak < bk ? -1 : 0);

}
halukkaramete commented 2 years ago

I'll be honest, it looks like you want to recommend short paragraphs given a theme. This library was more about finding a needle in a haystack.

That's an entirely different take. I'm OK with using your library. I will rework the JSON so that searches are done only on signal words (excluding English stop words), which are stemmed (using Porter2), along with synonyms. What I'm working on is one of a kind when it comes to this subject, and I'd like to use your library. Once I launch this, it will be used by millions of people.

jeancroy commented 2 years ago

if (d !== 0) return d;

Instead of that exact comparison, I'd treat the scores as tied when something like abs(d) < 0.1 or d*d < 0.01 holds.

The thing is, the score is a float, but you may find two results are similar enough that overall size should start to matter. I have not tested 0.1; you may find something better for your taste.

Keep up the good work, then. I see the subject matter is important.
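A minimal sketch of that tolerance idea, assuming results shaped like `{ item: <string>, score: <number> }`. The constant and function names are hypothetical, and 0.1 is the untested starting value mentioned above. The second function is a variant not suggested in the thread: it snaps scores to tolerance-wide buckets, which keeps the comparator consistent (a plain threshold test is not transitive, so sort order can depend on the order of comparisons).

```javascript
// Assumption: each result has a string `item` and a numeric `score`.
const SCORE_TOLERANCE = 0.1; // untested starting value; tune to taste

// Direct reading of the suggestion: near-equal scores fall through to length.
function sortByScoreThenLength(a, b) {
  const d = b.score - a.score;
  if (Math.abs(d) >= SCORE_TOLERANCE) return d; // clearly different scores
  // Shorter item first; if lengths also tie, fall back to the raw difference.
  return (a.item.length - b.item.length) || d;
}

// Variant: compare score buckets, so "tied" means "same bucket".
function sortByScoreBucketThenLength(a, b) {
  const bucket = (r) => Math.floor(r.score / SCORE_TOLERANCE);
  return (bucket(b) - bucket(a)) ||
         (a.item.length - b.item.length) ||
         (b.score - a.score);
}
```

Either function could be passed as the `sorter` option in place of `myFunction`. The bucketed variant has its own trade-off: two scores that sit just on either side of a bucket boundary are never treated as tied.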