higher score to shorter strings

lewang / flx

Fuzzy matching for Emacs ... a la Sublime Text.

GNU General Public License v3.0


higher score to shorter strings #90

Closed fommil closed 6 years ago

fommil commented 8 years ago

I'd be interested in tuning the algorithm to give a slightly higher score to shorter strings that match (or conversely, penalising longer strings).

Where would I start?

A typical use case: when I search for something such as ss, intending to match a file called SearchService.scala in my codebase, the hits for SearchServiceSpec.scala and SearchServiceFixture.scala come up first.

ibrahima commented 8 years ago

I also notice this a lot in my Rails projects. This would be a pretty good change to make, I think. I guess with any type of fuzzy weighting system it's sometimes hard to predict how changes to the weights will behave in practice, though...

jtbm37 commented 7 years ago

This would be very much appreciated. Here is a very obvious example:

(flx-score "StopMonitoringSubscriptionProcessorTests" "ms") gives (159 4 14)
(flx-score "mstest.log" "ms") gives (140 0 1)

This would place StopMonitoringSubscriptionProcessorTests before mstest.log. While typing ms, I am looking for the candidate mstest.log.
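To make the trade-off concrete, here is a minimal sketch of a length penalty applied on top of the two scores reported above. It is written in Python for illustration and is not part of flx; `length_adjusted`, `rerank`, and `penalty_per_char` are hypothetical names, and the linear per-character penalty is an assumption, not how flx scores candidates.

```python
# Scores reported in this thread for the query "ms" (first element of flx-score's result).
REPORTED_SCORES = {
    "StopMonitoringSubscriptionProcessorTests": 159,
    "mstest.log": 140,
}


def length_adjusted(score, candidate, penalty_per_char=1.0):
    """Hypothetical adjustment: subtract a fixed penalty for every character."""
    return score - penalty_per_char * len(candidate)


def rerank(scores, penalty_per_char=1.0):
    """Return the candidates sorted by length-adjusted score, best first."""
    return sorted(
        scores,
        key=lambda c: length_adjusted(scores[c], c, penalty_per_char),
        reverse=True,
    )


if __name__ == "__main__":
    for penalty in (0.0, 0.5, 1.0):
        print(penalty, rerank(REPORTED_SCORES, penalty))
```

The long name has 40 characters and mstest.log has 10, so a penalty of 1.0 gives adjusted scores of 119 and 130 and puts mstest.log first, while a penalty of 0.5 gives 139 and 135 and leaves the order unchanged. The outcome depends heavily on the weight chosen.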

oscarfv commented 7 years ago

@jtbm37: I think that your example is unrelated to the original report. What you want is prefix (or substring) matching, which is precisely what flx is not about. There are packages that offer a matching system similar to what you want.

In your example, I want to see StopMonitoringSubscriptionProcessorTests placed before mstest.log. That's what I expect as a long-time flx user.

jtbm37 commented 7 years ago

"which is precisely what flx is not about"

@oscarfv I can only agree with that statement: after re-reading the package description, I see it says clearly, "The longer the substring match, the higher it scores. This maps well to how we think about matching."

I will look for another sorting algorithm.

fommil commented 7 years ago

I think the example is exactly what my original post was about.

oscarfv commented 7 years ago

@fommil: are you sure? jtbm37's example is about substring matching (as he acknowledged in his last follow-up), while yours is about sorting by candidate length. The rule quoted by jtbm37 also applies to your case, although at times it seems counter-intuitive. That is a problem with "magic" algorithms in general: when they give a result that you don't like, you end up thinking that something is wrong, without realizing that it is impossible to do what every user expects all the time.

jtbm37 commented 7 years ago

In my example I wanted to highlight the fact that the shorter the distance between 'm' and 's', the higher the candidate should score. I was not looking for a substring match.

The distance between m and s in mstest.log is 0. The distance between m and s in StopMonitoringSubscriptionProcessorTests is 9 (or 12, not sure which s is considered by the current algo).

If I change mstest.log to mtest.log, increasing the distance from 0 to 2, it is still shown after StopMonitoringSubscriptionProcessorTests.
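The distances quoted above can be reproduced with a small sketch. It is written in Python for illustration and uses a naive greedy, case-insensitive leftmost match, which is an assumption made here and not flx's actual matching strategy; `leftmost_match` and `total_gap` are hypothetical helper names.

```python
def leftmost_match(candidate, query):
    """Greedy, case-insensitive leftmost subsequence match.

    Returns the matched indices, or None if the query does not match.
    """
    lowered = candidate.lower()
    indices = []
    start = 0
    for ch in query.lower():
        pos = lowered.find(ch, start)
        if pos == -1:
            return None
        indices.append(pos)
        start = pos + 1
    return indices


def total_gap(indices):
    """Number of unmatched characters between consecutive matched indices."""
    return sum(b - a - 1 for a, b in zip(indices, indices[1:]))


if __name__ == "__main__":
    for name in ("mstest.log", "mtest.log",
                 "StopMonitoringSubscriptionProcessorTests"):
        match = leftmost_match(name, "ms")
        print(name, match, total_gap(match))
```

For the long name, this greedy match lands on indices 4 and 14, the same positions flx-score reported above, which corresponds to a distance of 9.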

fommil commented 7 years ago

@oscarfv yes, I am sure. I want an option for shorter strings to weigh more. It's not about substring matching.