Sort perfect match first

xeruf commented 3 years ago

[x] I have read through the manual page (man fzf)
[x] I have the latest version of fzf - 0.24.4 (f502725)
[x] I have searched through the existing issues

Info

OS
- [x] Linux
- [ ] Mac OS X
- [ ] Windows
- [ ] Etc.
Shell
- [ ] bash
- [x] zsh
- [ ] fish

Problem / Steps to reproduce

I would expect number 2 to come out first here. My hunch is that space and minus are treated equally - whereas I think a space (or tab for that matter) is a much stronger word delimiter than a minus.

I encountered this during package search, where there is some extra info before and after the package name delimited with tabs, and thus a perfect match doesn't seem so perfect to fzf.

junegunn commented 3 years ago

Mine gives dragon as the top match. You have something in your $FZF_DEFAULT_OPTS?

xeruf commented 3 years ago

Sorry, you are right:

printf 'fire-dragon 1\ndragon 2' | fzf --tiebreak=end --height=2

even though I can now understand why the algorithm selects it, it is not intuitive. The tiebreak end is there mainly for filenames, but it shouldn't prefer a longer entry because of that. Unfortunately there is no option to weigh two tiebreakers (end and length), they can only be ordered.

junegunn commented 3 years ago

--tiebreak is unintuitive and hard to understand especially with the new algorithm so I don't recommend relying on it. I even considered removing it but I kept it mostly for backward compatibility.
fzf was designed to be a context-free text filter rather than a path picker, so I decided that it should not be particularly optimized for file paths. It's a design choice. Maybe there are other programs that work better with file paths than fzf, but that is not a problem fzf wants to solve.
Consider not using --tiebreak option. The default option (length,index) is actually quite good for file paths.
There is a nice trick to make it prefer matches at the latter part of the path. We can take advantage of the fact that most file paths end with an extension, so simply put . at the end of your query. `abc.
```
# Compare "the-end" vs. "the-end."
fzf --height 10 --query 'the-end.' << EOF
the-end-is-near/foo-bar.sh
foo-bar/the-end-is-near.sh
EOF
```

xeruf commented 3 years ago

My point here is that I use fzf for paths in the overwhelming majority, and in that case it makes sense to use end rather than length as the tiebreaker (short paths such as under /etc often aren't what I'm looking for when searching for example for an rc file), that's why I set that as default option. My fix here was to reverse the tiebreaks in that particular script.

That tip does indeed sound nice, though when searching through rc files is not particularly useful, as they rarely have an extension by convention.

The question remains whether the end-tiebreaker may be tweaked in a way that it yields the more expected output. I guess currently it goes by percentage or position from beginning, when it might just be better to count from behind, where in this example both are equal and thus the length-tiebreaker would select the correct option ;)

junegunn commented 3 years ago

when it might just be better to count from behind

We did that, and it didn't turn out great.

https://github.com/junegunn/fzf/commit/0541c0dbcf96ff40bd80cb8359191dc0fa01d83d

Sadly, there is no one-size-fits-all option. It's better to understand how the default ranking algorithm works and tweak the query string to assist this dumb filter to do a better job. fzf implements all major Readline key bindings so it's easy to move the cursor around and tweak the search terms once you get used to it.

> foo bar┃

# ALT-B
> foo ┃bar

# CTRL-U
> ┃bar

# Prepend /.
> /.┃bar

# CTRL-E
> /.bar┃

# CTRL-Y
> /.barfoo ┃

xeruf commented 3 years ago

printf 'fire-dragon\ndragon' | fzf --tiebreak=end --height=2

Just realized that even in this case entering "dragon" will select "fire-dragon", so I can't enter anything more to get a better search result.

The behavior is the same with a space:

printf 'fire dragon\ndragon' | fzf --tiebreak=end --height=2

But removing the delimiter sorts "dragon" first as soon as I enter a single 'd':

printf 'firedragon\ndragon' | fzf --tiebreak=end --height=2

So I'm assuming you are splitting on non-word characters and the matching against the terms. Let me suggest some tweaks to the algorithm, even before the tiebreakers come into play:

a split by '-' should be weaker than other non-word characters, because it often shows run-together words whereas underscore, space, tab & co show a stronger separation
the percentage of the term matched should be factored in

xeruf commented 3 years ago

We did that, and it didn't turn out great.

0541c0d

What do you mean? Did you revert it?

junegunn commented 3 years ago

I can't enter anything more to get a better search result.

dragon ^d
dragon !f

Let me suggest some tweaks to the algorithm

Thanks for the input, but I'm not interested in fine-tuning the scoring algorithm because like I said above, fzf is a text filter and the criteria may not make sense with different types of input.

What do you mean? Did you revert it?

Yes, please read the commit message.

xeruf commented 3 years ago

Let me suggest some tweaks to the algorithm

Thanks for the input, but I'm not interested in fine-tuning the scoring algorithm because like I said above, fzf is a text filter and the criteria may not make sense with different types of input.

I thought these changes would be pretty general.

If you don't want to do more tweaking, I would suggest to at least pick the issue title back up and always prefer a perfect match.

The percentage-match could also be an additional tiebreaker.

What do you mean? Did you revert it?

Yes, please read the commit message.

oh woops, I read it the other way around, sorry ^^

xeruf commented 2 years ago

I am still interested in looking into improving this - look at this: printf 'fire-dragonflame\ndragon' | fzf --tiebreak=index --height=2

I am wondering why the algorithm has to resort to the tiebreaker here at all? Very similar to my my real-world issue in #2537.

junegunn commented 2 years ago

Because, unfortunately, the current scoring algorithm doesn't have a mechanism to prefer the latter. The scores calculated by the algorithm are just the same for the two cases.

https://github.com/junegunn/fzf/blob/master/src/algo/algo.go

Not saying there isn't room for improvement. We could try fine-tuning the algorithm.

xeruf commented 2 years ago

How about taking inspiration from https://github.com/jhawthorn/fzy/blob/master/ALGORITHM.md#fzys-scoring ? I love the scoring in fzy but the features of fzf.

junegunn commented 2 years ago

Very similar to my my real-world issue in #2537

So your real problem here is that the default length tiebreak is not taking --nth into account (see). If you use the default length tiebreak, fzf will prefer dragon over fire-dragonflame because it's shorter. In the context of fuzzy matching, "the perfect match" is always the shortest one, so with length tiebreak, we can say that fzf already prefers the perfect match.

It looks like the algorithm of fzy isn't very different from that of fzf. The document you linked above is severely outdated and it describes the algorithm of fzf 0.14.0 and below. fzy doesn't have --nth or --tiebreak, so the situation isn't too different if you use the default options of fzf.

junegunn commented 2 years ago

But we could consider adding a bonus point to a character at the end of a word so for a query dragon, dragon light is preferred over dragonflight (before --tiebreak)

If I'm not mistaken, fzy isn't doing that either.

junegunn / fzf

Sort perfect match first #2285

Info

Problem / Steps to reproduce