junegunn / fzf

:cherry_blossom: A command-line fuzzy finder
https://junegunn.github.io/fzf/
MIT License

Taking into account the hierarchy of inputs #3255

Open grothesque opened 1 year ago

grothesque commented 1 year ago

Problem / Steps to reproduce

Many thanks for this excellent tool! After using it for a while and searching the existing issues, I would like to point out a possible direction in which the matching of fzf could be improved.

Like many people, I often use fzf to filter lists of file/directory names. For example, there could be the following directories

projects/foo
projects/foo/doc
projects/foo/src
projects/foo/tests
notes/some/category/foo

When fzf is launched with the above list of choices and the user searches for “foo”, the results will be presented in the above order, i.e. the three subdirectories of projects/foo will be considered more relevant than the notes on "foo". However, it could be argued that for a hierarchy of directories and files the subdirectories of projects/foo are already covered by the match of their parent. After all, they do not provide any further reason to match.

In the above example with only five items there is no problem, but with thousands of matching items it is easy to miss top-level matches that are shown behind subitems of other top-level matches.

Is there a way to solve this issue by configuration of current fzf? If not, perhaps we could discuss here possible solutions?

dr0bz commented 1 year ago

Hi @grothesque, what you need is sorting of fzf results. It's done by fzf --tiebreak=.... In this case you should use fzf --tiebreak=end.

Just set export FZF_DEFAULT_OPTS="--tiebreak=end" in your .bashrc, .zshrc or whatever shell you are using.

man fzf excerpt: (screenshot not preserved)

Best regards, dr0bz

junegunn commented 9 months ago

@dr0bz Thanks for the comment. Yes, --tiebreak=end can help in this case, but it looks like the current implementation needs improvement as it doesn't work smoothly with the above example.

For fo, it chooses notes/some/category/foo as expected.

(screenshot not preserved)

But if we add another o to the query,

(screenshot not preserved)

This feels quite wrong, let me see what I can do.

grothesque commented 6 months ago

@junegunn and @dr0bz, thanks for your suggestions.

First, I'd like to comment on the inconsistency noted by @junegunn when using --tiebreak=end.

This is with fzf 0.38.0 from Debian. I'm running the command

echo -e 'projects/foo\nprojects/foo/doc\nprojects/foo/src\nprojects/foo/tests\nnotes/some/category/foo' | fzf 

Without any option, when fo has been typed, fzf suggests projects/foo (presumably because --tiebreak=length is the default). I would expect the same behavior with --tiebreak=end,length, but instead it suggests notes/some/category/foo, just like with --tiebreak=end alone. Strangely, typing the full foo selects projects/foo regardless of the tiebreak setting.


The --tiebreak=end suggestion is a good start, but it's not quite a solution to the real problem that I had in mind. Let me try to demonstrate it with a real-world example:

Let's say I'm searching the filesystem for stuff that relates to "tinyarray". So I can use the excellent fdfind command like this: fdfind tinyarray. Among the many lines it outputs are the following ones:

12/tinyarray-src/
12/tinyarray-src/test_tinyarray.py

From the point of view of searching a hierarchical file system, the second match is redundant. Worse, that directory could contain hundreds of files (that may or may not contain tinyarray in their basename).

That's why fdfind has the --prune option:

       --prune
              Do not traverse into matching directories.

I would find it extremely useful if there were a way to teach fdfind | fzf (with "tinyarray" typed) to give the highest scores to the lines that are output by fdfind --prune tinyarray.

I guess that this would require some special treatment of directory separator characters on the part of fzf. However, I believe that file paths are an important enough application of fzf to justify an exception.
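
To make the idea concrete, here is a minimal sketch in Python of the pruning rule applied to a list of paths. It is an illustration only, not fzf or fdfind code: the function name prune_matches is hypothetical, and it assumes a plain substring match on path components rather than fdfind's regex search or fzf's fuzzy matching.

```python
def prune_matches(paths, query):
    """Keep matching paths that have no matching ancestor directory.

    Assumption: "matching" means the query is a plain substring of a
    path component, a simplification of both fd's regex search and
    fzf's fuzzy matching.
    """
    kept = []
    for path in paths:
        # Drop a trailing slash so "a/b/" and "a/b" split the same way.
        parts = path.rstrip("/").split("/")
        basename_matches = query in parts[-1]
        ancestor_matches = any(query in part for part in parts[:-1])
        if basename_matches and not ancestor_matches:
            kept.append(path)
    return kept


paths = [
    "12/tinyarray-src/",
    "12/tinyarray-src/test_tinyarray.py",
]
print(prune_matches(paths, "tinyarray"))  # ['12/tinyarray-src/']
```

The second line is dropped because its parent directory already matches, which is the behavior I would like to see reflected in the ranking.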

junegunn commented 6 months ago

> This is with fzf 0.38.0 from Debian.

This bug we discussed above has been fixed in 0.45.0. You are using a very old version of fzf.

grothesque commented 6 months ago

> This bug we discussed above has been fixed in 0.45.0. You are using a very old version of fzf.

Well, OK, but could you please also have a look at the second (longer) part of my comment where I explain what I actually meant when I opened this issue?

junegunn commented 6 months ago

> From the point of view of searching a hierarchical file system, the second match is redundant.

I feel quite the opposite. I'm usually not looking for intermediate nodes. Anyway, in that case, the default tiebreak of length should work well because the parent nodes have shorter names. Something like a mixture of length and end? I'm not planning to implement a non-basic scoring mechanism for any particular type of requirement because fzf is just a text filter and I want to leave it that way.

FWIW, you might want to experiment with a patch I posted at https://github.com/junegunn/fzf/issues/3608#issuecomment-1925509323 and see how it works for you.

grothesque commented 6 months ago

Thanks for having a look. The patch you link to looks interesting: perhaps it's possible to implement what I have in mind by assigning a zero (or very low) score to lines whose match does not involve the last path component? (That would rely on the assumption that lines for parent directories are present independently as well.)
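
As a rough illustration of that ranking idea, independent of the linked patch, the following Python sketch sorts matching lines so that those whose last path component contains the query come first. The function name rank_by_last_component is hypothetical, and substring matching stands in for fzf's fuzzy scoring.

```python
def rank_by_last_component(paths, query):
    """Order matching paths so that last-component matches come first.

    Assumption: a plain substring test replaces fzf's fuzzy scoring,
    and a two-level key (0 or 1) replaces a real score.
    """
    def key(path):
        last = path.rstrip("/").rsplit("/", 1)[-1]
        return 0 if query in last else 1

    # sorted() is stable, so the input order is kept within each group.
    return sorted((p for p in paths if query in p), key=key)


paths = [
    "projects/foo",
    "projects/foo/doc",
    "projects/foo/src",
    "projects/foo/tests",
    "notes/some/category/foo",
]
for line in rank_by_last_component(paths, "foo"):
    print(line)
```

With the five directories from my first comment, this puts projects/foo and notes/some/category/foo ahead of the three subdirectories.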

> I feel quite the opposite. I'm usually not looking for intermediate nodes. (...) I'm not planning to implement a non-basic scoring mechanism for any particular type of requirement because fzf is just a text filter and I want to leave it that way.

Sure, that's a reasonable and consistent design!

I find the "pruning" approach very useful when looking for anything related to a person or a project. If there's a directory "Pictures/fred-birthday" that contains 100 files, and I search for "fred", I don't want other results to be overshadowed by the many individual files in that directory. Length as a criterion doesn't really help: there could be a single line that matches as well, but is very long. Still, this mode of operation may not be very appropriate for fzf: the "pruned" results are a mixture of files and directories, and fzf's main application in my experience is file selection on the command line.

Please feel free to close this issue.