sharkdp / fd

A simple, fast and user-friendly alternative to 'find'
Apache License 2.0
34.05k stars 817 forks source link

Alternate walk order - tree distance from 'center' directory #734

Open orlp opened 3 years ago

orlp commented 3 years ago

I have a feature request I believe to be quite useful when fd is combined with any sort of fuzzy-finder tool (e.g. fzf), or when returning early results from a very large search.

Suppose we wish to list all files in a directory, e.g. our home:

find . ~

This works, and is relatively quick. However the order of files isn't always very useful. I propose the following:

find . ~ --sort-by-dist ~/programming/rust/fd/src

What this would do is still output all the files in ~, but ordered by the distance from ~/programming/rust/fd/src (which I'll refer to as the 'center') in the 'filesystem tree'. This would be the number of cds you'd need to go from the specified 'center' to the target.

E.g. the distance between ~/programming/rust/fd/src and:

~/programming/rust/fd/src/main.rs = 0
~/programming/rust/fd/src/filter/owner.rs = 1
~/programming/rust/fd/Cargo.toml = 1
~/programming/rust/tmp.rs = 2
~/programming/rust/rg/Cargo.toml = 3

From the brief look I took at the current walker code I don't believe you support walking up into parent directories. Luckily you don't have to, when a center has been specified you can simply recursively pass the distance (so a recursive call into a subdirectory would increase the distance by 1), and start parallel walkers (appropriately ignoring dirs to prevent double work) on:

~/programming/rust/fd/src/ with initial distance 0
~/programming/rust/fd/ with initial distance 1
~/programming/rust/ with initial distance 2
~/programming/ with initial distance 3
~/ with initial distance 4

You don't have to completely wait for all walkers to finish before you can output the final sorted order either. If you add each found file to a queue d for its respective distance d, you can start outputting from a queue d as soon as no walker with current distance d' < d exists anymore. This does assume a (rough) BFS walk order.


Why is this useful? Well often we aren't just interested in 'a file somewhere', but in 'a file somewhere, but probably close to X'. E.g. if you are editing ~/programming/rust/fd/src/main.rs in your editor and use a fuzzy finder plugin with a full listing of files in your home directory, and you type 'owner', you probably want to find ~/programming/rust/fd/src/filter/owner.rs, not ~/.config/X11/owners. Having the files listed in ascending order distance from the center makes this very easy (and allows early results to be found even with massive searches).

tavianator commented 3 years ago

This is a cool idea. I think if you could get the walkers to prioritize candidates with smaller distances, you wouldn't need to run multiple walkers (also note that the walkers in fd are not breadth-first any more). It would be kind of like an A* search.

alanxoc3 commented 3 years ago

I have the same use case. Using fzf, I'm often cd-ing into a directory that is only 2 levels down, but I have to wait for 1-2 seconds because there are many results much deeper that get returned first. I actually rather a flag that specifies the walker (like the --sort-by-dist mentioned), because the functionality wouldn't change on me.

Without the option, results ordering could change again, like how it was breadth-first and became an A* thing.

For my specific usecase, this hack works, but it's not nice by any means:

(fd --exact-depth 1 -t d; fd --exact-depth 2 -t f -t d; fd --exact-depth 3 -t f -t d; fd --exact-depth 4 -t f -t d; fd --min-depth 5 -t f -t d;) | cat
tavianator commented 3 years ago

@alanxoc3 From your example it seems like you just want breadth-first search. I wrote https://github.com/tavianator/bfs for the same reasons you mention it's desirable, if that helps :). It uses the find syntax, not the more human-friendly fd syntax, but for fzf maybe it's okay since you just have to write the command once.

alanxoc3 commented 3 years ago

Hey, that's a nice project you have there, and yep, a breadth first search works for my scenario.

That said, I would expect the idea proposed by @orlp to do the same thing as a breadth first search if you pass in --sort-by-dist ./. Plus I could see it being useful to me in an editing experience as well.

I also just read through #599. It makes sense that there is a performance concern, but if the user specifies that they want a certain walker (BFS, A*, DFS...), there could just be a comment in the man page mentioning there could be a memory concern if using BFS. And it sounds like if --no-ignore is passed, most of the concerns around performance go away, so that could also be mentioned in the documentation.

orlp commented 3 years ago

Yes, my idea is a strict superset of BFS as fd X --sort-by-dist X is equivalent to BFS searching in X. But it's more flexible than that (although if implemented it wouldn't hurt to also create a --bfs shorthand to not have to repeat a path in case you're only interested in BFS).

jeff-hykin commented 12 months ago

While waiting on this feature, here's a poor mans BFS that still uses fd:

bfd() {
    __index=-1 ; while [ "$__index" -lt 256 ]; do
        __index=$((__index+1))
        fd --exact-depth $__index "$@"
    done
    unset __index
}

NOTE: if you have more than 256 nested folders.... first thats very impressive you might want to get that checked, but second this command will also not find stuff deeper than 256 directories down