Open vegerot opened 2 months ago
It's not too surprising that fd ... --exec basename
is much slower than gfind ... -printf '%f\n'
, since gfind
doesn't have to spawn a new process for every file. Luckily, since #1043, you can do
$ fd . --max-depth=1 --type=executable --format '{/}'
@tavianator thanks! It's still slower, but the gap is smaller
$ hyperfine -w2 -n "BSD find" "FIND_PROG=find FIND_ARGS='-maxdepth 1 -perm +111 -type f,l -exec basename {} ;' fzf-menu --list-programs" \
-n "GNU find" "FIND_PROG=gfind FIND_ARGS='-maxdepth 1 -executable -printf %f\n' fzf-menu --list-programs" \
-n fd "FIND_PROG='fd .' FIND_ARGS='--max-depth=1 --type=executable --format \"{\}\" ' fzf-menu --list-programs"
Benchmark 1: BSD find
Time (mean ± σ): 5.074 s ± 0.077 s [User: 0.806 s, System: 2.332 s]
Range (min … max): 4.988 s … 5.203 s 10 runs
Benchmark 2: GNU find
Time (mean ± σ): 138.9 ms ± 4.6 ms [User: 32.0 ms, System: 69.9 ms]
Range (min … max): 130.2 ms … 151.2 ms 21 runs
Benchmark 3: fd
Time (mean ± σ): 249.2 ms ± 8.4 ms [User: 77.3 ms, System: 108.8 ms]
Range (min … max): 235.4 ms … 262.9 ms 11 runs
Summary
GNU find ran
1.79 ± 0.08 times faster than fd
36.52 ± 1.32 times faster than BSD find
I see \"{\}\"
where you probably wanted \"{/}\"
, not sure that makes a difference though. Also you probably want -type f,l -executable
for GNU find, otherwise you'll get directories too.
Thanks! I had -type f,l
before but accidentally removed it. Same perf
$ hyperfine -w2 -n "BSD find" "FIND_PROG=find FIND_ARGS='-maxdepth 1 -perm +111 -type f,l -exec basename {} ;' fzf-menu --list-programs" \
-n "GNU find" "FIND_PROG=gfind FIND_ARGS='-maxdepth 1 -executable -type f,l -printf %f\n' fzf-menu --list-programs" \
-n fd "FIND_PROG='fd .' FIND_ARGS='--max-depth=1 --type=executable --format \"{/}\" ' fzf-menu --list-programs"
Benchmark 1: BSD find
Time (mean ± σ): 5.236 s ± 0.251 s [User: 0.848 s, System: 2.431 s]
Range (min … max): 4.899 s … 5.689 s 10 runs
Benchmark 2: GNU find
Time (mean ± σ): 156.6 ms ± 5.2 ms [User: 35.3 ms, System: 79.0 ms]
Range (min … max): 147.7 ms … 171.2 ms 18 runs
Benchmark 3: fd
Time (mean ± σ): 259.8 ms ± 7.9 ms [User: 82.7 ms, System: 111.5 ms]
Range (min … max): 248.4 ms … 276.1 ms 11 runs
Summary
GNU find ran
1.66 ± 0.07 times faster than fd
33.43 ± 1.95 times faster than BSD find
@tavianator are you able to reproduce https://github.com/sharkdp/fd-benchmarks/pull/5 ?
My results look like this:
$ hyperfine -w2 \
'fd . $(tr ":" " " <<<"$PATH") --follow --max-depth=1 --type=executable --format "{/}"' \
{find,bfs}' -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"'
Benchmark 1: fd . $(tr ":" " " <<<"$PATH") --follow --max-depth=1 --type=executable --format "{/}"
Time (mean ± σ): 18.3 ms ± 1.1 ms [User: 29.1 ms, System: 39.0 ms]
Range (min … max): 16.1 ms … 20.9 ms 163 runs
Benchmark 2: find -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
Time (mean ± σ): 20.3 ms ± 1.9 ms [User: 3.6 ms, System: 16.7 ms]
Range (min … max): 18.9 ms … 36.0 ms 138 runs
Benchmark 3: bfs -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
Time (mean ± σ): 24.7 ms ± 0.4 ms [User: 10.1 ms, System: 89.5 ms]
Range (min … max): 23.7 ms … 25.9 ms 116 runs
Summary
fd . $(tr ":" " " <<<"$PATH") --follow --max-depth=1 --type=executable --format "{/}" ran
1.11 ± 0.12 times faster than find -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
1.35 ± 0.08 times faster than bfs -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
What does a similar benchmark look like for you? It's possible that fzf-menu
/get_programs_in_path
is responsible for some of the performance difference.
I had a short look at this and could reproduce your results (I guess it depends a lot on the number of directories in PATH). I'm currently away from a machine to test this but the way you use fd here is probably the worst case. You're basically just listing (executable) files in a single directorywithout traversing anything. These two facts together essentially mean that we can't make use of parallelism. In this case, it's no surprise that fd is slower. This is a case we haven't optimized for. You could try with --threads=1. Maybe that makes fd a bit faster, but I wouldn't be surprised if it's still slower than find.
But what I would rather do (if you want to optimize the whole thing) is to change that shell function. It shouldn't spawn the find program N times, once for each directory in PATH. Instead, it should pass all N directories to a single instance of the find program.
Edit: I saw only now that this is exactly what @tavianator did in his benchmark.
It shouldn't spawn the find program N times, once for each directory in PATH. Instead, it should pass all N directories to a single instance of the find program.
Indeed, I just tried @vegerot's get_programs_in_path()
shell function and did find that fd
was much slower. The reason is that fd
starts up much slower than find:
tavianator@tachyon $ hyperfine -N -w2 "fd -1" "find -quit"
Benchmark 1: fd -1
Time (mean ± σ): 5.4 ms ± 0.4 ms [User: 2.0 ms, System: 8.9 ms]
Range (min … max): 4.8 ms … 7.3 ms 457 runs
Benchmark 2: find -quit
Time (mean ± σ): 636.9 µs ± 69.3 µs [User: 484.7 µs, System: 78.8 µs]
Range (min … max): 563.1 µs … 1473.5 µs 3548 runs
Summary
find -quit ran
8.47 ± 1.11 times faster than fd -1
5 ms doesn't usually matter but when you run fd
once for each directory rather than passing them all at once, you get 5 ms × entries in $PATH. For me that's 70 ms.
If I fix get_programs_in_path()
like this:
local paths=$(tr ':' '\n' <<<"$PATH" | awk '!x[$0]++')
$find_prog $paths $find_args 2>/dev/null \
| awk '!x[$0]++'
I get these results:
Benchmark 1: FIND_PROG='find -L' FIND_ARGS='-maxdepth 1 -executable -type f,l -printf %f\n' get_programs_in_path
Time (mean ± σ): 22.4 ms ± 0.8 ms [User: 8.1 ms, System: 19.9 ms]
Range (min … max): 21.3 ms … 26.8 ms 125 runs
Benchmark 2: FIND_PROG='fd .' FIND_ARGS='--hidden --max-depth=1 --type=executable --follow --format {/}' get_programs_in_path
Time (mean ± σ): 20.5 ms ± 1.1 ms [User: 34.3 ms, System: 43.7 ms]
Range (min … max): 18.5 ms … 23.3 ms 144 runs
Summary
FIND_PROG='fd .' FIND_ARGS='--hidden --max-depth=1 --type=executable --follow --format {/}' get_programs_in_path ran
1.09 ± 0.07 times faster than FIND_PROG='find -L' FIND_ARGS='-maxdepth 1 -executable -type f,l -printf %f\n' get_programs_in_path
See #1203, #1362, #1408, #1412, #1414, #1422, #1431 for some previous discussion and optimization of startup time.
A quick analysis with perf record -e intel_pt//u
shows that most of the startup overhead is from the worker threads. Passing -j1
drops the time to ~2 ms. Much of the remaining overhead is from Clap. I don't think it's worth trying to optimize this much more, unless there's a quick speedup I'm not seeing.
Checks
Describe the bug you encountered:
(I'm not sure if we count performance issues as a bug or not)
See https://github.com/sharkdp/fd-benchmarks/pull/5 for repro
I am building the world's fastest application launcher for GNU+Linux. I thought
fd
would be a good choice for finding applications, but after benchmarking I found that GNU find can be 9 times faster than fd. Am I usingfd
wrong, or in this case is GNU find faster?Describe what you expected to happen:
Reproduction steps:
Run
./warm-cache-exe-paths.sh
in https://github.com/sharkdp/fd-benchmarks/pull/51. download https://github.com/vegerot/dotfiles/blob/a9a50230d808572173d3eeec057739d0fe8d4470/bin/fzf-menu2. Run(note: this is on macOS. I reprod on GNU+Linux as well but didn't measure BSD find)
Expected:
fd
to be the fastest.Actual:
What version of
fd
are you using?fd 10.2.0
Which operating system / distribution are you on?
Is there a way I can use
fd
that will make it faster than GNU find, or for my particular application is GNU find just better?Please see https://github.com/sharkdp/fd-benchmarks/pull/5 for repro