sharkdp / fd

A simple, fast and user-friendly alternative to 'find'

[BUG] 🐌 fd can be much slower than GNU find in some cases #1614

Open · vegerot opened this issue 2 months ago

vegerot commented 2 months ago

Describe the bug you encountered:

(I'm not sure if we count performance issues as a bug or not)

See https://github.com/sharkdp/fd-benchmarks/pull/5 for repro

I am building the world's fastest application launcher for GNU+Linux. I thought fd would be a good choice for finding applications, but after benchmarking I found that GNU find can be about 9 times faster than fd. Am I using fd wrong, or is GNU find simply faster in this case?

Describe what you expected to happen:

Reproduction steps:

1. Run ./warm-cache-exe-paths.sh from https://github.com/sharkdp/fd-benchmarks/pull/5

2. Download https://github.com/vegerot/dotfiles/blob/a9a50230d808572173d3eeec057739d0fe8d4470/bin/fzf-menu

3. Run

$ hyperfine -w2 -n "BSD find" "FIND_PROG=find FIND_ARGS='-maxdepth 1 -perm +111 -type f,l -exec basename {} ;' fzf-menu --list-programs" \
        -n "GNU find" "FIND_PROG=gfind FIND_ARGS='-maxdepth 1 -executable -printf %f\n' fzf-menu --list-programs" \
        -n "fd" "FIND_PROG='fd .' FIND_ARGS='--max-depth=1 --type=executable --exec basename {} ;' fzf-menu --list-programs"

(Note: this is on macOS. I reproduced it on GNU+Linux as well, but didn't measure BSD find there.)

Expected: fd to be the fastest.

Actual:

$ hyperfine -w2 -n "BSD find" "FIND_PROG=find FIND_ARGS='-maxdepth 1 -perm +111 -type f,l -exec basename {} ;' fzf-menu --list-programs" \
        -n "GNU find" "FIND_PROG=gfind FIND_ARGS='-maxdepth 1 -executable -printf %f\n' fzf-menu --list-programs" \
        -n "fd" "FIND_PROG='fd .' FIND_ARGS='--max-depth=1 --type=executable --exec basename {} ;' fzf-menu --list-programs"
Benchmark 1: BSD find
  Time (mean ± σ):      4.865 s ±  0.191 s    [User: 0.785 s, System: 2.254 s]
  Range (min … max):    4.623 s …  5.089 s    10 runs

Benchmark 2: GNU find
  Time (mean ± σ):     133.9 ms ±   9.3 ms    [User: 30.9 ms, System: 66.0 ms]
  Range (min … max):   126.0 ms … 165.7 ms    21 runs

Benchmark 3: fd
  Time (mean ± σ):      1.198 s ±  0.055 s    [User: 1.008 s, System: 4.774 s]
  Range (min … max):    1.125 s …  1.311 s    10 runs

Summary
  GNU find ran
    8.95 ± 0.74 times faster than fd
   36.35 ± 2.89 times faster than BSD find

What version of fd are you using?

fd 10.2.0

Which operating system / distribution are you on?

Reproduced on:
- Darwin 23.6.0 arm64
- Debian 12 Linux 6.1.0-23-amd64 x86_64

Is there a way to use fd that makes it faster than GNU find, or is GNU find just better for my particular use case?

Please see https://github.com/sharkdp/fd-benchmarks/pull/5 for repro

tavianator commented 2 months ago

It's not too surprising that fd ... --exec basename is much slower than gfind ... -printf '%f\n', since gfind doesn't have to spawn a new process for every file. Luckily, since #1043, you can do

$ fd . --max-depth=1 --type=executable --format '{/}'
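
If you want to see the per-file process cost in isolation, a comparison along these lines should make it visible (just a sketch; /usr/bin stands in for one of your PATH directories):

$ hyperfine -w2 \
    "fd . /usr/bin --max-depth=1 --type=executable --exec basename {}" \
    "fd . /usr/bin --max-depth=1 --type=executable --format '{/}'"
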
vegerot commented 2 months ago

@tavianator thanks! It's still slower, but the gap is smaller:

$ hyperfine -w2 -n "BSD find" "FIND_PROG=find FIND_ARGS='-maxdepth 1 -perm +111 -type f,l -exec basename {} ;' fzf-menu --list-programs" \
        -n "GNU find" "FIND_PROG=gfind FIND_ARGS='-maxdepth 1 -executable -printf %f\n' fzf-menu --list-programs" \
        -n fd "FIND_PROG='fd .' FIND_ARGS='--max-depth=1 --type=executable --format \"{\}\" ' fzf-menu --list-programs"
Benchmark 1: BSD find
  Time (mean ± σ):      5.074 s ±  0.077 s    [User: 0.806 s, System: 2.332 s]
  Range (min … max):    4.988 s …  5.203 s    10 runs

Benchmark 2: GNU find
  Time (mean ± σ):     138.9 ms ±   4.6 ms    [User: 32.0 ms, System: 69.9 ms]
  Range (min … max):   130.2 ms … 151.2 ms    21 runs

Benchmark 3: fd
  Time (mean ± σ):     249.2 ms ±   8.4 ms    [User: 77.3 ms, System: 108.8 ms]
  Range (min … max):   235.4 ms … 262.9 ms    11 runs

Summary
  GNU find ran
    1.79 ± 0.08 times faster than fd
   36.52 ± 1.32 times faster than BSD find
tavianator commented 2 months ago

I see \"{\}\" where you probably wanted \"{/}\", not sure that makes a difference though. Also you probably want -type f,l -executable for GNU find, otherwise you'll get directories too.

vegerot commented 2 months ago

Thanks! I had -type f,l before but accidentally removed it. Same performance:

$ hyperfine -w2 -n "BSD find" "FIND_PROG=find FIND_ARGS='-maxdepth 1 -perm +111 -type f,l -exec basename {} ;' fzf-menu --list-programs" \
        -n "GNU find" "FIND_PROG=gfind FIND_ARGS='-maxdepth 1 -executable -type f,l -printf %f\n' fzf-menu --list-programs" \
        -n fd "FIND_PROG='fd .' FIND_ARGS='--max-depth=1 --type=executable --format \"{/}\" ' fzf-menu --list-programs"
Benchmark 1: BSD find
  Time (mean ± σ):      5.236 s ±  0.251 s    [User: 0.848 s, System: 2.431 s]
  Range (min … max):    4.899 s …  5.689 s    10 runs

Benchmark 2: GNU find
  Time (mean ± σ):     156.6 ms ±   5.2 ms    [User: 35.3 ms, System: 79.0 ms]
  Range (min … max):   147.7 ms … 171.2 ms    18 runs

Benchmark 3: fd
  Time (mean ± σ):     259.8 ms ±   7.9 ms    [User: 82.7 ms, System: 111.5 ms]
  Range (min … max):   248.4 ms … 276.1 ms    11 runs

Summary
  GNU find ran
    1.66 ± 0.07 times faster than fd
   33.43 ± 1.95 times faster than BSD find
vegerot commented 2 months ago

@tavianator are you able to reproduce https://github.com/sharkdp/fd-benchmarks/pull/5 ?

tavianator commented 2 months ago

My results look like this:

$ hyperfine -w2 \
    'fd . $(tr ":" " " <<<"$PATH") --follow --max-depth=1 --type=executable --format "{/}"' \
    {find,bfs}' -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"'

Benchmark 1: fd . $(tr ":" " " <<<"$PATH") --follow --max-depth=1 --type=executable --format "{/}"
  Time (mean ± σ):      18.3 ms ±   1.1 ms    [User: 29.1 ms, System: 39.0 ms]
  Range (min … max):    16.1 ms …  20.9 ms    163 runs

Benchmark 2: find -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
  Time (mean ± σ):      20.3 ms ±   1.9 ms    [User: 3.6 ms, System: 16.7 ms]
  Range (min … max):    18.9 ms …  36.0 ms    138 runs

Benchmark 3: bfs -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
  Time (mean ± σ):      24.7 ms ±   0.4 ms    [User: 10.1 ms, System: 89.5 ms]
  Range (min … max):    23.7 ms …  25.9 ms    116 runs

Summary
  fd . $(tr ":" " " <<<"$PATH") --follow --max-depth=1 --type=executable --format "{/}" ran
    1.11 ± 0.12 times faster than find -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"
    1.35 ± 0.08 times faster than bfs -L $(tr ":" " " <<<"$PATH") -maxdepth 1 -type f -executable -printf "%f\n"

What does a similar benchmark look like for you? It's possible that fzf-menu/get_programs_in_path is responsible for some of the performance difference.

sharkdp commented 2 months ago

I had a quick look at this and could reproduce your results (I guess it depends a lot on the number of directories in PATH). I'm currently away from a machine to test this, but the way you use fd here is probably the worst case: you're basically just listing (executable) files in a single directory, without traversing anything. These two facts together essentially mean that we can't make use of parallelism, so in this case it's no surprise that fd is slower. This is a case we haven't optimized for. You could try --threads=1. Maybe that makes fd a bit faster, but I wouldn't be surprised if it's still slower than find.
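
For example, something along these lines would isolate that (just a sketch; /usr/bin stands in for one of your PATH directories):

$ hyperfine -w2 \
    "fd . /usr/bin --max-depth=1 --type=executable --format '{/}'" \
    "fd . /usr/bin --max-depth=1 --type=executable --threads=1 --format '{/}'"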

But what I would rather do (if you want to optimize the whole thing) is to change that shell function. It shouldn't spawn the find program N times, once for each directory in PATH. Instead, it should pass all N directories to a single instance of the find program.

Edit: I saw only now that this is exactly what @tavianator did in his benchmark.
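
In other words, a single invocation over all PATH entries, roughly like the command from that benchmark:

$ fd . $(tr ':' ' ' <<<"$PATH") --follow --max-depth=1 --type=executable --format '{/}'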

tavianator commented 2 months ago

It shouldn't spawn the find program N times, once for each directory in PATH. Instead, it should pass all N directories to a single instance of the find program.

Indeed, I just tried @vegerot's get_programs_in_path() shell function and did find that fd was much slower. The reason is that fd starts up much slower than find:

tavianator@tachyon $ hyperfine -N -w2 "fd -1" "find -quit"
Benchmark 1: fd -1
  Time (mean ± σ):       5.4 ms ±   0.4 ms    [User: 2.0 ms, System: 8.9 ms]
  Range (min … max):     4.8 ms …   7.3 ms    457 runs

Benchmark 2: find -quit
  Time (mean ± σ):     636.9 µs ±  69.3 µs    [User: 484.7 µs, System: 78.8 µs]
  Range (min … max):   563.1 µs … 1473.5 µs    3548 runs

Summary
  find -quit ran
    8.47 ± 1.11 times faster than fd -1

5 ms doesn't usually matter, but when you run fd once for each directory rather than passing them all at once, you pay 5 ms × the number of entries in $PATH. For me that's about 70 ms.
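
You can count your own unique entries with something like:

$ tr ':' '\n' <<<"$PATH" | awk '!x[$0]++' | wc -l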

If I fix get_programs_in_path() like this:

get_programs_in_path() {
  # split $PATH into unique directories and run a single find/fd over all of them
  local paths=$(tr ':' '\n' <<<"$PATH" | awk '!x[$0]++')
  $find_prog $paths $find_args 2>/dev/null \
    | awk '!x[$0]++'   # dedupe program names
}

I get these results:

Benchmark 1: FIND_PROG='find -L' FIND_ARGS='-maxdepth 1 -executable -type f,l -printf %f\n' get_programs_in_path
  Time (mean ± σ):      22.4 ms ±   0.8 ms    [User: 8.1 ms, System: 19.9 ms]
  Range (min … max):    21.3 ms …  26.8 ms    125 runs

Benchmark 2: FIND_PROG='fd .' FIND_ARGS='--hidden --max-depth=1 --type=executable --follow --format {/}' get_programs_in_path
  Time (mean ± σ):      20.5 ms ±   1.1 ms    [User: 34.3 ms, System: 43.7 ms]
  Range (min … max):    18.5 ms …  23.3 ms    144 runs

Summary
  FIND_PROG='fd .' FIND_ARGS='--hidden --max-depth=1 --type=executable --follow --format {/}' get_programs_in_path ran
    1.09 ± 0.07 times faster than FIND_PROG='find -L' FIND_ARGS='-maxdepth 1 -executable -type f,l -printf %f\n' get_programs_in_path

See #1203, #1362, #1408, #1412, #1414, #1422, #1431 for some previous discussion and optimization of startup time.

tavianator commented 2 months ago

A quick analysis with perf record -e intel_pt//u shows that most of the startup overhead is from the worker threads. Passing -j1 drops the time to ~2 ms. Much of the remaining overhead is from Clap. I don't think it's worth trying to optimize this much more, unless there's a quick speedup I'm not seeing.
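
For anyone who wants to double-check that locally, the startup benchmark from above can be re-run with a single worker thread, e.g.:

$ hyperfine -N -w2 "fd -1" "fd -j1 -1" "find -quit"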