Open peter50216 opened 2 years ago
I can reproduce that here, but with -j1 the performance is the same. I think this is https://github.com/sharkdp/fd/issues/710, and the cause is just the musl version being upgraded as a result of Rust being updated. Or maybe this is around when Rust stopped using jemalloc by default.
See also
Tested with the gnu version instead of musl, and verified that this is specific to musl.
hyperfine 1.11.0
Benchmark #1: ./fd-v7.2.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 2.439 s ± 0.096 s [User: 99.311 s, System: 109.548 s]
Range (min … max): 2.347 s … 2.679 s 10 runs
Benchmark #2: ./fd-v7.3.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 2.947 s ± 0.065 s [User: 138.492 s, System: 49.916 s]
Range (min … max): 2.851 s … 3.046 s 10 runs
Summary
'./fd-v7.2.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src' ran
1.21 ± 0.05 times faster than './fd-v7.3.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src'
There's still a slowdown of ~1.2x, which is probably caused by Rust no longer using jemalloc by default, as you said, with jemalloc being faster than glibc malloc in this use case?
I think this is covered by #710 anyway, so feel free to close this as duplicate.
Thank you for reporting this anyway!
See also: https://dev.to/sharkdp/an-unexpected-performance-regression-11ai
Back then, the performance regression was between 7.0 and 7.1, so that doesn't quite fit with your results. You can easily check if a particular fd executable uses jemalloc by doing something like
strings <fd-executable> | grep jemalloc
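To run that check over several downloaded binaries at once, a small loop along these lines might help. This is only a sketch: the `uses_jemalloc` helper name is made up for illustration, and it lets `grep -q` search the binary directly instead of piping through `strings`, which should give the same yes/no answer for a plain ASCII symbol name.

```shell
# Hypothetical helper: succeeds if the file contains the byte string "jemalloc".
# An unreadable file makes grep fail, so it is reported as "not using jemalloc".
uses_jemalloc() {
  grep -q jemalloc -- "$1"
}

# Classify every binary passed on the command line.
for bin in "$@"; do
  if uses_jemalloc "$bin"; then
    echo "$bin: using jemalloc"
  else
    echo "$bin: not using jemalloc"
  fi
done
```

Saved as a script, it would be called with the paths of the extracted fd executables as arguments.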
Did a quick grep from binaries downloaded from https://github.com/sharkdp/fd/releases:
Using jemalloc:
Not using jemalloc:
Looks like the patch to use jemalloc in 7.4.0 is not applied to the musl build (which is also stated in the 7.4.0 release notes).
Also tried building musl + jemalloc on the master branch (c577b0838b2e) with cross build --target=x86_64-unknown-linux-musl (https://github.com/gnzlbg/jemallocator/issues/124#issuecomment-486561511), and the performance is much better than the non-jemalloc version:
Benchmark #1: ~/temp/fd-musl-no-jemalloc ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 18.901 s ± 0.281 s [User: 166.882 s, System: 1532.500 s]
Range (min … max): 18.467 s … 19.252 s 10 runs
Benchmark #2: ~/temp/fd-musl-jemalloc ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 4.614 s ± 0.570 s [User: 26.295 s, System: 361.069 s]
Range (min … max): 3.435 s … 5.445 s 10 runs
Summary
'~/temp/fd-musl-jemalloc ".*camera_hal.*" ~/chromiumos/src' ran
4.10 ± 0.51 times faster than '~/temp/fd-musl-no-jemalloc ".*camera_hal.*" ~/chromiumos/src'
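As a sanity check on how to read these summaries: the reported factor is consistent with the ratio of the two mean times, and the ± part with standard first-order error propagation of the two σ values. The sketch below recomputes it with awk; `speedup` is a hypothetical helper name, and the formula is an assumption that happens to reproduce the numbers in this thread rather than something taken from hyperfine's source.

```shell
# speedup SLOW_MEAN SLOW_SIGMA FAST_MEAN FAST_SIGMA
# Prints "ratio +/- error", where error = ratio * sqrt(rel_err_slow^2 + rel_err_fast^2).
speedup() {
  awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
    r = m1 / m2
    err = r * sqrt((s1 / m1) ^ 2 + (s2 / m2) ^ 2)
    printf "%.2f +/- %.2f\n", r, err
  }'
}

# musl without jemalloc (18.901 s ± 0.281 s) vs. musl with jemalloc (4.614 s ± 0.570 s)
speedup 18.901 0.281 4.614 0.570   # prints: 4.10 +/- 0.51
```

The same call with the gnu numbers (`speedup 2.947 0.065 2.439 0.096`) gives 1.21 +/- 0.05, matching that summary as well.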
So it might be worthwhile to enable jemalloc for the musl build too. (From a quick glance at the GitHub Action, the musl version is already built with cross, so there shouldn't be any build issues.)
It's still slower than 7.2.0 but that's likely #599.
Noticed that some fd commands run much slower (about 10x) after I upgraded my local fd from 6.2.0 to the newest 8.3.2, and did a quick version bisect.
Looks like the regression is between 7.2.0 and 7.3.0, and all versions I've tested after 7.3.0 (7.4.0, 7.5.0, 8.0.0, 8.1.1, 8.3.2) run at about the same speed as 7.3.0.
Reproduce script:
(I'm using the Chrome OS source tree as an example here, but I can reproduce a similar regression on other large source trees, for example the Linux source tree.)
Result:
Benchmark #2: ./fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 25.529 s ± 0.328 s [User: 222.856 s, System: 1924.844 s]
Range (min … max): 24.980 s … 26.091 s 10 runs
Summary
'./fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src' ran
10.34 ± 0.28 times faster than './fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src'
hyperfine 1.13.0
Benchmark 1: ./fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 2.348 s ± 0.101 s [User: 10.347 s, System: 6.298 s]
Range (min … max): 2.237 s … 2.527 s 10 runs
Benchmark 2: ./fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
Time (mean ± σ): 6.882 s ± 0.090 s [User: 44.010 s, System: 6.813 s]
Range (min … max): 6.783 s … 7.065 s 10 runs
Summary
'./fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src' ran
2.93 ± 0.13 times faster than './fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src'