Closed. Superty closed this issue 1 year ago.
The relevant parts are here:
I think walking the directory tree on multiple threads would be possible to speed this up.
This can be replaced with sort_unstable_by_key
, and sorting could be performed on another thread, if this turns out to be a bottleneck.
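For illustration, a minimal sketch of the suggested swap (the file names here are made up):

```rust
fn main() {
    let mut files = vec!["src/main.rs", "README.md", "Cargo.toml"];
    // sort_unstable_by_key skips the temporary buffer that the stable
    // sort allocates; for a file picker there is no meaningful order to
    // preserve among entries with equal keys anyway.
    files.sort_unstable_by_key(|p| p.len());
    println!("{files:?}"); // prints ["README.md", "Cargo.toml", "src/main.rs"]
}
```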
While typing, the slowest part is probably fuzzy-searching. This is done here, using fuzzy-matcher:
I'm curious what fzf does differently to be this fast.
I think walking the directory tree on multiple threads would be possible to speed this up.
WalkBuilder does support build_parallel to do this.
The biggest blocker is that we don't do this loading asynchronously and in a streaming fashion. fzf etc. will continuously update and stream more files as it searches the tree. We want to do that down the line but it's not as straightforward.
I'm actually impressed the current implementation processes your 100k files in less than a second.
I'll look at the improvements suggested and do some measurements with cargo flamegraph
Findings so far, I have some changes staged locally:
Running the walker in parallel is actually detrimental to performance because it adds overhead from threads and atomics used in channels to communicate results.
The biggest blocker is the number of syscalls: fs::metadata is called per entry. We had some more calls because we were also reading mtime and doing an inefficient sort + collect to sort by modification time. I'm getting rid of that because I find sorting by mtime unreliable anyway.
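As an illustration of avoiding that per-entry stat: DirEntry::file_type() is filled in from the directory read itself on most platforms, so a walk can classify entries without calling fs::metadata at all (a simplified sketch, not Helix's walker):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Count regular (non-directory) entries under `dir` without issuing a
// stat syscall per entry: file_type() comes from the readdir result
// on most platforms, unlike fs::metadata which stats each path.
fn count_files(dir: &Path) -> io::Result<usize> {
    let mut n = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let ft = entry.file_type()?; // usually free, no extra stat
        if ft.is_dir() {
            n += count_files(&entry.path())?;
        } else {
            n += 1;
        }
    }
    Ok(n)
}

fn main() -> io::Result<()> {
    println!("{}", count_files(Path::new("."))?);
    Ok(())
}
```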
Initial invocation is still fairly slow (2-4s for me for /nix limited to 100k entries), but subsequent calls hit the kernel cache and take about 360ms (down from 522ms).
For git repositories where the gitignore filter is enabled we can query the git index directly rather than the kernel. This is much faster since it avoids syscalls. We can also build an in-memory index of the current workspace on first instantiation and use notify to listen for changes, to avoid querying the fs.
There were some inefficiencies there too:
Scoring should only start after an idle timer. We already have a system for this for auto-completion, and it can be extended to cover both scoring and syntax highlighting of the file previews.
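The idle-timer idea can be sketched with a plain channel timeout; the 250 ms value and all names here are illustrative, not Helix's actual idle-timeout system:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    const IDLE: Duration = Duration::from_millis(250);
    let (tx, rx) = mpsc::channel::<String>();

    let worker = thread::spawn(move || {
        let mut pending: Option<String> = None;
        loop {
            match rx.recv_timeout(IDLE) {
                // A new keystroke arrived: remember it, restarting the timer.
                Ok(query) => pending = Some(query),
                // Input was idle long enough: run the expensive scoring.
                Err(mpsc::RecvTimeoutError::Timeout) => {
                    if let Some(query) = pending.take() {
                        println!("scoring for {query:?}");
                    }
                }
                // Input closed: flush any pending query, then stop.
                Err(mpsc::RecvTimeoutError::Disconnected) => {
                    if let Some(query) = pending.take() {
                        println!("scoring for {query:?}");
                    }
                    break;
                }
            }
        }
    });

    // Simulate fast typing: only the final query gets scored.
    for q in ["h", "he", "hel"] {
        tx.send(q.to_string()).unwrap();
        thread::sleep(Duration::from_millis(50));
    }
    drop(tx);
    worker.join().unwrap(); // prints: scoring for "hel"
}
```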
@archseer AFAIK the git index only contains committed or staged files. New files that haven't been staged yet should also be suggested by helix.
Pushed some of the changes in 78fba8683bdce31ad2d8e710cb70fb9f05b21e8c
The biggest blocker is that we don't do this loading asynchronously and in a streaming fashion
This is so useful. Hopefully one day we'll get there!
Coming from #4076, I want to mention that it might be because of iCloud Drive.
Is there any update on this issue? It is the one thing that is preventing me from switching from vim.
Do you have external file systems mounted in that directory? Those usually slow down the file browser for various reasons. But hx / on a Linux system with lots of files is usually pretty snappy.
I tried hx inside my iCloud directory; it took about 3 seconds to open a 4 GB iCloud directory in total. Meanwhile hx in my $HOME dir (M1 MacBook Pro) takes forever thanks to OneDrive and iCloud mount points.
I can confirm it's still laggy on the Rust project on Linux.
No, I don't have any external file systems; the only file system is the default one on a single NVMe drive. And the slowdown happens on a hello-world Rust project, which is weird as there are only 16 files in the directory.
edit: I tried it on a C++ project (with 23 files) too and it freezes
I encounter a lot of blocking on a large monorepo mounted in a virtual filesystem. It would be nice if the blocking lookup happened in a background thread so that I could at least keep typing.
Tried to run file picker on a big rust project locally with disabled gitignore, in total around ~200k files, typing each symbol in picker takes ~400-700ms to process. fzf searches smoothly but takes much more to load (apparently because it uses shell commands, wtf?). Perfed and got the following numbers:
48.14% [.] fuzzy_matcher::skim::SkimMatcherV2::fuzzy
12.95% [.] fuzzy_matcher::skim::CharType::of
7.06% [.] fuzzy_matcher::util::cheap_matches
6.03% [.] <std::path::Components as core::iter::traits::iterator::Iterator>::next
3.07% [.] <core::str::lossy::Utf8Chunks as core::iter::traits::iterator::Iterator>::next
2.95% [.] std::path::compare_components
1.52% [.] std::path::Path::_strip_prefix
1.07% [.] std::path::Components::parse_next_component_back
1.04% [.] core::slice::sort::recurse
0.95% [.] core::slice::memchr::memchr_aligned
...
Helix built with release + debug=true at current master.
So I guess optimizing out the unmaintained fuzzy_matcher is the best option here. The fzf algorithm isn't that trivial but fits into 1kloc if anyone is interested in re-implementing it: source.
UPD: profile with call-graph is a little bit different but fuzzy_matcher still among the top entries, see attached profile (to see use https://profiler.firefox.com/ for example). perf.script.gz
The problem isn't the matching but that fzf matches asynchronously. fuzzy_matcher isn't necessarily unmaintained; there just isn't a lot of work to do on an algorithm after it's implemented.
Yeah, skim can beat fzf in performance. Fuzzy matchers are very well optimized.
I believe there should be no difference in doing that sync vs async. The only concern might arise from big I/O latencies (which is not the case I've reported). In my case fzf doesn't perform any visible reorder (which implies it handles input more quickly), nor does it use threads (checked in top, and I got the same result with cpulimit). Moreover, async imposes overhead: sending data between tasks, scheduling, and lightweight (yet existing) context switches. Therefore I think it should be possible to achieve similar speed without major code re-organization (i.e. without introducing asynchronicity).
The main reason fzf is smooth is that it scores the entries asynchronously and streams the results in. In almost all cases your current top results will be your new top results, so you don't notice, but this very much happens and is critical for good performance. fzf is also always using multiple threads, without an option to turn them off. Even if the matching algorithm is slower, the difference will not be noticeable in practical applications. skim is fast enough for all practical use cases and uses the same algorithm as helix.
The fzf or fzy algorithm might be better, but including a fuzzy matching algorithm in helix is out of scope and won't fix the performance issues. If a well maintained fuzzy matching crate comes around that contains detailed benchmarks demonstrating better performance than fuzzy-matcher, then we could migrate, but this is not something I would want to put effort into.
Reproduction steps
On large projects like LLVM, the file picker is quite slow to open and is also not nice to type in due to latency on each keystroke. On the other hand, fzf handles the same project very smoothly. Here is an asciinema. I first show fzf and then helix. In the helix part I press space and then f immediately. Not sure how visible the typing latency is in this recording, but you can see that it takes a while for the file picker to come up. On the other hand, fd | fzf opens immediately. (Sorry for it being cut off on the right. You can still see that the menu opens and then it takes a while for the file picker to open.)

Environment
(helix.log doesn't have any lines from today)