Closed. Superty closed this issue 1 year ago.
The relevant parts are here:
I think walking the directory tree on multiple threads would be possible to speed this up.
This can be replaced with sort_unstable_by_key
, and sorting could be performed on another thread, if this turns out to be a bottleneck.
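For illustration, a minimal sketch of the suggested swap (the file names here are made up):

```rust
fn main() {
    let mut files = vec!["src/main.rs", "README.md", "Cargo.toml"];
    // sort_unstable_by_key skips the temporary buffer that the stable
    // sort allocates; for a file picker there is no meaningful order to
    // preserve among entries with equal keys anyway.
    files.sort_unstable_by_key(|p| p.len());
    println!("{files:?}"); // prints ["README.md", "Cargo.toml", "src/main.rs"]
}
```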
While typing, the slowest part is probably fuzzy-searching. This is done here, using fuzzy-matcher:
I'm curious what fzf does differently to be this fast.
I think walking the directory tree on multiple threads would be possible to speed this up.
WalkBuilder does support build_parallel to do this.
The biggest blocker is that we don't do this loading asynchronously and in a streaming fashion. fzf etc. will continuously update and stream more files as it searches the tree. We want to do that down the line but it's not as straightforward.
I'm actually impressed the current implementation processes your 100k files in less than a second.
I'll look at the improvements suggested and do some measurements with cargo flamegraph
Findings so far, I have some changes staged locally:
Running the walker in parallel is actually detrimental to performance because it adds overhead from threads and atomics used in channels to communicate results.
The biggest blocker is the number of syscalls: fs::metadata is called per entry. We had some more calls because we were also reading mtime and doing an inefficient sort + collect to sort by modification time. I'm getting rid of that because I find sorting by mtime unreliable anyway.
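As an illustration of avoiding that per-entry stat: DirEntry::file_type() is filled in from the directory read itself on most platforms, so a walk can classify entries without calling fs::metadata at all (a simplified sketch, not Helix's walker):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Count regular (non-directory) entries under `dir` without issuing a
// stat syscall per entry: file_type() comes from the readdir result
// on most platforms, unlike fs::metadata which stats each path.
fn count_files(dir: &Path) -> io::Result<usize> {
    let mut n = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let ft = entry.file_type()?; // usually free, no extra stat
        if ft.is_dir() {
            n += count_files(&entry.path())?;
        } else {
            n += 1;
        }
    }
    Ok(n)
}

fn main() -> io::Result<()> {
    println!("{}", count_files(Path::new("."))?);
    Ok(())
}
```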
Initial invocation is still fairly slow (2-4s for me for /nix limited to 100k entries), but subsequent calls hit the kernel cache and take about 360ms (down from 522ms).
For git repositories where the gitignore filter is enabled we can query the git index directly rather than the kernel. This is much faster since it avoids syscalls. We can also build an in-memory index of the current workspace on first instantiation and use notify to listen for changes, to avoid querying the fs.
There were some inefficiencies there too:
Scoring should only start after an idle timer. We already have a system for this for auto-completion, and it can be extended to cover both scoring and syntax highlighting of the file previews.
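The idle-timer idea can be sketched with a plain channel timeout; the 250 ms value and all names here are illustrative, not Helix's actual idle-timeout system:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    const IDLE: Duration = Duration::from_millis(250);
    let (tx, rx) = mpsc::channel::<String>();

    let worker = thread::spawn(move || {
        let mut pending: Option<String> = None;
        loop {
            match rx.recv_timeout(IDLE) {
                // A new keystroke arrived: remember it, restarting the timer.
                Ok(query) => pending = Some(query),
                // Input was idle long enough: run the expensive scoring.
                Err(mpsc::RecvTimeoutError::Timeout) => {
                    if let Some(query) = pending.take() {
                        println!("scoring for {query:?}");
                    }
                }
                // Input closed: flush any pending query, then stop.
                Err(mpsc::RecvTimeoutError::Disconnected) => {
                    if let Some(query) = pending.take() {
                        println!("scoring for {query:?}");
                    }
                    break;
                }
            }
        }
    });

    // Simulate fast typing: only the final query gets scored.
    for q in ["h", "he", "hel"] {
        tx.send(q.to_string()).unwrap();
        thread::sleep(Duration::from_millis(50));
    }
    drop(tx);
    worker.join().unwrap(); // prints: scoring for "hel"
}
```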
@archseer AFAIK the git index only contains committed or staged files. New files that haven't been staged yet should also be suggested by helix.
Pushed some of the changes in 78fba8683bdce31ad2d8e710cb70fb9f05b21e8c
The biggest blocker is that we don't do this loading asynchronously and in a streaming fashion
This is so useful. Hopefully one day we'll get there!
Coming from #4076, I want to mention that it might be because of iCloud Drive.
Is there any update on this issue? It is the one thing that is preventing me from switching from vim.
Do you have external file systems mounted in that directory? Those usually slow down the file browser for various reasons. But hx / on a Linux system with lots of files is usually pretty snappy.
I tried hx inside my iCloud directory; it took about 3 seconds to open a 4 GB iCloud directory in total. Meanwhile hx in my $HOME dir (M1 MacBook Pro) takes forever thanks to OneDrive and iCloud mount points.
I can confirm it's still laggy on the Rust project on Linux.
No, I don't have any external file systems; the only file system is the default one on a single NVMe drive. And the slowdown happens on a hello-world Rust project, which is weird as there are only 16 files in the directory.
edit: I tried it on a C++ project (with 23 files) too and it freezes
I encounter a lot of blocking on a large monorepo mounted in a virtual filesystem. It would be nice if the blocking lookup happened in a background thread so that I could at least keep typing.
Tried to run file picker on a big rust project locally with disabled gitignore, in total around ~200k files, typing each symbol in picker takes ~400-700ms to process. fzf searches smoothly but takes much more to load (apparently because it uses shell commands, wtf?). Perfed and got the following numbers:
48.14% [.] fuzzy_matcher::skim::SkimMatcherV2::fuzzy
12.95% [.] fuzzy_matcher::skim::CharType::of
7.06% [.] fuzzy_matcher::util::cheap_matches
6.03% [.] <std::path::Components as core::iter::traits::iterator::Iterator>::next
3.07% [.] <core::str::lossy::Utf8Chunks as core::iter::traits::iterator::Iterator>::next
2.95% [.] std::path::compare_components
1.52% [.] std::path::Path::_strip_prefix
1.07% [.] std::path::Components::parse_next_component_back
1.04% [.] core::slice::sort::recurse
0.95% [.] core::slice::memchr::memchr_aligned
...
Helix built with release + debug=true at current master.
So I guess optimizing out the unmaintained fuzzy_matcher is the best option here. The fzf algorithm isn't that trivial but fits into 1kloc if anyone is interested in re-implementing it: source.
UPD: profile with call-graph is a little bit different but fuzzy_matcher still among the top entries, see attached profile (to see use https://profiler.firefox.com/ for example). perf.script.gz
The problem isn't the matching but that fzf matches asynchronously. fuzzy_matcher isn't necessarily unmaintained; there just isn't a lot of work to do on an algorithm after it's implemented.
Yeah, skim can beat fzf in performance. Fuzzy matchers are very well optimized.
I believe there should be no difference in doing that sync vs async. The only concern might arise from big I/O latencies (which is not the case I've reported). In my case fzf doesn't perform any visible reorder (which implies it handles input more quickly), nor does it use threads (checked in top, and I got the same result with cpulimit). Moreover, async imposes overhead: sending data between tasks, scheduling, and lightweight (yet existing) context switches. Therefore I think it should be possible to achieve similar speed without major code re-organization (i.e. without introducing asynchronicity).
The main reason fzf is smooth is that it scores the entries asynchronously and streams the results in. In almost all cases your current top results will be your new top results, so you don't notice, but this very much happens and is critical for good performance. fzf is also always using multiple threads, without an option to turn them off. Even if the matching algorithm is slower, the difference will not be noticeable in practical applications. skim is fast enough for all practical use cases and uses the same algorithm as helix.
The fzf or fzy algorithm might be better, but including a fuzzy matching algorithm in helix is out of scope and won't fix the performance issues. If a well maintained fuzzy matching crate comes around that contains detailed benchmarks demonstrating better performance than fuzzy-matcher, then we could migrate, but this is not something I would want to put effort into.
Reproduction steps
On large projects like LLVM, the file picker is quite slow to open and is also not nice to type in due to latency on each keystroke. On the other hand, fzf handles the same project very smoothly. Here is an asciinema. I first show fzf and then helix. In the helix part I press space and then f immediately. Not sure how visible the typing latency is in this recording, but you can see that it takes a while for the file picker to come up. On the other hand, fd | fzf opens immediately. (Sorry for it being cut off on the right. You can still see that the menu opens and then it takes a while for the file picker to open.)

Environment
(helix.log doesn't have any lines from today)