Feature request: Run `choices_fread` concurrently

adsr commented 4 years ago

It would be nice if fzy became interactive before it finished reading stdin. Useful for slow or large stdin.

bfrg commented 4 years ago

This was already addressed in #41 and #17.

jhawthorn commented 4 years ago

This has come up a few times. There shouldn't be any issue with a large stdin: I benchmark against all files in the linux tree (~70000) and the file reading is more or less instantaneous (< 9 ms). Slow input can be a problem, but I've found that `fd` makes this significantly faster (I used to use `ag -g` and that's also pretty good). `time fd > /dev/null` takes 60-100ms in the linux kernel tree on my machine: not instant, but probably fast enough.

Updating my previous thoughts on this:

There are usability issues with running "right away" with whatever input is available. Basically, if you type your input and hit enter too fast, it will return the wrong (or at least an inconsistent) result. I still really don't want to do this, and I'm unconvinced there's even an advantage: the file you're looking for is just as likely to be at the start of input as at the end.

I think fzy does the right thing here, it doesn't display the interface until it can generate correct results (once it has read all the input). If you start typing before fzy starts it still does the right thing, using that as the query once the candidates are loaded.

I could see some value in showing the prompt and some indicator (reading input...) that we're still loading. That would just be a visual improvement, but it might feel better!

zsugabubus commented 4 years ago

This has come up a few times.

First of all really thanks for this finally reasonable algorithm. It’s refreshing after… everything… cough dumb fzf cough.

Users want this feature; that’s why, I guess, it has come up for the seventh time. So I will fight a little to make it a bit more acceptable to you. :) (I'd happily write a PR too.)

There shouldn't be any issue with a large stdin: I benchmark against all files in the linux tree (~70000) and the file reading is more or less instantaneous

Nice. I’m happy you have a fast scenario. Imagine, I can filter through 3 lines even more instantaneously. Not everybody works with 3 files or 70000 or maybe 70000^3. And? As far as I know, fzy works with universal text (stream), read from stdin. That’s why it’s great. It has nothing to do with files… or fast ag or fd or [insert your Rusty program here].

Not everybody has a use case where input can be generated “immediately”. fzy reads stdin, but what’s on the other side? It can be anything.

time fd > /dev/null takes 60-100ms in the linux kernel tree on my machine

The last time I grepped (rg <-- fast) or did something like that in the Linux kernel, it took about 15-30 seconds, and all of that was inside tmpfs. What I was looking for popped up after a few seconds. ~but i did not hit enter because fzf's spinner was so relaxing~

Basically, if you type your input and hit enter too fast

Then… everybody come closer… do not hit enter!?!? But that’s true for the whole fuzzy finding process, no? You hit enter or anything else when you see THE line. If you do not see it, you either type more… or wait. Why should users be restricted? If the user already sees the line, or knows that the first line will be the one, why should they wait for the remaining lines to arrive?

the file you're looking for is just as likely to be at the start of input as at the end.

What do you want to say with this? Is it meant to be a pro or a con?

Try looking at the bright side: what if that “file” is at the beginning? As you said, they are equally likely. In the worst case the file is the last one: probability 1/N. Okay. Right now users wait longer than necessary in the other (N-1)/N cases. What is that if not a speedup? I thought fuzzy finding was about saving time.
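The arithmetic behind this argument can be made concrete. A minimal sketch, assuming lines arrive at a constant rate and the wanted line is equally likely to sit at any of the N positions (both are simplifying assumptions, and the function name is made up for illustration):

```python
from fractions import Fraction


def expected_lines_until_target(n):
    """Mean number of lines read before a uniformly placed target arrives.

    Assumes the target is equally likely to be at any position 1..n.
    """
    # Average of the positions 1, 2, ..., n, which equals (n + 1) / 2.
    return Fraction(sum(range(1, n + 1)), n)
```

For N = 70000 this gives 35000.5 lines on average, roughly half of the N lines that a read-everything-first approach waits for. The sketch only counts waiting time; it says nothing about whether a better match would have arrived later.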

I could see some value in showing the prompt … That would just be a visual improvement, but it might feel better!

Yeah, sure. 😄 At least I will know that the program already knows it’s waiting for something.

bfrg commented 4 years ago

@jhawthorn I often accidentally run find(1) in my $HOME directory which contains quite a few files. Unfortunately, I realize it when it's too late and quitting with CTRL-C doesn't work until find(1) is done with searching. This is really annoying. It would be great if there was a way to instantly open fzy.

Some other fuzzy searchers like skim start reading from stdin instantly and display the output interactively. Often a file can be fuzzy-found even before find(1) finishes searching.

I am very well aware that searching recursively in $HOME is a corner case and should probably be avoided but it happened too often in the past.

mohkale commented 3 years ago

@jhawthorn I respect your thoughts on this, but just out of curiosity, what about situations in which you end up streaming a lot of info but you're really only interested in something near the start of it? Say I find all files in a directory with up to 100 subdirectories deep and I'm interested in one two directories down. Waiting for fzy to load all of the files when I'm only interested in one close to the top seems needlessly wasteful to me.

NOTE: I could specify that -maxdepth 2 but then that limits the command. If I put it into a shell script or bash function for repeated use I have to either parameterise that script/function with a max depth or continually adjust it to deal with different situations.

jhawthorn commented 3 years ago

I respect your thoughts on this, but just out of curiosity, what about situations in which you end up streaming a lot of info but you're really only interested in something near the start of it? Say I find all files in a directory with up to 100 subdirectories deep and I'm interested in one two directories down. Waiting for fzy to load all of the files when I'm only interested in one close to the top seems needlessly wasteful to me.

Same problem as always. There's a random chance that when you hit enter you'll get a different result (because it finds a better match in your streaming). I'm more convinced than ever that this shouldn't be done. UI needs to behave predictably and that is more important than any other concerns.

NOTE: I could specify that -maxdepth 2 but then that limits the command. If I put it into a shell script or bash function for repeated use I have to either parameterise that script/function with a max depth or continually adjust it to deal with different situations.

find (at least all implementations I've used) is depth-first. So what you're asking for won't even help you in finding "higher level directories" unless the file you want just happens to be inside one of the "earlier" directories it searches for (which also is unsorted and unpredictable).

If this is a regular issue I recommend building a custom find-like tool which does a breadth first search through directories and stops returning results after a certain number, so that you can get reasonably fast results while removing the unpredictability of streaming solutions.
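A minimal sketch of such a tool, assuming a breadth-first walk with an optional cap on the number of results. The function name and the `limit` parameter are illustrative and not part of fzy or find:

```python
import os
from collections import deque
from itertools import islice


def breadth_first_files(root, limit=None):
    """Return an iterator over file paths under root, shallowest first.

    Stops after `limit` paths when a limit is given.
    """
    def walk():
        queue = deque([root])
        while queue:
            directory = queue.popleft()
            try:
                # Sort entries so the output order is deterministic.
                entries = sorted(os.scandir(directory), key=lambda e: e.name)
            except OSError:
                continue  # unreadable directory: skip it
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    queue.append(entry.path)  # visited after this level
                else:
                    yield entry.path

    return islice(walk(), limit)
```

Printing each path from `breadth_first_files(".", limit=5000)` and piping the output into fzy would give a bounded, finite input, so the full list is available before any selection is made.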

casr commented 3 years ago

I agree that the experience with fzf streaming in results was janky.

In both instances though, the delay is mainly left of the pipe so if you can improve the speed of results given to the fuzzy finder you'll have a better experience.

One approach might be to use a heuristic such as 'if there's a .git folder then use git ls-files otherwise fallback on a depth-limited search'. In another issue, I pasted a pfind script which does just this.
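A rough sketch of that heuristic follows. It is not the pfind script mentioned above; the function names and the default depth are assumptions made for illustration:

```python
import os
import subprocess


def list_candidates(root=".", max_depth=3):
    """Return candidate paths relative to root."""
    # .git is usually a directory, but it can be a plain file in linked
    # worktrees and submodules, so only test for existence.
    if os.path.exists(os.path.join(root, ".git")):
        out = subprocess.run(
            ["git", "ls-files"],
            cwd=root, capture_output=True, text=True, check=True,
        )
        return out.stdout.splitlines()
    # Not a Git work tree: fall back on a depth-limited search.
    return list(depth_limited_files(root, max_depth))


def depth_limited_files(root, max_depth):
    """Yield files at most max_depth directory levels below root."""
    for directory, subdirs, files in os.walk(root):
        rel = os.path.relpath(directory, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            subdirs[:] = []  # prune: do not descend any further
        for name in sorted(files):
            yield os.path.normpath(os.path.join(rel, name))
```

Either branch produces a finite list quickly, which keeps the delay on the left of the pipe short.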

Also, if it works for your use-case, consider cached results. For instance, locate is blazingly fast and you can just chuck fzy everything from / upwards if you wanted and it's super snappy still.

mohkale commented 3 years ago

The find example was purely an example to demonstrate why I don't like fzy's lack of streaming. I didn't think it through properly :sweat: my bad; imagine I'm describing a breadth-first find implementation. In practice I'm actually using something like what @casr describes, but I still don't like the noticeable latency between when I run a command and when fzy lets me choose one of its outputs. More than one of the commands I use fzf with has the behaviour I described before, namely that an early result is often desired and a later result is only sometimes required. Waiting for both to be available is too time-consuming IMO.

adsr commented 3 years ago

Throwing in my 2 cents. I vastly prefer fzy's algorithm over other fuzzy finders'; however, I routinely work in a repo with hundreds of thousands of source files, which can take seconds to scan. The best workaround I can come up with is pre-scanning the repo via cron or inotify into a file and cat'ing that into fzy. That is much better but still takes 100 millis or so on my setup before becoming interactive. Maybe someone else might find that idea useful.
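The pre-scanning step could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, the cron or inotify wiring is not shown, and the cache file is expected to live outside the scanned tree:

```python
import os
import tempfile


def refresh_cache(root, cache_path):
    """Write every file path under root to cache_path, one per line."""
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    # Write to a temporary file in the same directory, then rename it into
    # place, so a reader never sees a half-written cache.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, text=True)
    try:
        with os.fdopen(fd, "w") as tmp:
            for directory, _subdirs, files in os.walk(root):
                for name in files:
                    tmp.write(os.path.join(directory, name) + "\n")
        os.replace(tmp_path, cache_path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial file on any failure
        raise
```

A scheduled job would call `refresh_cache` periodically, and the finder would then read the cache file instead of scanning the repo.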

Sincere apologies for belaboring the point @jhawthorn. One last proposal. To address the race condition between hitting enter and finding a better match, fzy can pin the currently selected item even if it has a lower score, perhaps with some visual indicator. On more input, the item is unpinned. Of course I think most users in this thread would be fine with all of this being non-default behavior.

mohkale commented 3 years ago

One last proposal. To address the race condition between hitting enter and finding a better match, fzy can pin the currently selected item even if it has a lower score, perhaps with some visual indicator. On more input, the item is unpinned. Of course I think most users in this thread would be fine with all of this being non-default behavior.

This was how I thought fzf did it, but evidently it isn't. So how would this work? When there's no query the current entry is always the earliest one we've encountered. Whenever the user inputs a character the best matching entry is pinned. As new entries come in the new entries go above or below the pinned entry but our pin stays on that entry. As before whenever we insert a character and there is an existing query we forget the previously pinned entry and pin the new best entry.
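The pinning rules described here can be written down as a small state machine. This is only a sketch: the class is hypothetical, and the scoring (substring match, shortest candidate wins) is a toy stand-in, not fzy's or fzf's real algorithm:

```python
class PinnedSelection:
    """Toy model of the pinning proposal for streamed candidates."""

    def __init__(self):
        self.query = ""
        self.candidates = []
        self.pinned = None

    def add_candidate(self, line):
        """A new line arrived from stdin."""
        self.candidates.append(line)
        if not self.query:
            # No query yet: the selection is the earliest entry seen.
            self.pinned = self.candidates[0]
        # With a query, the pin stays put even if `line` would rank higher.

    def type_char(self, ch):
        """The user typed a character: forget the old pin, pin the new best."""
        self.query += ch
        matches = [c for c in self.candidates if self.query in c]
        self.pinned = min(matches, key=len, default=None)
```

With this model, typing a query while only part of the input has arrived pins the best entry seen so far, and a better entry that arrives later does not take over until another character is typed.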

Thinking about the ramifications, I understand why there's so much resistance to this. But then again, I've never once encountered a race condition with fzf; maybe there's something about its implementation I'm not noticing.

casr commented 3 years ago

To address the race condition between hitting enter and finding a better match, fzy can pin the currently selected item even if it has a lower score, perhaps with some visual indicator. On more input, the item is unpinned.

I like this proposal. 👍

Another approach might be to make the prompt immediately available for typing but delay results until they are processed in the background. That way brain output lag and fzy input lag run in parallel.

jhawthorn commented 3 years ago

Another approach might be to make the prompt immediately available for typing but delay results until they are processed in the background. That way brain output lag and fzy input lag run in parallel.

This is already fzy's exact behaviour

casr commented 3 years ago

Oops! I must admit I wasn't able to replicate the behaviour even with a large number of files, so I tried to simulate it by using `yes | fzy` to see what would happen. There's probably something I don't understand about pipe mechanics, or I'm on an old version of fzy, but the input was delayed.

mohkale commented 3 years ago

@casr I tried yes | ./fzy and my computer froze :rofl:. After a while the kernel probably killed the process for taking up too much memory and it fixed itself, but anyone else who tries to replicate should be careful all the same.

This is already fzy's exact behaviour

@jhawthorn I just tried to build from master and do what @casr suggested and that's not the behaviour I'm seeing. fzy is completely unresponsive until all of stdin is read. Even C-c to interrupt/cancel doesn't work.

I'm running on:

zsh: zsh 5.8 (x86_64-pc-linux-gnu)
st: st 0.8.4

jhawthorn commented 3 years ago

I tried yes | ./fzy and my computer froze

What did you expect when feeding infinite input?

Try something like `(sleep 5; echo foo) | fzy` instead. Typed input is not displayed during those 5 seconds (as per above, I'd be happy to accept a change adding this) but it is accepted and eventually shows up in the prompt.

Even C-c to interrupt/cancel doesn't work.

That's not great. I'd accept a patch to fix that.

pin the currently selected

As mohkale rightly points out "pinning" is not a solution and just introduces more UI race conditions. You're still pinned to the "wrong" option from whenever you typed in. There's also the usability issue of having no way to "update" other than typing more.

Typing the same thing into the same program needs to have the same effect every time.

But then again I've never once encountered a race condition with fzf

Here's a really interesting test:

(echo kindafoo; sleep 5; echo foo) | fzf

If you type "foo", it will return "kindafoo", the wrong answer (it also still takes 5 seconds because of how pipes work, but that probably won't be the case for large inputs).

fzy will never behave this way (possibly this should even be advertised as a feature). It will always correctly return "foo". I'm happy to add more interactivity, but we are always going to be bottlenecked waiting on input (and need to wait for the full input) in order to return correct results.
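The difference can be modelled in a few lines. This is a toy illustration, not either tool's actual code: the scoring rule (substring match, shortest candidate wins) and all function names are assumptions:

```python
def best_match(query, candidates):
    """Return the best candidate for query, or None if nothing matches.

    Toy scoring: a candidate matches if it contains the query, and
    shorter candidates rank higher.
    """
    matches = [c for c in candidates if query in c]
    return min(matches, key=len, default=None)


def select_while_streaming(query, arrivals, enter_after):
    """Pick from only the first `enter_after` lines (enter was hit early)."""
    return best_match(query, arrivals[:enter_after])


def select_after_eof(query, arrivals):
    """Pick only once every line has been read."""
    return best_match(query, arrivals)
```

With `arrivals = ["kindafoo", "foo"]`, `select_while_streaming("foo", arrivals, 1)` returns `"kindafoo"` while `select_after_eof("foo", arrivals)` returns `"foo"`. The same keystrokes give different results depending on when enter is pressed, which is the inconsistency described above.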

jhawthorn / fzy

Feature request: Run `choices_fread` concurrently #130