Open mustafa0x opened 3 years ago
Thanks for the report.
I'm not familiar with the implementation of fzy, so I can't tell where the performance difference stems from, but here are a few observations.
Looks like most of the time is spent loading up the list. Run
cat arwiki-latest-abstract1.xml | fzf
and you'll see that it takes a couple of seconds until the loading is complete. On the other hand, fzy finishes loading much faster. I think the major difference comes from the fact that fzf was inherently designed to support streaming input, while fzy blocks until the input command completes.
for i in $(seq 100); do echo $i; sleep 0.1; done | fzf
for i in $(seq 100); do echo $i; sleep 0.1; done | fzy
find / | fzf
find / | fzy
There is much more work involved to support streaming, so I can imagine fzf is significantly slower than fzy in this scenario.
The search algorithm of fzf is particularly optimized for ASCII data.
Maybe we can apply the same optimization to non-ASCII data when --literal
is set, but I haven't tried.
Thank you so much for the reply!
I'm sure streaming plays a role, but it can't be the major cause, since the difference between fzy and fzf on https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract10.xml.gz (English data) is much less drastic.
$> time cat enwiki-latest-abstract10.xml | fzf +x +i --algo=v1 --literal -f 'All_article_disambiguation_pages' | wc -l
7221
real 0m0.637s
user 0m1.036s
sys 0m0.235s
$> time cat enwiki-latest-abstract10.xml | fzy -e 'All_article_disambiguation_pages' | wc -l
7222
real 0m0.335s
user 0m0.911s
sys 0m0.238s
Note: +x +i --algo=v1 --literal didn't make a substantial difference with English data, while with Arabic data it reduced the time taken several times over.
Nice observation. So while streaming overhead definitely exists, it isn't the major factor in loading performance.
If a line only contains ASCII characters, fzf can load it as a byte array without any processing. But if it contains a non-ASCII character, fzf loads it as an array of 4-byte integers, unpacking Unicode code points from the original byte array. So there is significant memory and computational overhead.
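The two paths can be illustrated with a small Go sketch. The isASCII helper here is hypothetical, not fzf's code; it just shows why a pure-ASCII line can be kept as bytes while a non-ASCII line must be decoded into 4-byte runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether s contains only single-byte (ASCII) characters.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	ascii := "abstract"
	arabic := "ملخص" // "abstract" in Arabic, 4 letters

	// ASCII fast path: the bytes themselves are the code points, so the
	// line can be searched as a plain byte slice with no conversion.
	fmt.Println(isASCII(ascii), len(ascii)) // true 8

	// Non-ASCII path: each code point must be decoded from UTF-8 and
	// widened to a 4-byte rune (int32), costing CPU time and, here,
	// double the memory (8 UTF-8 bytes become 16 bytes of runes).
	runes := []rune(arabic)
	fmt.Println(isASCII(arabic), len(arabic), len(runes), len(runes)*4) // false 8 4 16
}
```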
Maybe we can somehow avoid the extra processing as much as possible with some tricks. I'll see if there's any room for improvement when I get some time. (FWIW, I mostly use fzf with ASCII input, so this has never been an issue for me.)
Awesome, thank you! Yeah, it's a bit of an atypical use case. However, all the unique features of fzf make it a powerful multi-purpose search tool, so this use case might become a bit more common.
Have there been any more insights into this? It sounds interesting, as it might affect search speed in Japanese text as well, for example. :)
Problem / Steps to reproduce
I appreciate that this is not a common use case; however, I have noticed that fzf is many times slower than fzy when searching Arabic text (and fzy's results are quite good).
Dataset
fzf
fzy
+x +i --algo=v1 --literal really helps, but there remains a big difference.