junegunn / fzf

:cherry_blossom: A command-line fuzzy finder
https://junegunn.github.io/fzf/
MIT License

[research] Consider SSE2/SSE4.2 and/or suffix arrays for a faster substring matching #635

Closed. balta2ar closed this issue 4 years ago.

balta2ar commented 8 years ago

Hello! First of all, I've said it before and I can't help but say it again: thank you for the great piece of software!

We all know that fzf is amazingly fast already. However, maybe there is still room for optimization? Below I'd like to bring your attention to two approaches that may work out well.

1. Suffix array

This approach is algorithmic. A lot of information can be found on the Internet, but I personally was introduced to suffix arrays in this course: https://www.coursera.org/learn/algorithms-on-strings

Here is another interesting article that demonstrates how to use suffix arrays in Go. Luckily, they are implemented in the standard library! This one is definitely worth reading: http://eli.thegreenplace.net/2016/suffix-arrays-in-the-go-standard-library/

From what I understand, this approach has a time-consuming data-preparation step before the first use, and the benefits only reveal themselves as the data gets larger. Considering this, maybe suffix arrays could be used when two conditions hold: 1. in interactive mode, 2. when the input data is larger than a certain threshold. As an example, I can say that filtering my locate database is... well, not blazingly fast.
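To make this concrete, here is a minimal sketch of the standard-library suffix array the article above describes (`index/suffixarray`); the interactive-mode and threshold logic is left out:

```go
package main

import (
	"fmt"
	"index/suffixarray"
)

func main() {
	// Building the index is the expensive, one-time step;
	// lookups afterwards are fast even on large inputs.
	data := []byte("src/main.go\nsrc/matcher.go\ndoc/README.md\n")
	index := suffixarray.New(data)

	// Lookup returns the byte offsets of every occurrence
	// (n < 0 means all matches); the result order is unspecified.
	offsets := index.Lookup([]byte("src"), -1)
	fmt.Println(offsets)
}
```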

2. SSE2/SSE4.2

The idea is to compare multiple DWORDs at once using xmm registers. This approach is more hardcore in my opinion: it's technically more difficult, and its possible use cases are limited by variations in the current implementation of NaiveMatch (by variations I mean the forward argument, but maybe that's not an issue). A portable sketch of the underlying idea follows the link lists below.

Some articles to whet your appetite:

- http://0x80.pl/articles/sse4_substring_locate.html
- http://www.codeproject.com/Articles/383185/SSE-accelerated-case-insensitive-substring-search
- https://www.strchr.com/strcmp_and_strlen_using_sse_4.2
- https://mischasan.wordpress.com/2011/07/16/convergence-sse2-and-strstr/

Some more papers; hopefully they go into more detail:

- http://www.dmi.unict.it/~faro/papers/conference/faro32.pdf
- http://arxiv.org/pdf/1209.6449.pdf
- http://www.joics.com/publishedpapers/2013_10_18_5867_5880.pdf
- http://www2.compute.dtu.dk/~phbi/files/publications/2011opsmC.pdf

And some implementations in the wild:

- https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/sysdeps/x86_64/multiarch/strcmp-sse42.S
- http://opensource.apple.com/source/Libc/Libc-498.1.1/i386/string/strlen.s
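Go does not expose SSE intrinsics directly (one would need assembly), but the core idea of comparing many bytes per operation can be sketched portably with 64-bit words (SWAR). A hypothetical word-at-a-time scan for the first byte of the needle:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// hasByte returns a nonzero mask if any byte of w equals c, using the
// classic SWAR zero-byte trick: XOR makes matching bytes zero, then the
// subtract-and-mask detects which bytes are zero.
func hasByte(w uint64, c byte) uint64 {
	const lo = 0x0101010101010101
	const hi = 0x8080808080808080
	x := w ^ (lo * uint64(c)) // bytes equal to c become zero
	return (x - lo) &^ x & hi
}

// indexSWAR finds the first occurrence of needle in haystack. It scans
// eight bytes at a time for the needle's first byte, then falls back to
// a byte-wise verification once a candidate word is found. A real SSE4.2
// version would replace the word loop with PCMPISTRI-style instructions,
// as in the glibc routine linked above.
func indexSWAR(haystack, needle string) int {
	if len(needle) == 0 {
		return 0
	}
	c := needle[0]
	i := 0
	for ; i+8 <= len(haystack); i += 8 {
		w := binary.LittleEndian.Uint64([]byte(haystack[i : i+8]))
		if hasByte(w, c) != 0 {
			break // candidate in this word; switch to byte-wise scan
		}
	}
	for ; i < len(haystack); i++ {
		if haystack[i] == c && strings.HasPrefix(haystack[i:], needle) {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexSWAR("the quick brown fox", "brown")) // 10
}
```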

Conclusion

I've marked the issue with [research] because I don't think there is an urgent need to make things even faster here. This is rather for our own education and fun. And discussion, of course! Maybe you or some other contributor will find these ideas worthwhile and applicable to fzf. I'd love to hear what you think about this, @junegunn!

감사합니다. Thank you!

junegunn commented 8 years ago

Thanks, I'll take a look when I get some time.

By the way, a few days ago I did some profiling with large input and noticed that a significant amount of time is spent allocating memory for rune arrays necessary to handle Unicode characters. I'm planning to see if there's room for improvement there first.
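A common remedy, shown here only as a hypothetical sketch and not as fzf's actual code, is to keep ASCII-only lines as raw bytes and pay for the []rune conversion only when a non-ASCII byte is detected:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Chars holds a line either as raw bytes (pure ASCII) or as runes
// (when Unicode handling is actually needed).
type Chars struct {
	bytes []byte // set when the line is pure ASCII
	runes []rune // set only when non-ASCII bytes are present
}

// NewChars allocates a rune slice only for lines that need one.
func NewChars(b []byte) Chars {
	for _, c := range b {
		if c >= utf8.RuneSelf { // non-ASCII byte found
			return Chars{runes: []rune(string(b))}
		}
	}
	return Chars{bytes: b} // no allocation beyond the original slice
}

// Length returns the number of characters, not bytes.
func (c Chars) Length() int {
	if c.runes != nil {
		return len(c.runes)
	}
	return len(c.bytes)
}

func main() {
	fmt.Println(NewChars([]byte("ascii only")).Length()) // 10
	fmt.Println(NewChars([]byte("café")).Length())       // 4
}
```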

balta2ar commented 8 years ago

I can also note that the --ansi option contributes a non-negligible amount to execution time.

junegunn commented 8 years ago

That's right. It's partly due to the non-stellar performance of Go's regex engine. --nth and --with-nth are also performance killers that you don't want to use with large input, which is understandable to some degree as they necessitate extra processing.
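As an illustration of why the regex pass shows up in profiles, here is a hypothetical hand-rolled alternative that strips ANSI CSI sequences (e.g. color codes) in a single linear pass; a sketch, not fzf's implementation:

```go
package main

import "fmt"

// stripANSI removes CSI escape sequences with a simple state scan:
// on ESC '[', skip parameter/intermediate bytes until the final byte
// (0x40..0x7e), then resume copying.
func stripANSI(s string) string {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		if s[i] == 0x1b && i+1 < len(s) && s[i+1] == '[' {
			j := i + 2
			for j < len(s) && (s[j] < 0x40 || s[j] > 0x7e) {
				j++ // skip parameter and intermediate bytes
			}
			i = j // the loop increment then skips the final byte
			continue
		}
		out = append(out, s[i])
	}
	return string(out)
}

func main() {
	fmt.Println(stripANSI("\x1b[31mred\x1b[0m plain")) // "red plain"
}
```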

balta2ar commented 8 years ago

Speaking of performance, an interesting way to look at pprof results is flame graphs: torch

balta2ar commented 8 years ago

Here is an interesting write-up by the author of ripgrep, a grep tool written in Rust. The author claims that this tool is faster than any other grep out there: http://blog.burntsushi.net/ripgrep/ Besides being technically detailed in the benchmark section, I think the Teddy algorithm deserves special attention:

> Teddy is a simd accelerated multiple substring matching algorithm

And Intel's regex library, Hyperscan: https://github.com/01org/hyperscan

Hacker News discussion: https://news.ycombinator.com/item?id=12564442
Reddit: https://www.reddit.com/r/programming/comments/544hos/ripgrep_is_faster_than_grep_ag_git_grep_ucg_pt/

junegunn commented 8 years ago

Thanks for the links. I believe there's a difference between fzf and the grep family: the haystack is usually much smaller in fzf, because fzf searches for the pattern in each line, while grep searches for it in each file, which can be huge. So we should lower our expectations of the speed-up from the algorithms used in grep-like tools. That's why I first focused on optimizing the code that wraps around the search in #637, and we got a nice improvement.

Are you still using --exact as the default? Exact match, I believe, is quite fast since #637. I even updated it in 0.15 to search for the occurrence with the highest score, but it's still fast.

Regarding the fuzzy matcher, there has been a lot of work on accelerating the Smith-Waterman algorithm with SIMD instructions (some of it patented):

https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm#Accelerated_versions

I'm not sure if those approaches can be applied to fzf, because of the customizations in fzf that make it harder to parallelize the operations.
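For context, here is a minimal scalar sketch of the plain Smith-Waterman recurrence, with illustrative scoring constants rather than fzf's customized scheme. The SIMD versions linked above parallelize exactly this inner loop, across stripes or anti-diagonals of the score matrix:

```go
package main

import "fmt"

// smithWaterman returns the best local alignment score between a and b,
// using two rolling rows instead of the full DP matrix.
func smithWaterman(a, b string) int {
	const match, mismatch, gap = 2, -1, -1
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	best := 0
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			s := mismatch
			if a[i-1] == b[j-1] {
				s = match
			}
			h := prev[j-1] + s // diagonal: match or substitute
			if v := prev[j] + gap; v > h {
				h = v // gap in b
			}
			if v := curr[j-1] + gap; v > h {
				h = v // gap in a
			}
			if h < 0 {
				h = 0 // local alignment: never go below zero
			}
			curr[j] = h
			if h > best {
				best = h
			}
		}
		prev, curr = curr, prev
	}
	return best
}

func main() {
	fmt.Println(smithWaterman("fuzzy", "fzf")) // small demo score
}
```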

balta2ar commented 8 years ago

Are you still using --exact as the default?

I'm using exact search in a shell with z, for command history search (Ctrl-R), and for locate filtering (as you suggested, I now pass some initial filter to locate and then filter with fzf, e.g. locate machine F, where F expands to | fzf). For all those cases exact search usually narrows down the results better for me. Or maybe it's just an old habit.

In my vimrc, however, I switch to extended search mode and --algo=v2 explicitly, as you suggested. In nvim I mostly search for files with fzf, and the new algo=v2 is really comfortable for me. Being slightly disappointed by the low speed of Unite's grep (and while waiting for Denite to be ready), I've found myself using fzf + ag in nvim as well (maybe I should try ripgrep, but ag itself is fast enough for my code base). On top of that, I sometimes use :Tags, :Lines and :BLines.

Generally, I'm very happy with both the exact and the new fuzzy v2 modes in fzf. They have both found their uses in my workflows.

junegunn commented 8 years ago

Whatever works best for you. Thanks for the feedback.

Although I also use the Ag command from the fzf.vim project quite regularly, the approach leaves something to be desired, as we can't use fzf like the native quickfix list. I haven't tried it, but I've heard good things about https://github.com/mhinz/vim-grepper.