sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.1k stars 1.28k forks source link

Searcher: optimize AND/ OR patterns #59038

Closed jtibshirani closed 8 months ago

jtibshirani commented 9 months ago

Currently, if an unindexed search contains a pattern like test AND server, then the search jobs framework executes it by

  1. Creating two separate searcher jobs, one for the pattern test, and another for server
  2. Streaming the results back and taking a conjunction

This can be quite inefficient and use a lot of memory within the frontend service.

We're planning an update to the query language to interpret spaces as an AND between terms by default, as opposed to treating them literally. So this will result in many more unindexed AND searches. We should update searcher to just handle AND/ OR patterns directly, making the execution much more efficient.

/cc @sourcegraph/search-platform

jtibshirani commented 9 months ago

Implementation plan

Initial refactors and bug fixes:

Main implementation:

jtibshirani commented 9 months ago

I ran the benchmarks for the main searcher logic before and after these changes. Documenting that I didn't see any regression:

Before (commit a82617) ``` BenchmarkSearchRegex_rare_fixed-10 67 15303353 ns/op 15542221 B/op 617 allocs/op BenchmarkSearchRegex_large_fixed_casesensitive BenchmarkSearchRegex_large_fixed_casesensitive-10 228 5127657 ns/op 11425 B/op 159 allocs/op BenchmarkSearchRegex_large_re_dotstar BenchmarkSearchRegex_large_re_dotstar-10 3 507240514 ns/op 628485160 B/op 6098749 allocs/op BenchmarkSearchRegex_large_re_common BenchmarkSearchRegex_large_re_common-10 120 9848522 ns/op 7489471 B/op 68727 allocs/op BenchmarkSearchRegex_large_re_anchor BenchmarkSearchRegex_large_re_anchor-10 10 122277004 ns/op 4909972 B/op 50084 allocs/op BenchmarkSearchRegex_large_capture_group BenchmarkSearchRegex_large_capture_group-10 4 309301916 ns/op 3409316 B/op 25162 allocs/op BenchmarkSearchRegex_large_path BenchmarkSearchRegex_large_path/path_only BenchmarkSearchRegex_large_path/path_only-10 6390 159883 ns/op 1296 B/op 21 allocs/op BenchmarkSearchRegex_large_path/content_only BenchmarkSearchRegex_large_path/content_only-10 127 8933993 ns/op 61477 B/op 601 allocs/op BenchmarkSearchRegex_large_path/both_path_and_content BenchmarkSearchRegex_large_path/both_path_and_content-10 132 9476068 ns/op 61064 B/op 601 allocs/op BenchmarkSearchRegex_small_fixed BenchmarkSearchRegex_small_fixed-10 5766 240830 ns/op 214016 B/op 90 allocs/op BenchmarkSearchRegex_small_fixed_casesensitive BenchmarkSearchRegex_small_fixed_casesensitive-10 8626 139368 ns/op 5642 B/op 86 allocs/op BenchmarkSearchRegex_small_re_dotstar BenchmarkSearchRegex_small_re_dotstar-10 388 3393447 ns/op 4057457 B/op 40107 allocs/op BenchmarkSearchRegex_small_re_common BenchmarkSearchRegex_small_re_common-10 10000 131611 ns/op 45132 B/op 488 allocs/op BenchmarkSearchRegex_small_re_anchor BenchmarkSearchRegex_small_re_anchor-10 1252 837386 ns/op 20748 B/op 272 allocs/op BenchmarkSearchRegex_small_capture_group BenchmarkSearchRegex_small_capture_group-10 655 1973166 ns/op 17844 B/op 224 allocs/op ```
After (commit bcca3c) ``` BenchmarkSearchRegex_large_fixed BenchmarkSearchRegex_large_fixed-10 164 7417409 ns/op 15664359 B/op 208 allocs/op BenchmarkSearchRegex_rare_fixed BenchmarkSearchRegex_rare_fixed-10 100 10094732 ns/op 15631530 B/op 622 allocs/op BenchmarkSearchRegex_large_fixed_casesensitive BenchmarkSearchRegex_large_fixed_casesensitive-10 338 3623207 ns/op 11210 B/op 160 allocs/op BenchmarkSearchRegex_large_re_dotstar BenchmarkSearchRegex_large_re_dotstar-10 4 335343333 ns/op 628868916 B/op 6098780 allocs/op BenchmarkSearchRegex_large_re_common BenchmarkSearchRegex_large_re_common-10 159 7196733 ns/op 7489770 B/op 68733 allocs/op BenchmarkSearchRegex_large_re_anchor BenchmarkSearchRegex_large_re_anchor-10 14 80432702 ns/op 4921970 B/op 50090 allocs/op BenchmarkSearchRegex_large_capture_group BenchmarkSearchRegex_large_capture_group-10 5 218258692 ns/op 3410760 B/op 25168 allocs/op BenchmarkSearchRegex_large_path BenchmarkSearchRegex_large_path/path_only BenchmarkSearchRegex_large_path/path_only-10 7413 155205 ns/op 1325 B/op 22 allocs/op BenchmarkSearchRegex_large_path/content_only BenchmarkSearchRegex_large_path/content_only-10 156 7530265 ns/op 60840 B/op 602 allocs/op BenchmarkSearchRegex_large_path/both_path_and_content BenchmarkSearchRegex_large_path/both_path_and_content-10 154 7971877 ns/op 60840 B/op 602 allocs/op BenchmarkSearchRegex_small_fixed BenchmarkSearchRegex_small_fixed-10 7808 190921 ns/op 288483 B/op 92 allocs/op BenchmarkSearchRegex_small_fixed_casesensitive BenchmarkSearchRegex_small_fixed_casesensitive-10 10000 115307 ns/op 5499 B/op 87 allocs/op BenchmarkSearchRegex_small_re_dotstar BenchmarkSearchRegex_small_re_dotstar-10 440 2820992 ns/op 4110047 B/op 40111 allocs/op BenchmarkSearchRegex_small_re_common BenchmarkSearchRegex_small_re_common-10 9824 133797 ns/op 44966 B/op 489 allocs/op BenchmarkSearchRegex_small_re_anchor BenchmarkSearchRegex_small_re_anchor-10 1426 832451 ns/op 20625 B/op 273 allocs/op BenchmarkSearchRegex_small_capture_group BenchmarkSearchRegex_small_capture_group-10 649 1828686 ns/op 17671 B/op 225 allocs/op ```