sourcegraph / zoekt

Fast trigram based code search
Apache License 2.0

index: experiment to limit ngram lookups for large snippets #795

Closed keegancsmith closed 3 months ago

keegancsmith commented 3 months ago

This introduces an experiment that stops looking up ngrams once a certain limit is reached. The insight is that for large substrings we spend more time finding the ngrams with the smallest frequency than a normal search takes. So instead we try to strike a balance between looking for two good ngrams and actually searching the corpus.
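To illustrate the trade-off, here is a minimal sketch of a capped lookup loop. It is not the code in this PR: `pickTwoRarest`, the `freq` callback, and the sample counts are hypothetical stand-ins for the real index lookup, and treating a limit of 0 as "no cap" is an assumption.

```go
package main

import "fmt"

// pickTwoRarest scans the query's ngrams in order and returns the positions of
// the two with the lowest frequency among those looked up, plus the number of
// lookups performed. freq is a hypothetical stand-in for an index lookup.
// limit caps the number of lookups; limit <= 0 means no cap (an assumption).
func pickTwoRarest(ngrams []string, freq func(string) int, limit int) (first, second, lookups int) {
	first, second = -1, -1
	var firstF, secondF int
	for i, ng := range ngrams {
		if limit > 0 && lookups >= limit {
			break // stop paying for lookups; settle for the best pair so far
		}
		f := freq(ng)
		lookups++
		switch {
		case first == -1 || f < firstF:
			// New rarest ngram; the previous rarest becomes the runner-up.
			second, secondF = first, firstF
			first, firstF = i, f
		case second == -1 || f < secondF:
			second, secondF = i, f
		}
	}
	return first, second, lookups
}

func main() {
	// Made-up frequencies for the trigrams of "func main".
	counts := map[string]int{"fun": 900, "unc": 500, "nc ": 700, "c m": 40, " ma": 300, "mai": 10}
	ngrams := []string{"fun", "unc", "nc ", "c m", " ma", "mai"}
	freq := func(s string) int { return counts[s] }

	fmt.Println(pickTwoRarest(ngrams, freq, 0)) // no cap: 5 3 6
	fmt.Println(pickTwoRarest(ngrams, freq, 2)) // cap of 2: 1 0 2
}
```

With the cap the chosen pair is less selective, so the subsequent search does more work, but the lookup cost no longer grows with the length of the snippet.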

The plan is to set different values for SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT in Sourcegraph production and see how it affects the performance of the attribution search service.
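For readers who want to picture how such a knob might be read, here is a small sketch. Only the environment variable name comes from this PR. The helper `lookupLimitFromEnv` is hypothetical, and falling back to 0 (no limit) for unset, invalid, or negative values is an assumption rather than the behaviour of the actual change.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupLimitFromEnv parses the experiment's limit using the supplied getenv
// function (os.Getenv in production, a fake in tests). Unset, non-numeric, or
// negative values fall back to 0, which this sketch treats as "no limit".
func lookupLimitFromEnv(getenv func(string) string) int {
	n, err := strconv.Atoi(getenv("SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT"))
	if err != nil || n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println("ngram lookup limit:", lookupLimitFromEnv(os.Getenv))
}
```

Passing the getter as a parameter keeps the parsing testable without mutating the process environment.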

Test Plan: ran all tests with the envvar set to 2. I expected tests that assert on stats to fail, but everything else to pass. This was the case.

SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT=2 go test ./...

Related to https://linear.app/sourcegraph/issue/CODY-3029/investigate-performance-of-guardrails-attribution-endpoint