This introduces an experiment that lets us stop looking up ngram frequencies after a configurable limit. The insight is that for large substrings we can spend more time finding the lowest-frequency ngram than a regular search would take. Instead, we try to strike a balance between finding a good pair of ngrams and actually searching the corpus.
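A minimal Go sketch of the idea, using hypothetical names (the ngram type, selectNgram, and the freq callback are stand-ins, not the actual implementation): pick the lowest-frequency candidate ngram, but cap how many frequency lookups we pay for.

```go
package main

import "fmt"

// ngram stands in for the index's internal ngram type; this whole file is a
// hypothetical sketch of the experiment's idea, not the real implementation.
type ngram uint64

// selectNgram picks the lowest-frequency ngram among the candidates, but stops
// paying for frequency lookups after lookupLimit candidates (0 = no limit) and
// settles for the best ngram seen so far.
func selectNgram(candidates []ngram, freq func(ngram) int, lookupLimit int) ngram {
	best := candidates[0]
	bestFreq := freq(best)
	for i, ng := range candidates[1:] {
		if lookupLimit > 0 && i+1 >= lookupLimit {
			break // spend the remaining time searching, not looking up frequencies
		}
		if f := freq(ng); f < bestFreq {
			best, bestFreq = ng, f
		}
	}
	return best
}

func main() {
	freqs := map[ngram]int{1: 100, 2: 7, 3: 3}
	// With a limit of 2, ngram 3's frequency is never consulted: we settle for 2.
	picked := selectNgram([]ngram{1, 2, 3}, func(n ngram) int { return freqs[n] }, 2)
	fmt.Println("picked ngram:", picked)
}
```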
The plan is to set different values for SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT in Sourcegraph production and see how it affects the performance of the attribution search service.
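For illustration, a sketch of how such a knob could be read from the environment; the function name and the defaulting behaviour are assumptions, not the production wiring.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ngramLookupLimitFromEnv reads the experiment knob. Unset, empty, or invalid
// values fall back to 0, i.e. the current behaviour of checking every
// candidate ngram. The parsing here is a sketch, not the production wiring.
func ngramLookupLimitFromEnv() int {
	v := os.Getenv("SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT")
	n, err := strconv.Atoi(v)
	if err != nil || n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println("ngram lookup limit:", ngramLookupLimitFromEnv())
}
```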
Test Plan: ran all tests with the env var set to 2. I expected tests that assert on stats to fail but everything else to pass; this was the case.
SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT=2 go test ./...
Related to https://linear.app/sourcegraph/issue/CODY-3029/investigate-performance-of-guardrails-attribution-endpoint