sourcegraph / zoekt

Fast trigram based code search
Apache License 2.0

index: use a random sample of ngrams when limiting #797

Closed: keegancsmith closed this 2 months ago

keegancsmith commented 2 months ago

The first bit of data I am getting back indicates that this strategy of limiting the number of ngrams we look up isn't working. I am still experimenting with different limits, but in the meantime it is easy to implement a strategy that picks a random subset of the query's ngrams, so that the first N ngrams of a query aren't the only ones being consulted.

Test Plan: ran all tests with the environment variable set to 2. I expected the tests that assert on stats to fail and everything else to pass, and that is what happened.

SRC_EXPERIMENT_ITERATE_NGRAM_LOOKUP_LIMIT=2 go test ./...

Part of https://linear.app/sourcegraph/issue/CODY-3029/investigate-performance-of-guardrails-attribution-endpoint