sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Automatically block auto-indexing on repeated failure #60916

Open varungandhi-src opened 8 months ago

varungandhi-src commented 8 months ago

We have a table codeintel_autoindexing_exceptions which can be manually modified to exclude certain repos from auto-indexing. For context, see:

However, the function introduced in the PR is only called in test code. This means that when auto-indexing fails repeatedly for a given repo, we keep creating jobs anyway unless someone manually modifies the codeintel_autoindexing_exceptions table. For example, we recently had an incident where a customer had 100K+ auto-indexing jobs and the queue kept growing. This caused high CPU usage:
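A minimal sketch of the automatic policy this issue asks for, assuming a simple rule: once a repository has N consecutive failed auto-indexing jobs, stop scheduling new jobs for it. The names AutoBlocker, RecordResult, ShouldSchedule, the repo name, and the threshold are hypothetical and are not the function added in the PR. The in-memory blocked set stands in for an entry in codeintel_autoindexing_exceptions and does not model that table's real schema.

```go
package main

import "fmt"

// AutoBlocker is a hypothetical, in-memory sketch of automatic blocking:
// once a repository accumulates maxFailures consecutive failed auto-indexing
// jobs, it is marked as blocked and no new jobs are scheduled for it.
// The blocked set stands in for an entry in codeintel_autoindexing_exceptions;
// it does not model that table's real schema.
type AutoBlocker struct {
	maxFailures int
	failures    map[string]int  // repo name -> consecutive failures since last success
	blocked     map[string]bool // repo name -> automatically blocked
}

// NewAutoBlocker returns a blocker that trips after maxFailures consecutive failures.
func NewAutoBlocker(maxFailures int) *AutoBlocker {
	return &AutoBlocker{
		maxFailures: maxFailures,
		failures:    make(map[string]int),
		blocked:     make(map[string]bool),
	}
}

// RecordResult updates the failure streak for repo after a job finishes.
// A success resets the streak; reaching the threshold blocks the repo.
func (b *AutoBlocker) RecordResult(repo string, succeeded bool) {
	if succeeded {
		delete(b.failures, repo)
		return
	}
	b.failures[repo]++
	if b.failures[repo] >= b.maxFailures {
		b.blocked[repo] = true
	}
}

// ShouldSchedule reports whether the scheduler may enqueue a new job for repo.
func (b *AutoBlocker) ShouldSchedule(repo string) bool {
	return !b.blocked[repo]
}

func main() {
	const repo = "github.com/example/always-fails" // hypothetical repo
	b := NewAutoBlocker(3)                          // hypothetical threshold

	for attempt := 1; attempt <= 4; attempt++ {
		if !b.ShouldSchedule(repo) {
			fmt.Printf("attempt %d: skipped, repo is blocked\n", attempt)
			continue
		}
		fmt.Printf("attempt %d: job scheduled\n", attempt)
		b.RecordResult(repo, false) // every job fails in this example
	}
}
```

Counting only consecutive failures means a repo that fails occasionally but usually succeeds is never blocked. In a real deployment the counter and block would presumably need to live in the database, so they survive restarts and the scheduler sees them before creating jobs.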

Our alert was triggered due to consistently high load in Cloud SQL. There is no immediate impact to customers, but:

the executor job queue has 1.5M unprocessed jobs

several very expensive queries (24s+) (from caller: internal/codeintel/autoindexing/internal/store.) seem to be halting the Postgres database. This is affecting other components in the deployment; e.g., the worker is unable to process any jobs while the database is at peak load.

Sub-parts:

varungandhi-src commented 8 months ago

Further optimization idea: