sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other

Investigate frequently triggered alert "repo-updater: 1+ repositories schedule error rate for 15m0s" #30739

Closed filiphaftek closed 8 months ago

filiphaftek commented 2 years ago

Steps to reproduce:

  1. This reproduces immediately on Cloud production: the alert "repo-updater: 1+ repositories schedule error rate for 15m0s" fires because of an increased number of errors.
  2. The chart shows that this has been happening since 21.01.2022 - Grafana chart
  3. It is correlated with increased gitserver IO writes - Grafana chart.
  4. It is correlated with increased gitserver CPU usage - Grafana chart.
  5. It is correlated with increased gitserver mean time to first results sent - Grafana chart
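
One way to check the correlations listed above, rather than comparing charts by eye, is to export the series and compute a Pearson coefficient. This is a minimal sketch: the function name `pearson` and the sample values in the usage note are hypothetical, and it assumes both series are sampled at the same timestamps.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson(schedule_errors, gitserver_io_writes)` with two exported series (hypothetical variable names) returns a value near 1.0 when the metrics rise and fall together. A high coefficient only supports the correlation claim; it does not show which metric drives the other.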

Expected behavior:

The error rate should stay below the accepted threshold, as it did before 21.01.2022; otherwise we cannot spot real issues on production early enough.

Actual behavior:

  1. The alert fires a few times a week - the alert window was increased from 15m to 25m, but this only made the alert fire less often.
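
To illustrate why widening the window hides the problem instead of fixing it, here is a minimal sketch. It assumes Prometheus-style "for" semantics (the alert fires once the condition has held continuously for the whole window) and one sample per minute; `count_firings` and the sample series are hypothetical, not taken from the real alert definition or production data.

```python
def count_firings(samples, threshold, for_samples):
    """Count how often the condition holds for `for_samples` consecutive samples."""
    firings = 0
    run = 0  # length of the current streak of samples at or above the threshold
    for value in samples:
        if value >= threshold:
            run += 1
            # Fire exactly once per streak, at the moment the window is filled.
            if run == for_samples:
                firings += 1
        else:
            run = 0
    return firings


# Hypothetical series: a 20-minute and a 30-minute burst of schedule errors.
samples = [0] * 10 + [2] * 20 + [0] * 10 + [3] * 30 + [0] * 10

print(count_firings(samples, threshold=1, for_samples=15))  # 15m window
print(count_firings(samples, threshold=1, for_samples=25))  # 25m window
```

With this series the 15m window fires for both bursts, while the 25m window fires only for the longer one. The underlying errors are identical in both cases, so the longer window reduces noise without addressing the cause.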

github-actions[bot] commented 2 years ago

Heads up @jplahn @dan-mckean - the "team/repo-management" label was applied to this issue.