sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

sourcegraph.com alert: "gitserver: 100+ concurrent command executions (abnormally high load)" #9355

Closed slimsag closed 4 years ago

slimsag commented 4 years ago

On Sourcegraph.com, the new alerting has picked up that we regularly exceed 100 concurrent command executions on the gitservers there, e.g. over a 6h period:

image

From here.

These appear to occur at regular 7-minute intervals, but unfortunately we don't get much more information than that here. What could these be coming from? Any ideas, @keegancsmith?

I suspect this would cause issues or slowness for users if they try to, e.g., run searches at the same time.
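One way to check the "regular 7-minute intervals" impression against exported metric samples is to find each point where the series rises to the alert threshold and then look at the gaps between those points. The following is only a rough sketch: the function names and the synthetic data are made up for illustration, and the threshold of 100 is taken from the alert title rather than from the real alert definition.

```python
from statistics import median

# Assumed threshold, taken from the alert title ("100+ concurrent command executions").
THRESHOLD = 100


def spike_starts(samples, threshold=THRESHOLD):
    """Return the timestamps at which the value rises to or above the threshold.

    `samples` is a time-ordered list of (unix_seconds, value) pairs.
    Only the rising edge of each spike is recorded, so a spike that lasts
    several samples is counted once.
    """
    starts = []
    above = False
    for ts, value in samples:
        now_above = value >= threshold
        if now_above and not above:
            starts.append(ts)
        above = now_above
    return starts


def spike_gaps(samples, threshold=THRESHOLD):
    """Return the gaps, in seconds, between consecutive spike starts."""
    starts = spike_starts(samples, threshold)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Synthetic data for illustration only (not real gitserver metrics):
# one sample per minute for an hour, with a spike every 7th minute.
samples = [(minute * 60, 150 if minute % 7 == 0 else 20) for minute in range(60)]

gaps = spike_gaps(samples)
print("median gap in minutes:", median(gaps) / 60)
```

If the gaps cluster tightly around one value, that points to a scheduled job or polling loop rather than user traffic.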

slimsag commented 4 years ago

gitserver: echo command execution duration exceeding 1s

This alert is also regularly flapping, and may be related?

image

I wonder if the fact that gitserver disks are currently at 10% and are cleaning up old repos has anything to do with this? https://sourcegraph.slack.com/archives/CMBA8F926/p1585255501001700

keegancsmith commented 4 years ago

I'll investigate this further. We have Honeycomb, which we pay for and haven't used in months! It would be perfect for finding out what is going on here.

keegancsmith commented 4 years ago

My internet is painfully slow at the moment and it's nearing the end of the workday for me. I highly recommend exploring our gitserver-exec dataset in Honeycomb: https://ui.honeycomb.io/sourcegraph/datasets/gitserver-exec

uwedeportivo commented 4 years ago

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.15 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly. When in doubt, reach out!

Thank you

bobheadxi commented 4 years ago

@keegancsmith and I recently noticed a lot of git activity in Honeycomb that I assume contributes to the count that triggers this alert, for example in a 10-minute span:

image

@aileenrose took a look at analytics, and it seems that the top repo in this span has barely any traffic via sourcegraph.com, so this might be coming from extensions or some other automated source of requests (i.e. unnecessary git commands in certain paths, see https://github.com/sourcegraph/sourcegraph/issues/9359#issuecomment-661712351).

bobheadxi commented 4 years ago

Possibly related: https://github.com/sourcegraph/sourcegraph/pull/12379

bobheadxi commented 4 years ago

image

https://sourcegraph.com/-/debug/grafana/d/gitserver/git-server?viewPanel=5&orgId=1&from=now-24h&to=now

It looks like https://github.com/sourcegraph/sourcegraph/pull/13386 possibly fixed this!

keegancsmith commented 4 years ago

Yeah, that would be the root cause being fixed :)

Fixed by #13386

bobheadxi commented 4 years ago

Unsilencing the alert in https://github.com/sourcegraph/deploy-sourcegraph-dot-com/pull/3303