Closed by slimsag 3 years ago
Following some of the linked PRs, it seems like:
cc @sourcegraph/search
Here is the graph over the last 14d:
I'm not sure how this graph interacts with silences, but it looks like the warning alerts went up a lot on the day this issue was filed (the 24th). I shipped some fixes on the 29th, the big one on the 30th, and one or two more over the following day or two, and you can see the graph go mostly silent again. Then on the weekend we scaled up indexed search from 100k to 200k repos, and on Monday (when the site has more activity) all the alerts start up again. I suspect that scale-up is the root cause. cc @beyang
zoekt-indexserver still uses significant CPU, without seeming to affect the only other service metric available (average revision resolve duration)
Average revision resolve duration measures an RPC call, so it reflects load on gitserver (via frontend) rather than CPU use in the indexserver. Lots of CPU use therefore indicates it is likely indexing a lot. I would look at the recently added queue metrics.
zoekt-webserver still uses all of its memory, with frequently firing "50s+ indexed search request errors every 5m by code" alerts - this might be an issue of the alert being on a hard threshold rather than a ratio
The webserver uses a lot of memory even if it is not serving any requests, i.e. the memory use is dominated by the working set of indexes, not by the results or the number of them generated. At this scale we would need a lot of traffic for requests to contribute to memory use beyond just holding the indexes in memory.
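On the hard-threshold versus ratio question raised for the indexed search request errors alert, here is a rough sketch of how the two rules can disagree as traffic changes. The function names, the threshold of 50, the 5% ratio, and the sample windows are all made up for illustration; this is not how the actual alert is defined.

```python
def fires_on_hard_threshold(error_count: int, threshold: int = 50) -> bool:
    """Absolute rule: fire once the error count in the window reaches the threshold."""
    return error_count >= threshold


def fires_on_ratio(error_count: int, request_count: int, max_ratio: float = 0.05) -> bool:
    """Relative rule: fire only when errors exceed a fraction of all requests in the window."""
    if request_count == 0:
        return False  # no traffic in the window, so nothing to alert on
    return error_count / request_count > max_ratio


# Hypothetical 5-minute windows as (errors, total requests).
quiet_weekend = (10, 100)     # 10% of requests failing
busy_monday = (60, 10_000)    # 0.6% of requests failing

for name, (errors, requests) in [("weekend", quiet_weekend), ("monday", busy_monday)]:
    print(name, fires_on_hard_threshold(errors), fires_on_ratio(errors, requests))
```

With these made-up numbers the hard threshold fires only on the busy day, even though the failure rate there is lower, while the ratio rule fires only on the quiet day. That would be consistent with an absolute alert getting noisier simply because traffic went up.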
These alerts are firing frequently but may have already been addressed by Keegan:
"warning_zoekt_indexserver_provisioning_container_cpu_usage_5m", "warning_zoekt_indexserver_provisioning_container_cpu_usage_7d_high", "warning_zoekt_webserver_provisioning_container_cpu_usage_7d_high", "warning_zoekt_webserver_provisioning_container_memory_usage_7d_high"
Regardless, I have silenced them as they are still firing and noisy. Please fix, confirm the alerts are no longer firing, and then unsilence them: https://github.com/sourcegraph/deploy-sourcegraph-dot-com/blob/e33d7cd48e9407aac88124eec89644dd4d51699c/base/frontend/sourcegraph-frontend.ConfigMap.yaml#L5271-L5275