sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

cadvisor picks up metrics from non-sourcegraph services #23810

Open abeatrix opened 3 years ago

abeatrix commented 3 years ago

Series | Value
-- | --
cadvisor_container_memory_usage_percentage_total{name="prometheus-to-sd-exporter-event-exporter-gke-564fb97f9-sfkvq"} | +Inf
cadvisor_container_memory_usage_percentage_total{name="prometheus-to-sd-kube-dns-6c7b8dc9f9-8x9sz"} | +Inf
cadvisor_container_memory_usage_percentage_total{name="prometheus-to-sd-kube-dns-6c7b8dc9f9-b97tv"} | +Inf

Steps to reproduce:

  1. Observe that Prometheus appears to crash with over 99% memory usage.
  2. Check the metrics picked up by cAdvisor.

Expected behavior:

cAdvisor should not collect metrics for services unrelated to Sourcegraph.

Actual behavior:

Metrics from services unrelated to Sourcegraph are being picked up by cAdvisor because their container names start with "prometheus", the name of one of our services.
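To illustrate why the names in the table above were selected, here is a minimal Python sketch. It assumes the filter is a prefix match on cAdvisor's name label against a list of service names; SERVICES, PREFIX_FILTER, STRICT_FILTER, matching(), and the pod name prometheus-7d9f8c7b8d-abcde are hypothetical and are not taken from Sourcegraph's actual configuration.

```python
import re

# Hypothetical list of service names; the real Sourcegraph list may differ.
SERVICES = ["frontend", "gitserver", "prometheus", "grafana"]
ALTERNATION = "|".join(map(re.escape, SERVICES))

# Assumed prefix-style filter: any container name that merely *starts with*
# a service name is treated as a Sourcegraph container.
PREFIX_FILTER = re.compile(rf"^(?:{ALTERNATION})")

# Hypothetical stricter filter: the service name may only be followed by the
# two hash segments Kubernetes appends to Deployment pod names
# (<deployment>-<replicaset hash>-<pod suffix>), then end of string.
STRICT_FILTER = re.compile(rf"^(?:{ALTERNATION})-[a-z0-9]+-[a-z0-9]+$")


def matching(names, pattern):
    """Return the container names selected by the given compiled pattern."""
    return [n for n in names if pattern.match(n)]


names = [
    "prometheus-to-sd-exporter-event-exporter-gke-564fb97f9-sfkvq",
    "prometheus-to-sd-kube-dns-6c7b8dc9f9-8x9sz",
    "prometheus-7d9f8c7b8d-abcde",  # hypothetical Sourcegraph Prometheus pod
]

print(matching(names, PREFIX_FILTER))  # all three names match
print(matching(names, STRICT_FILTER))  # only the hypothetical Sourcegraph pod
```

The stricter pattern only works for pods named <deployment>-<replicaset hash>-<pod suffix>. It would miss StatefulSet pods such as gitserver-0 and plain Docker container names, which hints at why name-based filtering is hard to make reliable in general.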

github-actions[bot] commented 3 years ago

Heads up @davejrt @ggilmore @daxmc99 @dan-mckean - the "team/distribution" label was applied to this issue.

bobheadxi commented 3 years ago

This is unfortunately inherent to how cAdvisor works. Here's a workaround: https://github.com/sourcegraph/customer/issues/75#issuecomment-678300758 (I don't think this is public-facing, so I will add something). I've opened a pull request outlining this: https://github.com/sourcegraph/sourcegraph/pull/23817

And here's more context: https://github.com/sourcegraph/sourcegraph/issues/17365 (this comment also outlines some potential directions we were exploring/had explored at that time)

github-actions[bot] commented 2 years ago

Heads up @davejrt @ggilmore @dan-mckean @caugustus-sourcegraph @stephanx - the "team/delivery" label was applied to this issue.