sourcegraph / sourcegraph-public-snapshot

WIP Proposal: Remove unneeded metrics #5201

Open uwedeportivo opened 5 years ago

uwedeportivo commented 5 years ago

prom-metrics-doc generates a list of all metrics with corresponding documentation.

We have a couple of instances where we use both a counter and a histogram for the same event. A histogram already keeps an internal count of observations (exposed as the _count series), so we can use that and get rid of the additional counter.
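To illustrate why the extra counter is redundant, here is a minimal sketch of what a histogram tracks internally. The histogram type below is a toy stand-in written for this example, not the Prometheus client library type; the names histogram, newHistogram and observe are assumptions.

```go
package main

import "fmt"

// histogram is a toy stand-in for a Prometheus histogram (NOT the real
// client library type). It keeps cumulative buckets, a running sum and a
// running count of observations.
type histogram struct {
	bounds  []float64 // upper bounds in ascending order; +Inf is implicit
	buckets []uint64  // buckets[i] counts observations <= bounds[i]
	sum     float64
	count   uint64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, buckets: make([]uint64, len(bounds))}
}

// observe records one event. The count is incremented on every call, so it
// already carries what a separate "requests_total" counter would hold.
func (h *histogram) observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.buckets[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	h := newHistogram(0.005, 0.01, 10)
	var requestsTotal uint64 // the separate counter this proposal would remove

	for _, seconds := range []float64{0.2, 0.4, 0.7} {
		h.observe(seconds)
		requestsTotal++
	}

	// Both values are incremented once per event, so they stay in step.
	fmt.Println("histogram count:", h.count, "counter:", requestsTotal)
}
```

As long as every event that increments the counter is also observed by the histogram with the same labels, the two series carry the same number.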

Inventory of metrics related to http requests

pkg/metrics/metrics.go

Used to measure requests to external services (Bitbucket, GitHub, etc.) from the client POV (HTTP client instrumentation). It currently consists of a counter vector, broken down by HTTP status code with a special case for errors. PR 5093 adds a request duration histogram vector.

Example metrics: repo-updater (this includes PR 5093)

# HELP src_github_request_duration_seconds Time (in seconds) spent on request.
# TYPE src_github_request_duration_seconds histogram
src_github_request_duration_seconds_bucket{category="graphql",code="200",le="0.005"} 0
src_github_request_duration_seconds_bucket{category="graphql",code="200",le="0.01"} 0
...
src_github_request_duration_seconds_bucket{category="graphql",code="200",le="10"} 8
src_github_request_duration_seconds_bucket{category="graphql",code="200",le="+Inf"} 8
src_github_request_duration_seconds_sum{category="graphql",code="200"} 3.1095035149999997
src_github_request_duration_seconds_count{category="graphql",code="200"} 8
# HELP src_github_requests_total Total number of requests sent to the GitHub API.
# TYPE src_github_requests_total counter
src_github_requests_total{category="graphql",code="200"} 8
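In the dump above, src_github_request_duration_seconds_count and src_github_requests_total both report 8 for the same labels. A rough way to spot such pairs in a scrape is sketched below. It assumes the plain text exposition format without timestamps; parseSamples and redundantCounters are hypothetical helpers, not part of prom-metrics-doc.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseSamples reads text exposition lines of the form `name{labels} value`
// and returns a map from "name{labels}" to the raw value string. Comment
// lines (# HELP, # TYPE) and blank lines are skipped. Timestamps are not
// handled in this sketch.
func parseSamples(text string) map[string]string {
	out := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		out[line[:i]] = line[i+1:]
	}
	return out
}

// redundantCounters returns the counter series whose value equals the _count
// series of the given histogram for the same label set.
func redundantCounters(samples map[string]string, histogram, counter string) []string {
	var dup []string
	prefix := histogram + "_count"
	for series, v := range samples {
		if series != prefix && !strings.HasPrefix(series, prefix+"{") {
			continue
		}
		labels := strings.TrimPrefix(series, prefix)
		if cv, ok := samples[counter+labels]; ok && cv == v {
			dup = append(dup, counter+labels)
		}
	}
	sort.Strings(dup)
	return dup
}

func main() {
	// Lines copied from the repo-updater example in this issue.
	const scrape = `
src_github_request_duration_seconds_sum{category="graphql",code="200"} 3.1095035149999997
src_github_request_duration_seconds_count{category="graphql",code="200"} 8
# TYPE src_github_requests_total counter
src_github_requests_total{category="graphql",code="200"} 8
`
	samples := parseSamples(scrape)
	for _, s := range redundantCounters(samples, "src_github_request_duration_seconds", "src_github_requests_total") {
		fmt.Println("duplicates histogram count:", s)
	}
}
```

Equal values in a single scrape are only a hint. The counter is safe to drop only if the code increments both metrics at the same call site with the same labels.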

cmd/frontend/backend/trace.go

Used by cmd/frontend/backend/repos.go (for example, func (s *repos) AddGitHubDotComRepository).

Example metrics: frontend

src_backend_client_request_duration_seconds_bucket{method="Repos.GetByName",success="false",le="0.2"} 5
src_backend_client_request_duration_seconds_bucket{method="Repos.GetByName",success="false",le="0.5"} 5
src_backend_client_request_duration_seconds_bucket{method="Repos.GetByName",success="false",le="1"} 5
src_backend_client_request_duration_seconds_bucket{method="Repos.GetByName",success="false",le="2"} 5

pkg/trace/httptrace.go

Woven into trace.Middleware(). Used only by cmd/frontend/internal/cli.newExternalHTTPHandler() to create the external http handler for frontend.

Example metrics: frontend

src_http_request_duration_seconds_bucket{code="200",method="get",repo="unknown",route="search",le="2"} 1
src_http_request_duration_seconds_bucket{code="200",method="get",repo="unknown",route="search",le="5"} 1
src_http_request_duration_seconds_bucket{code="200",method="get",repo="unknown",route="search",le="10"} 1
src_http_request_duration_seconds_bucket{code="200",method="get",repo="unknown",route="search",le="30"} 1
src_http_request_duration_seconds_bucket{code="200",method="get",repo="unknown",route="search",le="+Inf"} 1

cmd/repo-updater/repoupdater/observability.go

Created by repoupdater.NewHandlerMetrics(), which wraps handlers (server POV in repo-updater).

Example metrics: repo-updater

src_repoupdater_http_handler_duration_seconds_bucket{code="200",path="/status-messages",le="0.01"} 2150
src_repoupdater_http_handler_duration_seconds_bucket{code="200",path="/status-messages",le="0.025"} 2150
src_repoupdater_http_handler_duration_seconds_bucket{code="200",path="/status-messages",le="0.05"} 2150
src_repoupdater_http_handler_duration_seconds_bucket{code="200",path="/status-messages",le="0.1"} 2150
src_repoupdater_http_handler_duration_seconds_bucket{code="200",path="/status-messages",le="0.25"} 2150
src_repoupdater_http_handler_duration_seconds_bucket{code="200",path="/status-messages",le="0.5"} 2150

slimsag commented 5 years ago

I don't think anyone will object to removing duplicated metrics :) For those you can just send PRs without proposals.

Metrics that aren't duplicates but that you believe aren't useful for some reason may warrant a discussion (a PR, if easy to create, can also be a good way to discuss such changes).

uwedeportivo commented 4 years ago

We could do one round of this after @slimsag's excellent work on observability. We could scrape for metrics that are on deprecated and/or deleted panels and delete those from the code. This would help with the Prometheus data volume too.
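A rough sketch of that scrape: collect the metric names referenced by the queries of the panels that are still live, then report the defined metrics that none of them mention. The src_ prefix matching, the unusedMetrics helper and the sample data are assumptions made for illustration; the real inputs would come from the metrics list and the dashboard definitions.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// metricName matches identifiers that look like metric names. The src_
// prefix is an assumption based on the examples in this issue.
var metricName = regexp.MustCompile(`\bsrc_[a-zA-Z0-9_]+`)

// baseName strips the suffixes that a histogram adds to its series names, so
// a query on foo_bucket counts as a use of the histogram foo.
func baseName(name string) string {
	for _, suffix := range []string{"_bucket", "_sum", "_count"} {
		if strings.HasSuffix(name, suffix) {
			return strings.TrimSuffix(name, suffix)
		}
	}
	return name
}

// unusedMetrics returns the defined metric names that no panel query refers
// to, sorted alphabetically.
func unusedMetrics(defined, panelQueries []string) []string {
	used := map[string]bool{}
	for _, q := range panelQueries {
		for _, m := range metricName.FindAllString(q, -1) {
			used[m] = true
			used[baseName(m)] = true
		}
	}
	var unused []string
	for _, d := range defined {
		if !used[d] {
			unused = append(unused, d)
		}
	}
	sort.Strings(unused)
	return unused
}

func main() {
	// Illustrative inputs only: the queries below are made up.
	defined := []string{
		"src_http_request_duration_seconds",
		"src_backend_client_request_duration_seconds",
		"src_github_requests_total",
	}
	livePanelQueries := []string{
		`histogram_quantile(0.9, sum by (le) (rate(src_http_request_duration_seconds_bucket[5m])))`,
		`sum(rate(src_backend_client_request_duration_seconds_count[5m]))`,
	}
	for _, m := range unusedMetrics(defined, livePanelQueries) {
		fmt.Println("not referenced by any live panel:", m)
	}
}
```

A metric missing from every panel may still be used by alerts or ad hoc queries, so the output is a list of candidates to review rather than a delete list.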

github-actions[bot] commented 2 years ago

Heads up @davejrt @ggilmore @dan-mckean @caugustus-sourcegraph @stephanx - the "team/delivery" label was applied to this issue.