sourcegraph / sourcegraph-public-snapshot


All Sourcegraph components can export standardized sets of metrics #33241

Closed bobheadxi closed 1 year ago

bobheadxi commented 2 years ago

Problem to solve

Today, our services do not export uniform metrics. Each service has its own implementation and custom metrics, which makes it difficult for anyone viewing the metrics to interpret what a user is actually experiencing, or to take meaningful action when it isn't clear what a metric represents. It's also been noted that custom metrics can be very expensive, which we'd like to avoid.

To accurately show what users experience when using Sourcegraph, and to make service issues easier to identify, we should export metrics using the RED method (rate, errors, duration).

Measure of success

Solution summary

We will propose migrating every service that does not currently use the RED metrics package, so that our metrics provide better information for our users.

Artefacts:

What specific customers are we iterating on the problem and solution with?

Internal Sourcegraph developers, and those teams who consume these metrics most often (Customer Engineering, Delivery, Cloud DevOps).

Impact on use cases

This effort contributes to the company-wide effort to improve Observability.

Delivery plan

Tracked issues

@unassigned

Completed

@bobheadxi: 3.00d

Completed: 3.00d

Legend

vrto commented 2 years ago

FYI

I have been looking into OpenTelemetry as we can potentially leverage an existing standard for log data. The summary of my findings is available in this document.

From my findings, I think we should re-shuffle here a bit. The push towards RED metrics is a useful initiative but is aligned with OpenTelemetry's Tracing functional component. For Metrics, we should consider whether adopting the OpenTelemetry instruments (counter, measure, observer) is worthwhile.

bobheadxi commented 2 years ago

Based on Michal's findings above, and our decision to focus first on the logging (events) and tracing components in particular (especially in the context of OpenTelemetry), we are planning on deferring this indefinitely. Once we have more investment in OpenTelemetry integrations, we can revisit this in the context of OpenTelemetry's recommendations for metrics.

cc @jhchabran @quinnhare

vrto commented 2 years ago

Did a little more digging around this; I acknowledge that the North Star is way more focused on logs+traces than metrics.

It appears that while OTel Metrics work is WIP, they're going to be treating as an important collaborator and will provide quality first-class integration. It's something I plan to (and we should) keep an eye on.