Closed by bobheadxi 1 year ago
FYI
I have been looking into OpenTelemetry as we can potentially leverage an existing standard for log data. The summary of my findings is available in this document.
From my findings, I think we should re-shuffle here a bit. The push towards RED metrics is a useful initiative, but it is aligned with OpenTelemetry's Tracing functional component. For Metrics, we should consider whether the OpenTelemetry instruments (counter, measure, observer) are worth adopting.
Based on Michal's findings above, and our decision to focus first on the logging (events) and tracing components in particular (especially in the context of OpenTelemetry), we are planning on deferring this indefinitely. Once we have more investment in OpenTelemetry integrations, we can revisit this in the context of OpenTelemetry's recommendations for metrics.
cc @jhchabran @quinnhare
Did a little more digging around this; I acknowledge that the North Star is way more focused on logs+traces than metrics.
It appears that while OTel Metrics work is WIP, they're going to be treating as an important collaborator and will provide quality first-class integration. It's something I plan to (and we should) keep an eye on.
Problem to solve
Today, our services do not export uniform metrics. With various implementations and custom metrics in use, it is difficult for someone viewing the metrics to interpret what a user is actually experiencing, or to take any meaningful action when it isn't clear what a metric represents. It's also been noted that custom metrics can be very expensive, which we'd like to avoid.
In order to accurately display what users are experiencing when using Sourcegraph and provide a way to determine service issues more easily, we should export metrics using the RED method.
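For readers unfamiliar with the RED method, it tracks three things per operation: Rate (how many requests), Errors (how many of them failed), and Duration (how long each took). The following is a minimal, stdlib-only sketch of that idea, not the RED metrics package referred to in this issue. The `REDMetrics` class, its `observe` method, and the `"search"` operation name are all hypothetical names chosen for illustration; a real service would export these values through its metrics client rather than hold them in memory.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class REDMetrics:
    """Hypothetical in-memory recorder of RED metrics, keyed by operation name."""

    # Rate: total number of calls per operation.
    requests: dict = field(default_factory=lambda: defaultdict(int))
    # Errors: number of calls per operation that raised an exception.
    errors: dict = field(default_factory=lambda: defaultdict(int))
    # Duration: wall-clock seconds for every call, per operation.
    durations: dict = field(default_factory=lambda: defaultdict(list))

    def observe(self, operation, fn, *args, **kwargs):
        """Run fn and record one request, its duration, and whether it failed."""
        start = time.perf_counter()
        self.requests[operation] += 1
        try:
            return fn(*args, **kwargs)
        except Exception:
            self.errors[operation] += 1
            raise  # the caller still sees the original error
        finally:
            # Recorded for both successful and failed calls.
            self.durations[operation].append(time.perf_counter() - start)


if __name__ == "__main__":
    metrics = REDMetrics()
    metrics.observe("search", lambda query: query.upper(), "repo:foo")
    print(dict(metrics.requests), dict(metrics.errors))
```

Because every operation is wrapped the same way, every service ends up reporting the same three signals under the same names, which is the uniformity this issue is after.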
Measure of success
Solution summary
We will propose migrating any services that do not currently use the RED metrics package onto it, to provide better information for our users.
Artefacts:
What specific customers are we iterating on the problem and solution with?
Internal Sourcegraph developers, and those teams who consume these metrics most often (Customer Engineering, Delivery, Cloud DevOps).
Impact on use cases
This effort contributes to the company-wide effort to improve Observability.
Delivery plan
Tracked issues
@unassigned
Completed
@bobheadxi: 3.00d
Completed: 3.00d