open-telemetry / opentelemetry-go-contrib

Collection of extensions for OpenTelemetry-Go.
https://opentelemetry.io/
Apache License 2.0

Improvements to the instrumentation/runtime package #2624

Open jmacd opened 2 years ago

jmacd commented 2 years ago

Background

Go 1.16 introduced a new runtime/metrics package that provides official metric names for all of the Go runtime properties worth monitoring. Ideally, I think we should migrate the current instrumentation/runtime metrics to use exactly the names given by the Go team. This will require deprecating the old names, which OTel's schema migration support does not directly handle.

The runtime/metrics API is well suited to OTel's asynchronous instrument model, except for the 4 histogram-valued metrics (1 of them a gauge histogram). These issues are tracked at https://github.com/open-telemetry/opentelemetry-specification/issues/2713 and https://github.com/open-telemetry/opentelemetry-specification/issues/2714.
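A minimal sketch of why the asynchronous model fits, using only the standard library: a single callback can read every scalar runtime metric in one `metrics.Read` call. The `readScalars` function and its `observe` parameter are hypothetical stand-ins for an OTel callback and its observer, and histograms are skipped here because of the open specification issues.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readScalars reads every non-histogram runtime metric in a single
// metrics.Read call and reports each value through observe.
// Hypothetical: in an OTel integration, observe would be replaced by the
// observer passed to a registered asynchronous callback.
func readScalars(observe func(name string, value float64)) {
	var samples []metrics.Sample
	for _, d := range metrics.All() {
		if d.Kind == metrics.KindFloat64Histogram {
			continue // histograms need separate handling
		}
		samples = append(samples, metrics.Sample{Name: d.Name})
	}
	metrics.Read(samples)
	for _, s := range samples {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			// Converted to float64 only to keep the sketch short.
			observe(s.Name, float64(s.Value.Uint64()))
		case metrics.KindFloat64:
			observe(s.Name, s.Value.Float64())
		default:
			// KindBad: the name is not supported by this Go version.
		}
	}
}

func main() {
	readScalars(func(name string, value float64) {
		fmt.Printf("%s = %v\n", name, value)
	})
}
```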

The non-histogram metric values could easily be turned into OTel asynchronous instruments and callbacks. However, this looks like an area where users may want some configurability. Which would users prefer:

  1. One package that has all runtime metrics, and another package that has just the essentials?
  2. One package that has all runtime metrics, where many of them are disabled by default?

I think I would prefer the second option. Ideally, I would like to see OTel specify "API Hints" so that instrumentation packages can supply their own default Views. This would allow an instrumentation package to instrument more than is enabled by default. See https://github.com/open-telemetry/opentelemetry-specification/issues/2229
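One way the second option might look, sketched with the standard library only: the package knows every runtime metric name, but a hypothetical `Config.Enabled` method turns on only a small essentials set unless the user opts into more by name prefix. The `Config` type, its `Extra` field, and the choice of essentials are assumptions for illustration, not an existing OTel API; with API Hints, the same decision could instead be expressed as a default View.

```go
package main

import (
	"fmt"
	"strings"
)

// essentials is a hypothetical default set of runtime/metrics names.
var essentials = map[string]bool{
	"/gc/cycles/total:gc-cycles":   true,
	"/gc/heap/goal:bytes":          true,
	"/memory/classes/total:bytes":  true,
	"/sched/goroutines:goroutines": true,
}

// Config is a hypothetical configuration for a runtime instrumentation
// package that knows about every runtime metric but enables only some of
// them by default.
type Config struct {
	// Extra lists metric-name prefixes to enable in addition to the defaults.
	Extra []string
}

// Enabled reports whether the named runtime metric should be instrumented.
func (c Config) Enabled(name string) bool {
	if essentials[name] {
		return true
	}
	for _, p := range c.Extra {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	var def Config
	fmt.Println(def.Enabled("/sched/goroutines:goroutines"))    // true
	fmt.Println(def.Enabled("/memory/classes/heap/free:bytes")) // false

	withMemory := Config{Extra: []string{"/memory/classes/"}}
	fmt.Println(withMemory.Enabled("/memory/classes/heap/free:bytes")) // true
}
```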

jmacd commented 2 years ago

In Go 1.19, I used this code to print the 1.19 runtime metrics. What I like about this approach is that the Go team can introduce new metrics without requiring any change to this code.

Note that all non-cumulative, non-histogram values are printed as UpDownCounter, not Gauge. None of the 1.19 runtime metrics are true gauges; they are all counts, although this could change in the future.

// Name                                                    Unit         Instrument
// -----------------------------------------------------------------------------------------
// process.runtime.go.cgo.go-to-c-calls                    {calls}      Counter[int64]
// process.runtime.go.gc.cycles.automatic                  {gc-cycles}  Counter[int64]
// process.runtime.go.gc.cycles.forced                     {gc-cycles}  Counter[int64]
// process.runtime.go.gc.cycles.total                      {gc-cycles}  Counter[int64]
// process.runtime.go.gc.heap.allocs.bytes                 (*)          Counter[int64]
// process.runtime.go.gc.heap.allocs.objects               (*)          Counter[int64]
// process.runtime.go.gc.heap.allocs-by-size               {bytes}      Histogram[float64]      (**)
// process.runtime.go.gc.heap.frees.bytes                  (*)          Counter[int64]
// process.runtime.go.gc.heap.frees.objects                (*)          Counter[int64]
// process.runtime.go.gc.heap.frees-by-size                {bytes}      Histogram[float64]      (**)
// process.runtime.go.gc.heap.goal                         {bytes}      UpDownCounter[int64]
// process.runtime.go.gc.heap.objects                      {objects}    UpDownCounter[int64]
// process.runtime.go.gc.heap.tiny.allocs                  {objects}    Counter[int64]
// process.runtime.go.gc.limiter.last-enabled              {gc-cycle}   UpDownCounter[int64]
// process.runtime.go.gc.pauses                            {seconds}    Histogram[float64]      (**)
// process.runtime.go.gc.stack.starting-size               {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.heap.free             {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.heap.objects          {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.heap.released         {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.heap.stacks           {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.heap.unused           {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.metadata.mcache.free  {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.metadata.mcache.inuse {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.metadata.mspan.free   {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.metadata.mspan.inuse  {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.metadata.other        {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.os-stacks             {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.other                 {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.profiling.buckets     {bytes}      UpDownCounter[int64]
// process.runtime.go.memory.classes.total                 {bytes}      UpDownCounter[int64]
// process.runtime.go.sched.gomaxprocs                     {threads}    UpDownCounter[int64]
// process.runtime.go.sched.goroutines                     {goroutines} UpDownCounter[int64]
// process.runtime.go.sched.latencies                      {seconds}    GaugeHistogram[float64] (**)
//
// (*) Empty unit strings are cases where runtime/metrics produces
// duplicate names when the unit string is ignored; here we leave the unit
// in the name and set the unit to empty.
// (**) Histograms are not currently implemented; see the related
// issues for an explanation:
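The naming scheme in the listing, including the (*) rule, can be reproduced mechanically from the runtime/metrics names. This is a sketch assuming every name has the `/path:unit` form documented by runtime/metrics; the `otelNames` function and `mapping` type are hypothetical helpers, and the `process.runtime.go` prefix is taken from the listing above.

```go
package main

import (
	"fmt"
	"strings"
)

// mapping holds the OTel-style name and unit derived for one runtime metric.
type mapping struct {
	Name, Unit string
}

// otelNames maps runtime/metrics names like "/gc/heap/goal:bytes" to
// "process.runtime.go.gc.heap.goal" with unit "{bytes}". When two names
// share a path and differ only in unit, the unit is appended to the name
// and the unit string is left empty, as in the (*) footnote.
func otelNames(goNames []string) map[string]mapping {
	type parsed struct{ goName, path, unit string }
	var ps []parsed
	pathCount := make(map[string]int)
	for _, n := range goNames {
		i := strings.LastIndexByte(n, ':')
		if i < 0 {
			continue // not in the documented "/path:unit" form
		}
		p := parsed{goName: n, path: n[:i], unit: n[i+1:]}
		ps = append(ps, p)
		pathCount[p.path]++
	}
	out := make(map[string]mapping, len(ps))
	for _, p := range ps {
		// "/gc/heap/goal" becomes ".gc.heap.goal".
		name := "process.runtime.go" + strings.ReplaceAll(p.path, "/", ".")
		unit := "{" + p.unit + "}"
		if pathCount[p.path] > 1 {
			name += "." + p.unit
			unit = ""
		}
		out[p.goName] = mapping{Name: name, Unit: unit}
	}
	return out
}

func main() {
	m := otelNames([]string{
		"/gc/heap/goal:bytes",
		"/gc/heap/allocs:bytes",
		"/gc/heap/allocs:objects",
	})
	for _, k := range []string{"/gc/heap/goal:bytes", "/gc/heap/allocs:objects"} {
		fmt.Printf("%s %q\n", m[k].Name, m[k].Unit)
	}
	// process.runtime.go.gc.heap.goal "{bytes}"
	// process.runtime.go.gc.heap.allocs.objects ""
}
```

In practice the input would be the `Name` of every entry returned by `metrics.All()`, so the mapping keeps working as the Go team adds metrics.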

jmacd commented 1 year ago

For what it's worth, we are happy with https://github.com/lightstep/otel-launcher-go/tree/main/lightstep/instrumentation/runtime and would be glad to help return that forked code to this repository.

One thing still missing is asynchronous histogram support, which could now be implemented using a MetricProducer.

pellared commented 8 months ago

There is no guarantee that the instruments provided by this package will be stable. Each Go version (and each Go implementation) can produce different metrics. The https://pkg.go.dev/runtime/metrics documentation even says that "The set of metrics defined by this package may evolve as the runtime itself evolves". The runtime package in Go Contrib should follow semconv, and we should do our best to offer telemetry stability.

For now, I think it is better to keep it in a separate repository; it can be added to the OTel Registry.

Can we close the issue?