open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
4.46k stars 1.47k forks source link

Migrate collector's metrics from OC to OpenTelemetry #816

Closed pavolloffay closed 10 months ago

pavolloffay commented 4 years ago

Currently, the collector uses OpenCensus API for internal observability https://github.com/open-telemetry/opentelemetry-collector/blob/0e1b2e323d398cf6cda500953c7896aa01c173b7/observability/observability.go#L25.

These metrics should be probably migrated to OpenTelemetry API.

Jaeger storage components use abstract metrics API and so we need to export these metrics in Jaeger OTEL collector e.g.

https://github.com/jaegertracing/jaeger/blob/bf870d9e2bffaf4758045919446be75ac9fc9783/cmd/opentelemetry-collector/app/exporter/cassandra/exporter.go#L31.

Are there plans to migrate to OpenTelemetry metrics? In Jaeger we want to avoid first creating OC adapter and then OTEL once this repo migrates. I can submit a PR to port the metrics to OTEL API.

yurishkuro commented 4 years ago

This would also provide the needed dogfooding for OTel SDK.

bogdandrutu commented 4 years ago

@yurishkuro completely agree, we are waiting for 2 things:

  1. The ability to use labels from the correlation-context.
  2. The ability to configure Views - allow us to configure which labels from the correlation-context to use.

If we don't have these two things we need to manually propagate/retrieve these values every time we do a report. Which may be possible if we do have couple of helper functions. So probably the main missing feature is to support changing the default aggregations for some instruments like for distributions we don't want just minmaxsumcount we want a histogram.

yurishkuro commented 4 years ago

Why do you need context labels in the collector? Sounds a bit like nice to have, not a showstopper.

bogdandrutu commented 4 years ago

I think I also tried to explain myself that we have the solution to use helper functions, but let me describe the design a bit:

And as I explained is not a showstopper, the current blocker is the ability to configure the aggregation for the distribution to be histogram vs minmaxsumcount, and probably a nice to have is the ability to select the labels for the aggregator (again not a showstopper).

bogdandrutu commented 4 years ago

@pavolloffay to answer your question, we do want to use opentelemetry metrics. probably tracing as well :)

pavolloffay commented 4 years ago

thanks @bogdandrutu. I can open a PR to port the metrics.

Is there any issue in opentelemetry-go for:

So probably the main missing feature is to support changing the default aggregations for some instruments like for distributions we don't want just minmaxsumcount we want a histogram.

bogdandrutu commented 4 years ago

I think the missing thing is this otep https://github.com/open-telemetry/oteps/pull/89

tigrannajaryan commented 4 years ago

@bogdandrutu I think I remember some recent work that you did on this and changed telemetry to use Otel API instead of OpenCensus. Is that correct? If so I can close this.

nilebox commented 4 years ago

Collector is still using OpenCensus in its self-observability pipeline: https://github.com/open-telemetry/opentelemetry-collector/blob/c8aac9e3b294281a50ee183084d9566f0e8a9d29/service/telemetry.go#L25

There is https://github.com/open-telemetry/opentelemetry-collector/pull/1833 switching traces to Otel API (not merged yet), but that still doesn't include metrics.

mxiamxia commented 3 years ago

The scope to resolve this issue is vague. Are we planing to only remove the OC metric reference in public API for unblocking Tracing GA? Because it's hard to replace all OC metrics in all those metrics components in core and contrib repos before tracing GA timeframe.

Need help to elaborate the exit criteria/scope on this issue? And we can contribute for the changes :)

alolita commented 3 years ago

@bogdandrutu since you mentioned that this issue is not a showstopper can it be moved to a post-GA phase 3 backlog?

alolita commented 3 years ago

@bogdandrutu: We will evaluate all the core components for any public API dependence on OC metrics references and respond on issue.

mxiamxia commented 3 years ago

Discussed it in SIG with @tigrannajaryan and @bogdandrutu , this issue is currently blocked by open-telemetry/oteps#89.

alolita commented 3 years ago

This item has a dependency on the Go SDK supporting metrics in a stable state.

cc: @Aneurysm9 @MrAlias

MrAlias commented 3 years ago

This item has a dependency on the Go SDK supporting metrics in a stable state.

cc: @Aneurysm9 @MrAlias

Understood. For the full dependency map it is important to know we are block on supporting metrics in a stable state by OTel itself having a stable metric specification.

mtwo commented 3 years ago

@alolita I assume that this doesn't block the 1.0 release of the Collector, as it's metrics-related?

tigrannajaryan commented 3 years ago

@djaglowski this is the issue for migrating from OC to Otel. A small progress was made recently on this via https://github.com/open-telemetry/opentelemetry-collector/pull/3816 but the bigger change will be removing the dependency on OC metrics in obsreport. That work is not started.

As far as I know we do not yet have a plan for the custom metrics should be reported (e.g. metric names) either. It will probably happen as part of https://github.com/open-telemetry/opentelemetry-collector/issues/2430

bogdandrutu commented 2 years ago

This issue is still block by the otel-go library being ready.

bogdandrutu commented 2 years ago

Block by https://github.com/open-telemetry/opentelemetry-go/issues/2204

yurishkuro commented 2 years ago

@bogdandrutu https://github.com/open-telemetry/opentelemetry-go/issues/2204 is about OpenCensus bridge, why is it a concern? I think the path would be to rewrite instrumentation to natively talk to OTEL-Go API, not to continue going through OC API.

fatsheep9146 commented 2 years ago

@bogdandrutu open-telemetry/opentelemetry-go#2204 is about OpenCensus bridge, why is it a concern? I think the path would be to rewrite instrumentation to natively talk to OTEL-Go API, not to continue going through OC API.

@yurishkuro

Since @dashpole provided a mature propose that the metrics migration would use OpenCensus bridge for OpenTelemetry, like trace. https://github.com/open-telemetry/opentelemetry-collector/pull/3816#pullrequestreview-733106917

As an alternative, I would propose that we use both opencensus and opentelemetry client libraries at the same time while we migrate. We can use the OpenCensus bridge for OpenTelemetry to make both opencensus and opentelemetry clients serve metrics on the same prometheus endpoint. We could roll this out with the following steps:

  • Switch from the OpenCensus prometheus exporter to a "wrapped" OpenTelemetry prometheus exporter. Verify that self obs metrics on the prometheus endpoint remain the same.
  • Add the OpenTelemetry API/SDK and use the same prometheus exporter that OC uses.
  • Migrate each metric from OC to OTel individually, and verify that the metric doesn't change.

But this propose is blocked by issue #2204, the prometheus exporter does not satisfy the requirement for OpenCensus metric bridge for OpenTelemetry, which is described in https://pkg.go.dev/go.opentelemetry.io/otel/bridge/opencensus@v0.21.0#readme-metrics

jpkrohling commented 2 years ago

We discussed this a couple of SIG calls ago, and @jaronoff97 was set to check what's our current state by working on a couple of components first and reporting back.

jaronoff97 commented 2 years ago

@jpkrohling Our team is currently looking in to this. @paivagustavo has been working on what a migration may look like and has already confirmed that we can provide the same prometheus metrics (and histogram buckets) with the latest Go SDK release. I'll let Gustavo provide some more details.

paivagustavo commented 2 years ago

I've been working on testing the latest released otel go sdk and verifying that it would met the requirements here. As part of this I've instrumented the existing receiver metrics from obsreport.Receiver(#6222) and the loadbalancingexporter from contrib, more specifically testing that we could use different histogram buckets for specific metrics. Using the views api, we are able to accomplish that.

About the bridge, as it is in review at the moment (https://github.com/open-telemetry/opentelemetry-go/pull/3207), I would suggest us to start to double instrumenting the collector with otel go in the meantime. If the bridge is merged and released before we finish migrating the current metrics, we can enable it and start removing oc metrics that have already being instrumented with otel-go.