open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Add ability to reset metrics with Otel #3985

Open anushkamittal2001 opened 2 months ago

anushkamittal2001 commented 2 months ago

What are you trying to achieve? The ability to reset internal metrics in order to free the memory they use. The underlying Prometheus library already supports this.

What did you expect to see?

OpenTelemetry should support resetting the in-memory metric vectors so that their memory can be freed. Prometheus supports resetting metric vectors, and OpenTelemetry uses the Prometheus client library, so it would be very beneficial for us to be able to use this feature again in the Kyverno project.
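To make the request concrete, here is a minimal sketch of what "resetting a metric vector" means. It is a toy model in plain Python, not the Prometheus client library and not any OpenTelemetry API. The class `CounterVec` and its methods `inc`, `series_count`, and `reset` are hypothetical names chosen for illustration.

```python
class CounterVec:
    """Toy labeled counter: one running total per distinct label combination."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self._children = {}  # tuple of label values -> running total

    def inc(self, amount=1.0, **labels):
        # Build the key in a fixed label order so equal label sets collide.
        key = tuple(labels[name] for name in self.label_names)
        self._children[key] = self._children.get(key, 0.0) + amount

    def series_count(self):
        return len(self._children)

    def reset(self):
        """Drop every label combination so its memory can be reclaimed."""
        self._children.clear()


# High-cardinality labels make the vector grow without bound...
requests = CounterVec(["policy", "result"])
for i in range(1000):
    requests.inc(policy=f"policy-{i}", result="pass")
before = requests.series_count()

# ...until the application decides to reset it.
requests.reset()
after = requests.series_count()
```

After `reset()` the vector holds no series, and any label combination recorded afterwards starts again from zero. What the exported timeseries should look like across such a reset is the open specification question.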

Additional context. As mentioned in https://github.com/kyverno/kyverno/issues/8401#issuecomment-1811924079, the underlying Prometheus library supports a reset to clean up memory usage. Once we have this in the spec, we can implement it in opentelemetry-go.

Add any other context about the problem here. If you followed an existing documentation, please share the link to it.

https://github.com/open-telemetry/opentelemetry-go/issues/4752

MrAlias commented 2 months ago

Is this also a duplicate of https://github.com/open-telemetry/opentelemetry-specification/issues/2232?

tedsuo commented 2 months ago

Is this something that the SDK should be doing? Or is it specific to the prometheus exporter?

jmacd commented 2 months ago

@tedsuo This is a missing aspect of API, SDK, and Data model. I believe this is a duplicate of #2232. The OTel protocol added a metric datapoint flag for MISSING_DATA in order to accommodate Prometheus receivers which would otherwise report NaN-value points. However, there is no way for an OTel SDK to produce missing data, and a bunch of related questions have to be studied and answered here.

I'll share two of the concerns I know of.

  1. If an SDK creates a timeseries, reports some points, then deletes it, and then creates it again, what is the start timestamp of the second timeseries?
  2. If an SDK starts up, and then 3 days pass, and then a never-in-this-sdk-lifetime-seen combination of attributes yields a new timeseries starting 3 days after the SDK started, what is the start timestamp of this series?
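The two questions can be illustrated with a small sketch that compares two possible start-timestamp policies. This is not the behavior of any OpenTelemetry SDK. The names `Registry`, `Series`, and `use_sdk_start` are hypothetical, and timestamps are plain numbers of seconds.

```python
from dataclasses import dataclass, field

DAY = 86_400  # seconds


@dataclass
class Series:
    start_time: float
    total: float = 0.0


@dataclass
class Registry:
    sdk_start: float
    use_sdk_start: bool  # True: reuse SDK start time; False: use creation time
    series: dict = field(default_factory=dict)

    def add(self, attrs, value, now):
        key = tuple(sorted(attrs.items()))
        if key not in self.series:
            start = self.sdk_start if self.use_sdk_start else now
            self.series[key] = Series(start_time=start)
        self.series[key].total += value
        return self.series[key]

    def delete(self, attrs):
        self.series.pop(tuple(sorted(attrs.items())), None)


attrs = {"policy": "p1"}

# Policy A: every series starts at SDK start.
a = Registry(sdk_start=0, use_sdk_start=True)
a_first = a.add(attrs, 5, now=10)
a.delete(attrs)
a_second = a.add(attrs, 1, now=3 * DAY)

# Policy B: every series starts when it is (re)created.
b = Registry(sdk_start=0, use_sdk_start=False)
b_first = b.add(attrs, 5, now=10)
b.delete(attrs)
b_second = b.add(attrs, 1, now=3 * DAY)
```

Under policy A the re-created series has the same start time as the deleted one but a lower total, so a consumer cannot tell from the start time alone that a reset happened. Under policy B the start time moves forward on re-creation, which marks the reset, but the chosen instant (the time of the first measurement) is only one candidate for a "recent" timestamp.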

I believe we need to arrange for at least one MISSING_DATA point to be written between successive restarts of a timeseries. I also believe we should find a way to use a "recent" timestamp as the start time of the timeseries when it is introduced, not use the SDK start time--the timestamp should be after the last reset, at the very least. I would prefer to use the timestamp of the "last scrape", except that is not necessarily well defined (b/c multiple readers will see it differently).
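A rough sketch of that proposal, under stated assumptions: when a series is deleted, the stream queues one point with no value and a "no recorded value" marker, and a re-created series gets a start time taken at creation, which is necessarily after the reset. The names `Point`, `Stream`, and `no_recorded_value` are hypothetical. In the OTLP proto the corresponding flag is named `DATA_POINT_FLAGS_NO_RECORDED_VALUE_MASK`. The sketch models a single reader only and does not address the multiple-reader concern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    attrs: tuple
    start_time: float
    time: float
    value: Optional[float]
    no_recorded_value: bool = False  # stands in for the OTLP data point flag


class Stream:
    def __init__(self):
        self._live = {}   # attrs -> [start_time, running total]
        self._gaps = []   # gap points queued since the last collect

    def add(self, attrs, value, now):
        # A new or re-created series starts "now", i.e. after any prior reset.
        entry = self._live.setdefault(attrs, [now, 0.0])
        entry[1] += value

    def delete(self, attrs, now):
        entry = self._live.pop(attrs, None)
        if entry is not None:
            # Close the old incarnation with an explicit missing-data point.
            self._gaps.append(Point(attrs, entry[0], now, None, True))

    def collect(self, now):
        live = [Point(a, s, now, t) for a, (s, t) in self._live.items()]
        points = self._gaps + live  # gap points come before the new series
        self._gaps = []
        return points


key = (("policy", "p1"),)
stream = Stream()
stream.add(key, 5, now=10)
first = stream.collect(now=20)

stream.delete(key, now=25)
stream.add(key, 2, now=30)
second = stream.collect(now=40)
```

In the second collection the reader sees the gap point for the old incarnation (start 10, time 25, no value) followed by the new incarnation (start 30, value 2.0), so the two incarnations are never merged into one cumulative series.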

anushkamittal2001 commented 2 months ago

Hey folks, yes, this discussion started with #2232. We could close this and continue the discussion there. I opened this ticket to focus on the ability to reset, which Prometheus already supports.

I might not have understood something, but @jmacd, for 1, shouldn't the start timestamp be the time at which the timeseries is created again?

austinlparker commented 2 months ago

@jmacd if you're the sponsor for this issue, could you consolidate this (and other issues related to metrics reset) into a single issue and close the others? Thanks.