open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

groupbyattrsprocessor drops metric metadata #33419

Open braydonk opened 3 weeks ago

braydonk commented 3 weeks ago

Component(s)

processor/groupbyattrs

What happened?

Description

When groupbyattrsprocessor makes a new metric, it does not copy the metric metadata. I assume this is because Metadata is a relatively new field and isn't fully respected everywhere yet.

Steps to Reproduce

Found this by testing the new untyped metric support in Prometheus. It adds a Metadata key called prometheus.type. So that's the easiest way to see the effect. Create a pipeline from prometheusreceiver to groupbyattrsprocessor to debugexporter and have it scrape some manner of metrics.

Expected Result

The prometheus.type metadata should still be present when the value is seen in debugexporter.

Actual Result

It's gone.

Other notes

Collector version

v0.102.0

Environment information

Environment

OS: Debian 12
Compiler (if manually compiled): go 1.22.3

OpenTelemetry Collector configuration

No response

Log output

Nothing of note.

Additional context

Should there be an actual API in pdata for making full metric copies like this? What the groupbyattrsprocessor has to do here is pretty brittle for exactly this reason, and I'm not sure if there are other processors doing something similar.

github-actions[bot] commented 3 weeks ago

Pinging code owners:

braydonk commented 3 weeks ago

FYI @dashpole @ridwanmsharif

dashpole commented 3 weeks ago

There is a CopyTo function for metrics, but I'm not sure if that is what is needed here.

braydonk commented 3 weeks ago

For reference, this is the spot that does not copy metadata: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/dd2e45aed3797b14a7b9723c5064141553004566/processor/groupbyattrsprocessor/processor.go#L204-L208

rnishtala-sumo commented 2 weeks ago

@braydonk looks like this could be done using https://pkg.go.dev/go.opentelemetry.io/collector/pdata@v1.9.0/pmetric#Metric.CopyTo as suggested previously. I think the ask here makes sense. Please go ahead if you'd like to make the change. I can review the changes when ready.

crobert-1 commented 2 weeks ago

Removing needs triage based on code owner feedback.

odubajDT commented 6 days ago

If nobody is working on this issue, I would like to take it cc @evan-bradley

odubajDT commented 5 days ago

> @braydonk looks like this could be done using https://pkg.go.dev/go.opentelemetry.io/collector/pdata@v1.9.0/pmetric#Metric.CopyTo as suggested previously. I think the ask here makes sense. Please go ahead if you'd like to make the change. I can review the changes when ready.

After some investigation, using the Metric.CopyTo() function is problematic here. The function also copies the datapoints from the original metric, which goes exactly against the purpose of this processor, where datapoints are moved between metrics. Creating a new metric here (with an empty datapoint slice) and then calling CopyTo() on it would add datapoints that are in the input and should not be present in the output.

Therefore I would suggest adding the missing CopyTo() call only for the metadata here, unless we want to do a major refactoring of the processor's code.

I am open to suggestions.
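
To illustrate the difference described above, here is a minimal sketch. It uses simplified stand-in types, not the real pdata API: `Metric`, `copyMap`, and `newEmptyLike` are hypothetical names, and the metadata value is a placeholder. It only shows why a full copy carries the input datapoints along while a metadata-only copy does not.

```go
package main

import "fmt"

// Metric is a simplified stand-in for a pdata metric (hypothetical type,
// not the real pmetric.Metric). It has only the fields the example needs.
type Metric struct {
	Name       string
	Metadata   map[string]string
	DataPoints []float64
}

// CopyTo mimics a full copy: name, metadata, and all data points.
func (m Metric) CopyTo(dest *Metric) {
	dest.Name = m.Name
	dest.Metadata = copyMap(m.Metadata)
	dest.DataPoints = append([]float64(nil), m.DataPoints...)
}

// copyMap returns an independent copy of src.
func copyMap(src map[string]string) map[string]string {
	dst := make(map[string]string, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

// newEmptyLike builds the metric a regrouping step wants: the same name
// and metadata as the source, but no data points yet.
func newEmptyLike(m Metric) Metric {
	return Metric{Name: m.Name, Metadata: copyMap(m.Metadata)}
}

func main() {
	src := Metric{
		Name:       "http_requests",
		Metadata:   map[string]string{"prometheus.type": "unknown"}, // placeholder value
		DataPoints: []float64{1, 2, 3},
	}

	var full Metric
	src.CopyTo(&full)              // drags the input data points along
	regrouped := newEmptyLike(src) // keeps metadata, starts empty

	fmt.Println(len(full.DataPoints), len(regrouped.DataPoints), regrouped.Metadata["prometheus.type"])
	// prints "3 0 unknown"
}
```

In this model the regrouped metric can then receive only the datapoints that belong to its group, which is the behavior the processor needs.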

braydonk commented 4 days ago

Thanks for taking on this issue!

Might be naive, but perhaps the datapoints could just be deleted from the copy of the metric? If that won't work then it's fine to just do the metadata copy. Just would be nice if the full metric CopyTo function would work just to avoid this kind of thing happening again.

odubajDT commented 4 days ago

> Thanks for taking on this issue!
>
> Might be naive, but perhaps the datapoints could just be deleted from the copy of the metric? If that won't work then it's fine to just do the metadata copy. Just would be nice if the full metric CopyTo function would work just to avoid this kind of thing happening again.

This should work, but to delete datapoints we still need to find out which type we are dealing with (sum, gauge, histogram, ...) and only then delete the appropriate datapoints (these are also different types: numeric value, histogram value, ...).

Therefore I do not see any improvement in the logic here: the logic that determines the metric type still needs to stay in place, and instead of creating empty metrics we would copy them and then delete parts of them.
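
The point about type-dependent deletion can be sketched as follows. This is again a simplified model with hypothetical stand-in types (`MetricType`, `HistogramPoint`, `Metric`, `clearDataPoints`), not the real pdata API, and it covers only two metric types. It shows that emptying a copied metric needs the same branch on the metric type that creating an empty metric would need.

```go
package main

import "fmt"

// MetricType is a hypothetical stand-in for the metric type enum.
type MetricType int

const (
	TypeGauge MetricType = iota
	TypeHistogram
)

// HistogramPoint is a stand-in for a histogram data point.
type HistogramPoint struct {
	Count uint64
	Sum   float64
}

// Metric is a simplified stand-in, not the real pmetric.Metric: each
// metric type keeps its points in a differently typed slice.
type Metric struct {
	Name            string
	Type            MetricType
	GaugePoints     []float64
	HistogramPoints []HistogramPoint
}

// clearDataPoints empties a copied metric. It has to branch on the type
// first, so the type-detection logic does not go away.
func clearDataPoints(m *Metric) error {
	switch m.Type {
	case TypeGauge:
		m.GaugePoints = nil
	case TypeHistogram:
		m.HistogramPoints = nil
	default:
		return fmt.Errorf("unsupported metric type %d", m.Type)
	}
	return nil
}

func main() {
	copied := Metric{
		Name:            "latency",
		Type:            TypeHistogram,
		HistogramPoints: []HistogramPoint{{Count: 2, Sum: 0.5}},
	}
	if err := clearDataPoints(&copied); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(copied.Name, len(copied.HistogramPoints)) // prints "latency 0"
}
```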

braydonk commented 3 days ago

Ah okay I understand the issue now. Probably fine to just use the metadata copy then; it's probably sufficiently rare for that metric proto to change much for this kind of thing to happen again. Thanks!