open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.9k stars 2.27k forks source link

[receiver/cloudfoundry] Some metric attributes need to be moved to be resource attributes #34824

Open crobert-1 opened 2 weeks ago

crobert-1 commented 2 weeks ago

Component(s)

receiver/cloudfoundry

Describe the issue you're reporting

An example metric from the Cloud Foundry receiver:

Metric #100
Descriptor:
     -> Name: system_metrics_agent.system_network_tcp_curr_estab
     -> Description: 
     -> Unit: 
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> org.cloudfoundry.index: Str(4b406f26-f1ec-4d66-8b92-f09ab6105d46)
     -> org.cloudfoundry.ip: Str(10.0.4.8)
     -> org.cloudfoundry.deployment: Str(cf-8148cfc6ff877dab3719)
     -> org.cloudfoundry.id: Str(4b406f26-f1ec-4d66-8b92-f09ab6105d46)
     -> org.cloudfoundry.job: Str(compute)
     -> org.cloudfoundry.product: Str(Small Footprint VMware Tanzu Application Service)
     -> org.cloudfoundry.instance_group: Str(compute)
     -> org.cloudfoundry.origin: Str(system_metrics_agent)
     -> org.cloudfoundry.system_domain: Str(sys.note-2482026.zf8b8cb76.shepherd.lease)
     -> org.cloudfoundry.source_id: Str(system_metrics_agent)
StartTimestamp: 2024-08-22 22:31:11.01441 +0000 UTC
Timestamp: 2024-08-22 22:31:29.45322044 +0000 UTC
Value: 51.000000

Each of these attributes are really describing the resource from where the metric and its data points are coming from, not the data point itself. These should all be moved to be resource attributes.

This would be a breaking change, and its changing the format of metrics considerably.

github-actions[bot] commented 2 weeks ago

Pinging code owners:

crobert-1 commented 2 weeks ago

@CemDK and @jriguera: Let me know if you have any feedback here.

jriguera commented 2 weeks ago

Yes, according with OTEL guidelines, those attributes belong to resource level. The reason why they are at datapoint/log_record level is because we are copying the tags from the envelope structure. We would need to have a list of attributes which should be moved to resource level, taking into consideration there are other custom tags defined by users/operators which I would keep as datapoint/log_record attributes . There are some attributres (eg org.cloudfoundry.product which do not appear in open source Cloud Foundry)

But (just to take into consideration), one could argue that because the component is targeting CF as a system, only attributes like org.cloudfoundry.deployment , org.cloudfoundry.product and org.cloudfoundry.system_domain would be resource level attributes.

Anyway, I would prefer having them as resource level. And, yes it will be a breaking change.

odubajDT commented 2 weeks ago

Hi, I would like to look into this issue if possible

Some thoughts about the breaking change: would it be a solution to temporary (let's say for 5 releases) keep the attributes duplicated on metric level and as well on resource level? Therefore there can be a slow phasing out time for users to adapt. WDYT?

odubajDT commented 2 weeks ago

Drafted a first possible (incomplete) draft solution, would like to hear your opinion on if duplicating the attributes is a good way in this. Thanks!

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34905

crobert-1 commented 2 weeks ago

Some thoughts about the breaking change: would it be a solution to temporary (let's say for 5 releases) keep the attributes duplicated on metric level and as well on resource level? Therefore there can be a slow phasing out time for users to adapt. WDYT?

I think a feature gate would probably be better for this scenario. Have the feature gate disabled by default, but if someone enables it, all of the relevant attributes will be resource attributes instead of (not duplicated) attributes of a datapoint.

I like the idea of documenting this though, and giving it a number of releases before upgrading the feature gate's stability.

odubajDT commented 2 weeks ago

Some thoughts about the breaking change: would it be a solution to temporary (let's say for 5 releases) keep the attributes duplicated on metric level and as well on resource level? Therefore there can be a slow phasing out time for users to adapt. WDYT?

I think a feature gate would probably be better for this scenario. Have the feature gate disabled by default, but if someone enables it, all of the relevant attributes will be resource attributes instead of (not duplicated) attributes of a datapoint.

I like the idea of documenting this though, and giving it a number of releases before upgrading the feature gate's stability.

Thanks for the response, adapted the implementation and should be ready for review now!

jriguera commented 2 weeks ago

I still need to review the code, but I agree with the feature gate.