Closed: sigilioso closed this issue 6 months ago.
Pinging code owners for receiver/statsd: @jmacd @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi @jmacd @dmitryax,
As I see it, the described scenario may be pretty common: several producers report the same metrics without a specific attribute to identify them. Could you take a look and confirm whether the aggregation is incorrect? Please let me know if you need additional details.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
I've run into the same problem.
statsdreceiver does not aggregate messages correctly and basically causes the metric to be exported as a single unaggregated value. For instance, if I send 10 messages like this:
"test.metricA:5|c|#gate:otel"
within 10 seconds, I expect the 10-second block to have a value of 50. The otel statsd receiver reports 5.
The larger the aggregation time span, the larger the error gets. As is, this receiver is unusable, because even with an aggregation interval of 1s it still loses values.
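For what it's worth, a loop along these lines reproduces the scenario above. It assumes a receiver listening on UDP 127.0.0.1:8125 and a netcat that accepts -u and -w (flag behavior varies between netcat implementations); each invocation uses a different ephemeral source port, which matches the failure mode discussed later in this thread.

```sh
# Send the counter ten times within the aggregation window.
# With a 10 s aggregation_interval the exported value should be 50,
# but only 5 is reported because each send uses a different source port.
for i in $(seq 1 10); do
  echo -n "test.metricA:5|c|#gate:otel" | nc -u -w 1 127.0.0.1 8125
done
```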
My team has the same problem as well. The un-aggregated metrics flow directly into the awsemf exporter, which causes a conflict: the exporter doesn't allow duplicate metrics in its batch and drops all but one of them before export. This is a confirmed issue in both Docker (using docker compose and netcat) and Kubernetes (Kong StatsD exporter -> ADOT).
I think this has the same root cause as https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/29508, which I can reproduce. Please check https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30228 to see if that helps.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Is it correct that this should be closed instead of getting fixed?
I'm not a statsd expert myself, but I have mixed feelings about the fix proposed here plus #30228. If I read the code correctly, as long as a persistent UDP connection is used, metrics are aggregated correctly (?). By implementing #30228, the aggregation by addr becomes pointless and could just be removed, since all metrics would then be aggregated by the statsdreceiver's endpoint addr.
Let's take Telegraf's implementation as an example.
For reference, here is where they store and buffer the packet input: link
What we see during processing is that unless the metric uses the Datadog statsd format, the connection addr is completely ignored. If the Datadog statsd format is used, the connection addr is set as the source tag.
I would argue that applications should expose tags which identify themselves. Still, the aggregation by connection source can be useful.
Long story short: I propose making this addr aggregation optional via a configuration flag, and also adding a resource attribute for the source to make the resulting aggregation clearer.
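Purely as an illustration of what such a flag could look like (the option and attribute names below are hypothetical and not necessarily what PR #33010 ended up using):

```yaml
receivers:
  statsd:
    endpoint: "0.0.0.0:8125"
    aggregation_interval: 10s
    # Hypothetical options, names for illustration only:
    aggregate_by_source_address: false  # stop grouping aggregation by client addr
    attach_source_address: true         # expose the client addr as a resource attribute instead
```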
@jmacd, @dmitryax, I have created PR #33010 implementing the suggestion. Could you please reopen this issue and have a look at the PR?
Component(s)
receiver/statsd
What happened?
Description
The metrics reported using the statsd receiver are not correctly aggregated when they come from different client addresses and are not decorated with any attribute to identify the client. Instead of aggregating metrics with the same type and the same set of attribute keys and values, as described in the docs, the receiver aggregates metrics with the same type, the same set of attribute keys and values, and also the same origin address, without including any attribute to identify that origin address. As I see it, this is a common scenario: metrics coming from different producers (or even the same producer using different connections) do not include specific attributes to identify the producer.
If the k8sattributesprocessor is used to decorate the metrics, this is not an issue, since the metrics will have attributes identifying the origin; but using that processor is not an option for every environment, nor is it mandatory when using the receiver. It is also possible to avoid this problem by decorating the metrics in the client, but as stated in #15290, that can be cumbersome or just not possible.
Steps to Reproduce
Run a collector using the statsd receiver and the file exporter, then send the same metric from different client addresses (for example with netcat, as described under Possible cause below).
Expected Result
The metric is expected to be aggregated: I'd expect to get a value of 30 with an attribute myKey:myVal for test.metric.
Actual Result
I get two values of test.metric with the same set of attributes (myKey:myVal) and the same timestamp. Sample output from /tmp/collector-output.json:
Collector version
v0.80.0
Environment information
No response
OpenTelemetry Collector configuration
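A minimal configuration along these lines (an illustrative sketch, not necessarily the exact setup used) runs the statsd receiver together with the file exporter from the report:

```yaml
receivers:
  statsd:
    endpoint: "0.0.0.0:8125"    # UDP by default
    aggregation_interval: 10s   # flush aggregated metrics every 10 s

exporters:
  file:
    path: /tmp/collector-output.json

service:
  pipelines:
    metrics:
      receivers: [statsd]
      exporters: [file]
```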
Log output
No response
Additional context
Possible cause
This can be reproduced using netcat because it closes the UDP connection right after writing the message. Therefore, a different origin port is used for each metric unless the port is specified. As a result, metrics are aggregated separately here and different batches are obtained here.
If we use the same port to write the metrics:
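For example, something along these lines writes the metrics from a fixed source port with netcat (the values and the source port 54321 are assumed for illustration; -p sets the source port in common netcat implementations):

```sh
# Both samples come from the same UDP source port, so the receiver
# treats them as one client and aggregates them into a single value.
echo -n "test.metric:10|c|#myKey:myVal" | nc -u -w 1 -p 54321 127.0.0.1 8125
echo -n "test.metric:20|c|#myKey:myVal" | nc -u -w 1 -p 54321 127.0.0.1 8125
```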
We get the expected results:
Considered solutions
1. Always decorate metrics with the producer's address. This way, there would be an attribute identifying the metrics' origin, and therefore we wouldn't be reporting several values for the same metric and the same set of attributes at a specific timestamp. This approach would be really simple to implement, but it would increase the weight of every metric reported (even if the origin address is not an interesting attribute for the user's use case). As I see it, it would introduce a breaking change, and we should avoid it if possible.
2. Take a different approach to identify the producer's address from the metrics, so that they can still be decorated by processors relying on the producer's address, such as the k8sattributesprocessor. Metrics could be optionally decorated with the producer's address (controlled by a configuration flag), and we could avoid grouping by it while performing the aggregation. This approach would require a different configuration in the processor to make statsdreceiver and k8sattributesprocessor work together.
If the bug is confirmed, we will be willing to help solve it.