Open rpriyanshu9 opened 9 months ago
👋 Thanks for the thorough report and analysis here. After reviewing everything, I agree with the consensus.
Basically, the datadog_metrics sink encoder encodes without knowing that the v2 parser in the agent source has namespaced the device tag to resource.device. Since we want to handle both the v1 and v2 endpoints, the sink encoder should check for the presence of both. Alternatively, the agent source could be consistent about whether or not to namespace it.
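To make the "check for the presence of both" idea concrete, here is a minimal sketch in Python rather than Vector's own code. The function name `device_tag` and the plain-dict model of a metric's tags are assumptions for illustration only; the real sink encoder is structured differently.

```python
from typing import Mapping, Optional


def device_tag(tags: Mapping[str, str]) -> Optional[str]:
    """Return the device value whether or not the tag was namespaced.

    Hypothetical helper: models a metric's tags as a flat mapping.
    """
    # v1 parsing leaves the tag as `device`; v2 parsing yields `resource.device`.
    for key in ("device", "resource.device"):
        if key in tags:
            return tags[key]
    return None
```

Checking the plain key first means a metric that already carries `device` is never overridden by the namespaced variant.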
Relatedly, this is the type of behavior we will want to test in the end-to-end test cases for the Datadog components, which are in progress. I will link this issue there.
In the meantime, I believe this could be worked around by configuring the Agent to send on the v1 series endpoint instead of the default v2. Vector will then use the v1 parsing, which doesn't namespace the device tag. Another workaround could be a transform that intercepts the tag and removes the namespace.
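The second workaround, removing the namespace in a transform, amounts to a tag rename. The sketch below shows that logic in Python under the assumption that tags can be treated as a flat dict; `strip_device_namespace` is a hypothetical name, and in Vector this would be expressed in a transform rather than in Python.

```python
from typing import Dict

NAMESPACED = "resource.device"
PLAIN = "device"


def strip_device_namespace(tags: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `tags` with `resource.device` renamed to `device`."""
    renamed = dict(tags)  # copy so the caller's mapping is left untouched
    if NAMESPACED in renamed:
        value = renamed.pop(NAMESPACED)
        # Keep an existing plain `device` tag if one is already present.
        renamed.setdefault(PLAIN, value)
    return renamed
```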
For the workaround, to configure the Agent to use the v1 endpoint, you can set use_v2_api.series: false in the Agent configuration file (or set DD_USE_V2_API_SERIES=false).
Another thought: there are in-progress changes to migrate the datadog_metrics sink to send to the v2 series endpoint. In those changes, I'm handling this discrepancy in the source's decoding. Essentially, once that is merged, this issue should also be resolved.
> For the workaround, to configure the Agent to use the v1 endpoint you can set use_v2_api.series: false in the Agent configuration file (or set DD_USE_V2_API_SERIES=false).
Yeah, for now we're using this variable to get past the issue. BTW, it's the datadog_agent source that is at fault, right?
👋 This issue was addressed in https://github.com/vectordotdev/vector/pull/18761, which is included in the recent v0.34.0 release.
Reopening since v0.34.1 will contain #19138, which reverts to the v1 behavior.
Hi @neuronull @jszwedko, are there any updates on this issue?
No updates, unfortunately; I believe this issue still exists. The fix we'd like to make is to switch the datadog_metrics sink to use the /v2 metrics API.
Problem
Hey there,
After upgrading the Datadog Agent from 7.39.2 to 7.45.0, we observed that some metrics that use the device tag stopped arriving. We investigated further and found that the device tag was renamed to resource.device after the upgrade. This left many dashboards empty and set off monitors in Datadog. We had to revert the upgrade to fix this. We looked into the source code of the Datadog Agent and Vector to find the root cause.
Here's what we think is causing this:
Starting from the 7.43.2 release of the Datadog Agent, the device tag is sent as part of the resources array: https://github.com/DataDog/datadog-agent/pull/16264.
The datadog_agent source acknowledges the V2 API payload with the resources field, but does not handle tags that are sent as part of resources rather than the tags array. ref: https://github.com/vectordotdev/vector/blob/53cad38db12ceb11e0394b4d5906f7de541ec7dc/src/sources/datadog_agent/metrics.rs#L270-L281
Because of the above block of code, the device tag that comes as an element of resources gets remapped to resource.device by the datadog_agent source. Because of this remapping, the metrics sent out by the datadog_metrics sink carry the resource.device tag, which is incorrect. It should be device only.
Seeking assistance in resolving this issue.
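For readers following the root-cause description above, the remapping can be modeled roughly as below. This is a simplified Python sketch and not the linked Rust code: `decode_v2_tags` is a hypothetical name, each resource is assumed to be an entry with `type` and `name` fields, and any special cases in the real decoder are ignored.

```python
from typing import Dict, List


def decode_v2_tags(tags: List[str], resources: List[Dict[str, str]]) -> Dict[str, str]:
    """Simplified model of how a v2 series entry ends up with namespaced tags."""
    decoded: Dict[str, str] = {}
    # Entries in the `tags` array arrive as "key:value" strings.
    for tag in tags:
        key, _, value = tag.partition(":")
        decoded[key] = value
    # Each resource becomes a tag named `resource.<type>`, which is how a
    # device sent under `resources` turns into `resource.device`.
    for resource in resources:
        decoded["resource." + resource["type"]] = resource["name"]
    return decoded
```

With a device sent under `resources`, the result contains `resource.device` and no plain `device` key, which matches the tag we see on the metrics leaving the sink.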
Discord thread: https://discord.com/channels/742820443487993987/1155850005391880214
cc @datsabk @jszwedko
Configuration
Version
vector 0.30.0
Debug Output
No response
Example Data
Additional Context
No response
References
No response