Open kyungjun-pe opened 2 months ago
This might be the first time we have an issue about adding support for v2. I think it is possible to use v2, but I would clearly separate your issue from moving to v2, since lots of users have been using this registry through the v1 API without transmission issues for years. We also need to investigate what impact this has, what the benefits and drawbacks are, and how much effort it takes.
It might also be the case that using v2 will not solve your issue but Datadog support wants to move users to the v2 API. Based on the data you provided, I'm not sure I see an improvement, quite the opposite:
I don't see how this results in better transmission efficiency and stability. Could you please tell us more about the issue? Are you hitting a rate limit, does Datadog say the payload is too big, or does it just silently drop your data? Are there any errors you see on the Micrometer side?
There is a batch size parameter in Micrometer you can play with: https://github.com/micrometer-metrics/micrometer/blob/9917c67363f4507291b407edea57aebd2508b950/micrometer-core/src/main/java/io/micrometer/core/instrument/push/PushRegistryConfig.java#L89-L91
If you are hitting a throughput rate limit, you can try to increase it (fewer but bigger requests); if you hit a payload size limit, try to decrease it (more but smaller requests).
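To make the trade-off concrete, here is a minimal sketch in plain Java of how a batch size splits one publish cycle into requests. It is an illustration only, not Micrometer's code: `BatchSizeSketch` and `partition` are hypothetical names, and the meter counts are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: shows how a batch size turns one publish cycle
// into several requests. This is not how Micrometer is implemented.
class BatchSizeSketch {

    // Split items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Made-up number of meters to publish in one cycle.
        List<Integer> meters = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            meters.add(i);
        }
        // A larger batch size means fewer, bigger requests.
        for (int batchSize : new int[] {5_000, 10_000, 20_000}) {
            List<List<Integer>> batches = partition(meters, batchSize);
            System.out.println("batchSize=" + batchSize
                    + " -> " + batches.size() + " request(s)");
        }
    }
}
```

A larger value helps if the number of requests is the problem; a smaller one helps if individual payloads are too big or too slow to upload.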
You can also try to modify DatadogMeterRegistry (to use v2), but right now I'm quite sceptical that it would solve the issue. There is also the StatsD registry, which would need an extra component in your infra.
We received the following error while using:
[] [datadog-metrics-publisher] ERROR i.m.d.DatadogMeterRegistry - failed to send metrics to datadog: Unable to read payload i/o timeout
So I talked to a datadog engineer about the related issues and he recommended using v2.
We are currently sending directly to Datadog via this library, without StatsD, in a Spring project.
Also, you can enter a uri in the spring resource option, but it seems to be processed as a url in the actual code, which is a bit confusing.
So since this is a timeout, you can:
HttpSender
So I talked to a datadog engineer about the related issues and he recommended using v2.
If the suggestions above don't help, I would talk to Datadog support instead of engineers. Again, there is no guarantee that moving to v2 will solve your issue but you can try this out on your own if you really want to by modifying the datadog registry.
Also, you can enter a uri in the spring resource option, but it seems to be processed as a url in the actual code, which is a bit confusing.
I'm not sure I understand this.
This timeout error does not occur once but occurs for several tens of minutes.
I don't think this problem can be solved by increasing the timeout period.
And when this error occurred, datadog said that there were no metrics received.
Also, you can enter a uri in the spring resource option, but it seems to be processed as a url in the actual code, which is a bit confusing.
I think it should be host, not uri. This is because internally, only the host part is received and the path part is hard-coded.
I think it would be a good idea to use v1 by default and allow users to choose the version.
I think it would be a good idea to use v1 by default and allow users to choose the version.
I think we would have to do it that way since metric name size is more restricted in the v2 API, according to what you shared. This means previously valid metric names will become invalid when switching to the v2 API and need to be truncated. That's going to break dashboards and metrics queries for users that have metrics that end up truncated.
I've marked the issue as an enhancement request to support the v2 API. We'll have to check on any other changes, and as discussed above, we would need to retain support for v1 by default. A pull request would be welcome if someone wants to work on this.
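The "v1 by default, version selectable" idea could look roughly like the following. This is a minimal sketch in plain Java, not Micrometer code and not a proposed API: `SeriesEndpointSketch`, `ApiVersion` and `seriesEndpoint` are hypothetical names, and only the two paths are taken from this issue.

```java
import java.net.URI;

// Hypothetical sketch: choose the series endpoint by API version,
// falling back to v1 when no version is configured.
class SeriesEndpointSketch {

    enum ApiVersion {
        V1("/api/v1/series"),
        V2("/api/v2/series");

        final String path;

        ApiVersion(String path) {
            this.path = path;
        }
    }

    // baseUri is the configured base (scheme and host, optionally a port).
    // A null version means "not configured", so v1 is used.
    static URI seriesEndpoint(String baseUri, ApiVersion version) {
        ApiVersion effective = (version == null) ? ApiVersion.V1 : version;
        // Drop one trailing slash so the result has no double slash.
        String trimmed = baseUri.endsWith("/")
                ? baseUri.substring(0, baseUri.length() - 1)
                : baseUri;
        return URI.create(trimmed + effective.path);
    }

    public static void main(String[] args) {
        System.out.println(seriesEndpoint("https://api.datadoghq.com", null));
        System.out.println(seriesEndpoint("https://api.datadoghq.com", ApiVersion.V2));
    }
}
```

Keeping v1 as the fallback means existing setups behave as before, and only users who opt in to v2 would have to deal with the stricter metric name limit.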
This timeout error does not occur once but occurs for several tens of minutes.
This seems like a backend or network error to me. Still not sure moving to v2 will fix it.
I don't think this problem can be solved by increasing the timeout period.
Maybe but I'm curious what makes you think that. Did you try?
And when this error occurred, datadog said that there were no metrics received.
If this is a network issue or an issue with an API Gateway/Load Balancer, this makes sense.
I think it should be host, not uri. This is because internally, only the host part is received and the path part is hard-coded. I think it would be a good idea to use v1 by default and allow users to choose the version.
I see, I think uri is used instead of host because that way you can also specify the protocol (http vs. https):
If you need to publish metrics to an internal proxy en route to datadoghq, you can define the location of the proxy with this.
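As a small illustration of why a URI carries more than a host, the sketch below parses two values with `java.net.URI` and prints the parts. It is plain Java, unrelated to how the registry reads its configuration; `UriVersusHostSketch` and `describe` are hypothetical names, and the proxy address is made up.

```java
import java.net.URI;

// Illustration only: a URI carries the scheme and port in addition to
// the host, which a bare "host" setting could not express.
class UriVersusHostSketch {

    static String describe(String configured) {
        URI uri = URI.create(configured);
        // getPort() returns -1 when no port is given explicitly.
        return "scheme=" + uri.getScheme()
                + ", host=" + uri.getHost()
                + ", port=" + uri.getPort();
    }

    public static void main(String[] args) {
        System.out.println(describe("https://api.datadoghq.com"));
        // Made-up internal proxy reachable over plain HTTP.
        System.out.println(describe("http://metrics-proxy.internal:8080"));
    }
}
```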
This timeout error does not occur once but occurs for several tens of minutes.
However, while this problem occurred, there were no problems with our network and Datadog said there were no problems.
Due to various issues when transmitting metrics, I contacted datadog and was told to upgrade the API version.
So, after checking to resolve the issue, I found that datadog-related APIs were hard-coded into the code.
code line
Are you planning to upgrade the API version? The recommendation is to send to /api/v2/series rather than /api/v1/series.
Related information will be attached below. Datadog says that transmission efficiency and stability have improved due to the upgrade.