Closed fvarg00 closed 2 months ago
Hi @fvarg00 ,
This appears to be an incomplete bug report. Do you mind filling out all of the fields (in particular, how you ran into this situation)? Otherwise it will be difficult to reproduce, or even to tell whether it is a bug at all.
Hello @jszwedko,
Problem
Hi, we see the below error when there is high CPU load on the Vector pods. Is this a known problem? Any help is appreciated. Thanks!
ERROR source{component_kind="source" component_id=datadog_agents component_type=datadog_agent}:http-request{method=POST path=/api/v2/series}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count= reason="Source send cancelled." internal_log_rate_limit=true
Configuration
No response
Version
image: docker.io/timberio/vector:0.37.0-distroless-libc
Debug Output
N/A
Example Data
N/A
Additional Context
We are using DataDog Agent to send logs, metrics, traces to vector.
We use transforms to modify tags for every event that goes through vector, as well as route them to different sinks.
We use ClusterIP for the Kubernetes service and there is no explicit LoadBalancer to distribute traffic among vector pods.
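Since the actual configuration was not shared, here is a minimal sketch of the kind of topology described above (component names, the tag mutation, and the routing condition are all hypothetical):

```toml
# Illustrative sketch only, not the reporter's real config.
[sources.datadog_agents]
type = "datadog_agent"
address = "0.0.0.0:8080"

# Modify tags on every event passing through Vector.
[transforms.tag_events]
type = "remap"
inputs = ["datadog_agents"]
source = '''
.tags.env = "production"   # example tag mutation
'''

# Route events to different sinks.
[transforms.split]
type = "route"
inputs = ["tag_events"]
route.team_a = '.tags.team == "a"'

[sinks.datadog_logs]
type = "datadog_logs"
inputs = ["split.team_a"]
default_api_key = "${DD_API_KEY}"
```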
For reference, here is an image regarding CPU/memory load from pods where the errors are coming from.
References
N/A
Thanks @fvarg00. I'm guessing what you are seeing is request timeouts from the client, which cancel the send downstream. Can you share your configuration? I'm particularly interested in whether or not you are using the acknowledgements feature.
Hello @jszwedko, we are not using the acknowledgements feature. We see that the acknowledgements field is deprecated. Do you think that field could be causing this issue, or could enabling it be a possible fix?
Which part of the configuration would you need to see?
Gotcha, if you aren't using the acknowledgements feature, then it seems likely that the topology is just applying back-pressure to the Datadog Agent source: that is, the downstream components aren't sending fast enough, so data is buffering in the source. The fix would be to identify and resolve the bottleneck (in your case it seems like it might be CPU-bound). To identify the bottleneck you can use the utilization metric published by internal_metrics. The first component in the pipeline where that number is 1 (or close to it) is usually the bottleneck.
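One minimal way to surface that metric (component names are illustrative) is to pair an internal_metrics source with a prometheus_exporter sink and then inspect utilization per component:

```toml
# Illustrative sketch: expose Vector's internal metrics for scraping.
[sources.vector_metrics]
type = "internal_metrics"

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598"
```

The utilization gauge is tagged by component_id; scanning from the source toward the sinks, the first component whose value sits near 1 is the likely bottleneck.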
Closing this since I think we've narrowed in on the issue, but let me know if you have additional questions!