vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.67k stars, 1.56k forks

Vector Lookup address to DNS even if TTL is higher #21450

Open manavadariakevin opened 1 week ago

manavadariakevin commented 1 week ago

Problem

We are running Vector 0.40.0 (package vector-0.40.0-1.x86_64) on Linux with the configuration below. It sends logs to our Vector aggregators, and the endpoint is behind Envoy Proxy.

sinks:
  vector:
    type: vector
    healthcheck: False
    address: "https://vector-nonprod.abc.com"
    compression: True
    inputs:
      - parsing
      - nginx
    batch:
      max_bytes: 10000
      max_events: 10000
    buffer:
      type: "disk"
      max_size: 268435488
    request:
      rate_limit_num: 30
      retry_attempts: 100
      timeout_secs: 5
      retry_max_duration_secs: 5
      retry_initial_backoff_secs: 1
      retry_jitter_mode: Full

Vector keeps querying DNS for vector-nonprod.abc.com and generates far too many queries. We expected it to cache DNS results itself, or to rely on the server's resolver configuration, instead of going to the DNS server for every lookup.

Below are some of the connections to our DNS server, and this is only nonprod. In prod we see about 500 connections to DNS and roughly 300 queries per minute, which is putting a heavy load on our DNS servers. If there is a way to fix or work around this, please advise.

netstat -n | grep 254
udp 0 0 10.10.10.17:28174 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:36843 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:47618 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:59961 10.10.10.254:53 ESTABLISHED
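To put a number on this across hosts, the sockets pointing at port 53 can be counted from saved `netstat -n` output. The following is only a rough sketch: the function name `count_dns_flows` is hypothetical, and it assumes the five-or-six column layout shown in the sample above (protocol, two queue columns, local address, remote address, optional state).

```python
from collections import Counter

# Sample lines copied from the netstat output in this report.
SAMPLE = """\
udp 0 0 10.10.10.17:28174 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:36843 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:47618 10.10.10.254:53 ESTABLISHED
udp 0 0 10.10.10.17:59961 10.10.10.254:53 ESTABLISHED
"""


def count_dns_flows(netstat_output: str, dns_port: int = 53) -> Counter:
    """Count sockets per remote server whose remote port is `dns_port`.

    Hypothetical helper: assumes columns are
    proto, recv-q, send-q, local address, remote address, [state].
    """
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a udp/tcp row.
        if len(fields) < 5 or not fields[0].startswith(("udp", "tcp")):
            continue
        # Split "host:port" on the last colon so the host part stays intact.
        host, _, port = fields[4].rpartition(":")
        if port == str(dns_port):
            counts[host] += 1
    return counts


if __name__ == "__main__":
    for server, n in count_dns_flows(SAMPLE).items():
        print(f"{server}: {n} sockets to port 53")
```

Running the same function on output captured a minute apart gives a crude view of how quickly new sockets to the DNS server appear.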

Configuration

sinks:
  vector:
     type: vector
     #healthcheck: False
     address: "https://vector-nonprod.abc.com:443"
     compression: True
     inputs:
       - parsing
       - nginx
     batch:
       max_bytes: 10000
       max_events: 10000
     buffer:
       type: "disk"
       max_size: 268435488
     request:
       rate_limit_num: 30
       retry_attempts: 100
       timeout_secs: 5
       retry_max_duration_secs: 5
       retry_initial_backoff_secs: 1
       retry_jitter_mode: Full

Version

vector 0.40.0 (x86_64-unknown-linux-gnu 1167aa9 2024-07-29 15:08:44.028365803)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

manavadariakevin commented 1 week ago

I also tried specifying the port explicitly in the address, "https://vector-nonprod.abc.com:443", but the behavior is the same.

jszwedko commented 1 week ago

I think we discussed this in Discord a bit. I mentioned there that Vector does a DNS lookup every time it initiates a connection. However, even given that, it seems like you are seeing many more lookups than might be expected (it seems unlikely, though maybe possible, that Vector is initiating 500 connections per second).

Regardless, it does seem prudent for Vector to do DNS caching so I think adding that would be a reasonable way to address this issue.
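To illustrate why caching would help here, below is a minimal sketch of a TTL-based lookup cache. It is not Vector's code or a proposed implementation. The `TtlDnsCache` class and the resolver interface (a callable returning addresses plus a TTL in seconds) are assumptions made for this example. Python's standard `socket.getaddrinfo` does not expose record TTLs, so a real version would need a resolver library that does, or a fixed cache lifetime.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver interface for this sketch:
# takes a hostname, returns (addresses, ttl_seconds).
Resolver = Callable[[str], Tuple[List[str], float]]


class TtlDnsCache:
    """Cache lookups until the TTL reported by the resolver expires."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # hostname -> (expiry time on the clock, cached addresses)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.upstream_queries = 0  # how often the real resolver was called

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no upstream query
        addresses, ttl = self._resolve(host)
        self.upstream_queries += 1
        self._entries[host] = (now + ttl, addresses)
        return addresses


if __name__ == "__main__":
    fake_now = [0.0]

    def fake_resolver(host: str) -> Tuple[List[str], float]:
        # Stand-in for a real DNS query; the address and 60 s TTL are made up.
        return (["192.0.2.10"], 60.0)

    cache = TtlDnsCache(fake_resolver, clock=lambda: fake_now[0])
    for _ in range(500):  # 500 new connections within the TTL
        cache.lookup("vector-nonprod.abc.com")
    print("upstream queries within TTL:", cache.upstream_queries)

    fake_now[0] = 61.0  # TTL has expired, so the next lookup goes upstream
    cache.lookup("vector-nonprod.abc.com")
    print("upstream queries after expiry:", cache.upstream_queries)
```

With a cache like this, the number of upstream queries depends on the record TTL and the number of distinct hostnames rather than on how many connections are opened.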

killkill commented 6 days ago

Yes, I have the same problem. My config:

[sinks.out]
type = "loki"
inputs = [ "remove_kafka_fields" ]
endpoint = "http://distributor-loki.my.com/"
out_of_order_action = "accept"
remove_timestamp = true
tenantid = "myapp"

I used tcpdump to watch the DNS traffic:

tcpdump -vvn port 53

It shows a very large number of DNS resolutions.