redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Excessive DNS queries by metrics reporter on failed response #5777

Open larsenpanda opened 2 years ago

larsenpanda commented 2 years ago

Version & Environment

Redpanda: v22.1.4 (rev 491e569)
CentOS: 3.10.0-1160.53.1.el7.x86_64

What went wrong?

A user installed Redpanda in a secure sandbox environment that has no DNS access to the internet. Our metrics reporter, which is on by default, ended up sending "800 dns queries per second" to their internal DNS server. This triggered an alarm and prompted them to block the three Redpanda nodes from issuing queries against it.

I'd like to know why it retries so aggressively when it does not get an IP address it can use.

What should have happened instead?

Perhaps we need a backoff or similar when no IP is resolved. I suspect the DNS server responds with SERVFAIL, but we may not want to tie the behavior to that specific condition.
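To make the suggestion concrete, here is a minimal sketch of exponential backoff with jitter around a DNS lookup. It is written in Python for brevity and is not how Redpanda's metrics reporter is implemented; `resolve_with_backoff` and all of its parameters and default values are hypothetical names chosen for illustration. Any resolution failure triggers the backoff, not just SERVFAIL.

```python
import random
import socket
import time


def resolve_with_backoff(host, port=443, *, resolver=socket.getaddrinfo,
                         sleep=time.sleep, base_delay=1.0, max_delay=300.0,
                         max_attempts=5):
    """Resolve `host`, waiting longer after each failed lookup.

    Hypothetical helper: the delay cap doubles after every failure
    (up to `max_delay`), and the actual wait is a random value below
    the cap so that several nodes do not retry in lockstep.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return resolver(host, port)
        except socket.gaierror:
            # Any lookup failure backs off; give up on the last attempt.
            if attempt == max_attempts:
                raise
            sleep(random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
```

With these illustrative defaults, a node that never gets an answer issues at most five lookups and then raises, instead of retrying in a tight loop. A periodic reporter would more likely skip the current report and try again on its next scheduled tick.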

How to reproduce the issue?

I cannot provide exact reproduction steps because they involve security architecture that the Linux admin is not able to share. We should be able to reproduce this with a BIND DNS server configured so that it does not return internet-based IP ranges in its responses.

Additional information

Logs could not be obtained.

JIRA Link: CORE-985

jcsp commented 2 years ago

Notwithstanding the overly aggressive retries, for systems not connected to the internet one should set enable_metrics_reporter to false.