haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/

Option to silence DNS lookups when DNS updates/a backend server IP changes #1663

Open dekimsey opened 2 years ago

dekimsey commented 2 years ago

Your Feature Request

Allow an option to disable the "X changed its IP from X to Y" warnings.

haproxy-local | [WARNING] 082/114612 (8) : account_service/account_service changed its IP from 123.456.789.123 to 123.456.789.123 by k8s_dns/dns.
haproxy-local | account_service/account_service_1 changed its IP from 123.456.789.123 to 123.456.789.123 by k8s_dns/dns.
haproxy-local | account_service/account_service_1 changed its IP from 123.456.789.123 to 123.456.789.123 by k8s_dns/dns.

I'm not sure if this should be a server, backend, or DNS configuration option but it seems like it's being emitted from multiple locations.

What are you trying to do?

We have a few backends that are defined as DNS entries pointing to external systems (S3 in this example). By design, these systems rotate their DNS A records and shuffle their responses. Currently, when such a hostname is set as a backend server, HAProxy logs an endless stream of "X changed its IP from X to Y".

Refer to discussion on this subject here: https://discourse.haproxy.org/t/stop-logging-x-changed-its-ip-from-y-to-z/6387
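For context, a minimal sketch of the sort of configuration involved (the nameserver address and FQDN below are placeholders; only the resolvers/backend/server names come from the log above):

resolvers k8s_dns
    nameserver dns 10.96.0.10:53
    hold valid 10s

backend account_service
    # the FQDN is re-resolved periodically; every address change is
    # currently reported as a runtime warning
    server account_service account-service.example.svc.cluster.local:80 resolvers k8s_dns check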

Output of haproxy -vv

HA-Proxy version 2.2.3-0e58a34 2020/09/08 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.3.html
Running on: Linux 5.15.18-200.fc35.aarch64 #1 SMP Sat Jan 29 12:44:33 UTC 2022 aarch64
Build options :
  TARGET  = linux-musl
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-stringop-overflow -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1g  21 Apr 2020
Running on OpenSSL version : OpenSSL 1.1.1g  21 Apr 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.35 2020-05-09
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.3.0
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
              h2 : mode=HTTP       side=FE|BE     mux=H2
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services :
    prometheus-exporter

Available filters :
    [SPOE] spoe
    [COMP] compression
    [TRACE] trace
    [CACHE] cache
    [FCGI] fcgi-app
TimWolla commented 2 years ago

HA-Proxy version 2.2.3-0e58a34 2020/09/08 - https://haproxy.org/

@dekimsey It won't do anything about your feature request, but I'd like to note that HAProxy 2.2.3 is severely outdated: https://www.haproxy.org/bugs/bugs-2.2.3.html. The current 2.2.x release is 2.2.22 as of now. You should keep HAProxy up to date within your chosen branch.

dekimsey commented 2 years ago

@TimWolla Agreed and thank you for pointing that out!

wtarreau commented 2 years ago

I find it concerning that you're seeing a lot of them, because each time an IP address changes, it's not something without possible impact, which is the reason for these logs! What could be the cause of this? Having configured fewer servers than are advertised in DNS, maybe? I would hope that at least after a few DNS responses your backend is completely filled with working addresses that remain stable for the life of these servers. At least the settings in the resolvers section are made for this (I think it's mostly the hold parameter that's meant to be used for this).

dekimsey commented 2 years ago

Thank you @wtarreau. I think the issue is that HAProxy expects stable IPs for backend servers, while some systems rely on DNS to load-balance and therefore round-robin their IPs, which is entirely normal for them. In particular, this affects our S3 backends, where we are using HAProxy in front of some of our S3 resources.

The S3 DNS record is a single A record with a 300s TTL. Each lookup returns a different, single A record. I could set a hold value equal to the TTL, but it would still log the change every 300s, and at that point I'm willfully ignoring the record TTL. Perhaps one could argue that the hold behavior could have a "ttl" special value that holds valid responses until the TTL expires, maybe even pre-querying anew at the 90% mark.

To be honest, I think this is a situation better suited to server-template. But it too would spam endlessly about the IP changing, so I'm stuck with no good options.

In an ideal situation, I think I would: 1) use server-template, as it's intended to handle a variable number of servers; 2) perhaps have server-template use a server for the duration of the record TTL; 3) have something like resolve-opts nolog-valid.
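As a rough illustration of (1) with today's keywords (the bucket name and resolvers section name below are made up, and the nolog-valid option from (3) does not exist):

backend s3_service
    mode http
    # up to 4 server slots get filled from the DNS answers for the bucket endpoint
    server-template s3 4 mybucket.s3.amazonaws.com:80 resolvers mydns resolve-prefer ipv4 check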

wtarreau commented 2 years ago

But normally (maybe it's the hold parameter, but I'm not sure, CCing @bedis for this), even if you have only one server and the S3 DNS announces multiple servers in round-robin, as long as your IP address appears often enough it will not change. Alternatively, you could declare a few servers and use the one that matches the last announcement.

tmccombs commented 2 years ago

it's not something without possible impact

What is the impact with regards to haproxy?

I have a similar problem, also with S3. As mentioned, DNS for S3 returns a single A record that changes. But it isn't actually doing round-robin between a small number of IP addresses; it seems to be picking a random IP from a large pool of IP addresses, or at least something similar. What's more, the TTL on these records is very small (looking at some sample queries, the TTL is between 1 and 5 seconds in the DNS response).

Looking at the logs for a single backend using S3 over a 5-minute period, after HAProxy had already been running for a while, I got this warning 175 times (a little more than once every 2 seconds), with 152 unique IP addresses.

Granted, the behavior of s3's DNS is pretty unusual. However, I imagine quite a few people use s3 as a backend. I'd really like a way to avoid spamming the logs with these warnings.

I can somewhat mitigate the volume of these logs by setting hold valid and timeout resolve. But I still get the logs, and I'm hesitant to increase the hold value too much in case an IP address becomes unavailable.
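Roughly, that mitigation looks like this (the section name, nameserver address and durations below are just what I'd try, not a recommendation):

resolvers aws_dns
    nameserver vpc 169.254.169.253:53
    # keep the last valid answer well beyond the 1-5s record TTL
    hold valid 30s
    # re-resolve every 30s instead of the much shorter default
    timeout resolve 30s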

Are there any other workarounds?

P.S. I'm actually kind of curious why I get this warning so often with the default hold and timeout settings. According to the documentation, the default hold valid time is 10 seconds, so I would expect a valid response to be kept for 10 seconds, yet the address changes every 1 to 2 seconds. Maybe I'm misunderstanding what hold does?

markonen commented 2 years ago

I'm also seeing this with S3 backends specifically, their 5-second TTLs really make it pop. Perhaps the notice level would be appropriate for this and then maybe no configuration options would be needed?

jiang-gao commented 1 year ago

The same logging issue happens with all my S3 backends as well.

markonen commented 1 year ago

We're getting millions of lines of this log spew each day on our edge clusters, which have a number of S3 backends. I can think of workarounds and, of course, we could filter these out downstream, but there really seems to be no ill effect from the IP address change, so I'd much prefer a way to just silence this. Any thoughts on making this a notice?

ethanmdavidson commented 8 months ago

This is also an issue when putting HAProxy in front of a Netlify site.

I understand that an IP change is "not without possible impact", but it's the expected behavior of a third-party system that I have no control over. Printing this line repeatedly creates a lot of noise in the logs, and makes it harder for me to see other events that are more likely to have possible impact.

My preferred solution would be an option to disable this message on a resolver, server, or backend (IMO the resolver makes the most sense).

Logging to a less-severe level would also be acceptable, though I don't like this solution as much since I have only a few backends that are expected to change their IPs, and I would prefer to continue being notified if one of the others unexpectedly changes its IP.

havedill commented 3 months ago

I'm randomly seeing this with a Consul resolver. Instead of creating 2 servers in the backend (which I have it configured to do), it is rotating the IP on a single server. I'm on HAProxy Enterprise 2.6.

I don't recall this happening when I originally configured my backends.

Jun 29 06:51:23 hostname hapee-lb[30434]: excel_simulator/DisplayServer1 changed its IP from 10.13.155.38 to 10.13.155.104 by DNS additional record.
Jun 29 06:51:27 hostname hapee-lb[30434]: excel_simulator/DisplayServer1 changed its IP from 10.13.155.104 to 10.13.155.38 by DNS additional record.
Jun 29 06:51:30 hostname hapee-lb[30434]: excel_simulator/DisplayServer1 changed its IP from 10.13.155.38 to 10.13.155.104 by DNS additional record.
Jun 29 06:51:32 hostname hapee-lb[30434]: excel_simulator/DisplayServer1 changed its IP from 10.13.155.104 to 10.13.155.38 by DNS additional record.
Jun 29 06:51:35 hostname hapee-lb[30434]: excel_simulator/DisplayServer1 changed its IP from 10.13.155.38 to 10.13.155.104 by DNS additional record.
Jun 29 06:51:37 hostname hapee-lb[30434]: excel_simulator/DisplayServer1 changed its IP from 10.13.155.104 to 10.13.155.38 by DNS additional record.
backend excel_simulator
    balance leastconn
    mode http

    server-template DisplayServer 2 _DisplayServer._simulator.service.consul resolvers dev-consul resolve-opts allow-dup-ip resolve-prefer ipv4 check init-addr none
resolvers dev-consul
    nameserver consul1 10.13.157.15:8600
    nameserver consul2 10.13.157.57:8600
    nameserver consul3 10.13.157.36:8600
    accepted_payload_size 8192
    hold valid 5s

There should be 2 servers, not one.

Darlelet commented 3 months ago

@havedill are you sure you are not looking at an older haproxy process which uses the wrong configuration? Even if the resolver doesn't provide an IP address, server-template should initialize the proper number of servers upon startup (2, according to your configuration).

But since it looks like this is a different problem from the one described in this issue, please open a distinct issue if the problem persists.

havedill commented 3 months ago

OK, thanks. I'll message my enterprise support guys and see what they think.

tmccombs commented 3 months ago

@havedill does your DNS server rotate the order of responses, or return the records in a random order?

I think haproxy tries to keep the assignment of ip addresses consistent, but maybe there is a bug there?

havedill commented 3 months ago

I'm fairly certain it's related to this https://github.com/hashicorp/consul/issues/21325

I have my dev cluster on the newest Consul, which now resolves everything to the exact same name instead of unique names in the SRV output.

Edit:

My issue is resolved. Adding experiments = [ "v1dns" ] to my consul.hcl as a temporary workaround allowed HAProxy to see distinct DNS entries again; this will be fixed in Consul 1.19.1. I see all my backends again.

NicoAdrian commented 1 day ago

HAProxy 2.2.31 here. I'm getting flooded by these logs too. Even setting the log level to err (log 127.0.0.1:514 local6 err) doesn't suppress them!

NicoAdrian commented 13 hours ago

HAProxy 2.2.31 here. I'm getting flooded by these logs too. Even setting the log level to err (log 127.0.0.1:514 local6 err) doesn't suppress them!

Even with no log, they still appear :(

Darlelet commented 9 hours ago

@NicoAdrian yes indeed, they are reported with ha_warning() in addition to being sent to the logs, on purpose it seems. Thus they get printed on stderr regardless of the log settings.

It was implemented here: 14e4014a485860892933e7c9ce0fb3c53c659e99

NicoAdrian commented 9 hours ago

@NicoAdrian yes indeed, they are reported with ha_warning() in addition to being sent to the logs, on purpose it seems. Thus they get printed on stderr regardless of the log settings.

It was implemented here: 14e4014

Thanks. So there's no way to prevent HAProxy from logging that?

Darlelet commented 9 hours ago

Unfortunately no, unless some condition is added in the code to prevent it from being reported as a warning.

NicoAdrian commented 9 hours ago

Unfortunately no, unless some condition is added in the code to prevent it from being reported as a warning.

Clear, thanks.

capflam commented 9 hours ago

Well, from my point of view, this call to ha_warning() during runtime is definitely a bad idea. We must avoid this kind of message once the startup stage has finished. I guess it could also be useful to have a way to disable the log messages too, but that is probably harder to achieve. At the very least, we can remove the warning during runtime.

NicoAdrian commented 9 hours ago

Well, from my point of view, this call to ha_warning() during runtime is definitely a bad idea. We must avoid this kind of message once the startup stage has finished. I guess it could also be useful to have a way to disable the log messages too, but that is probably harder to achieve. At the very least, we can remove the warning during runtime.

Yeah, that would be great. In my case, my backend is not S3 but Akamai edges, and I set the resolve timeout to 20s. So every 20s, for each of my backends (dozens of them), I get those logs.

Darlelet commented 9 hours ago

@NicoAdrian you mean even if the IP address doesn't change? Edit: I guess that, like S3, Akamai also rotates DNS?

NicoAdrian commented 9 hours ago

@NicoAdrian you mean even if the IP address doesn't change?

They do change, every cycle (every 20s in my case). HAProxy does a DNS A query and Akamai answers with different addresses each time. Yes, Akamai rotates a lot.

wtarreau commented 9 hours ago

I continue to think this discussion is addressing the wrong issue. A server changing its IP address is always a serious enough event to warrant at least a log; it's at least as serious as a server going down. I mean, when a server changes address and you still have sessions on the old one, how will you know from your logs, stats or state dumps which request goes to which server? How do you even know whether the previous incarnation of the server still works, which one is receiving health checks, etc.?

Why does the server need to change its address in the first place? As much as I dislike DNS-based service discovery for its total flakiness, it was designed to resist cases where an address is not advertised for a while due to round-robin responses and other such well-known cases. So I'm still wondering what causes the server to change address that often. Do you have enough servers configured in the backend, for example?

NicoAdrian commented 9 hours ago

I continue to think this discussion is addressing the wrong issue. A server changing its IP address is always a serious enough event to warrant at least a log; it's at least as serious as a server going down. I mean, when a server changes address and you still have sessions on the old one, how will you know from your logs, stats or state dumps which request goes to which server? How do you even know whether the previous incarnation of the server still works, which one is receiving health checks, etc.?

Why does the server need to change its address in the first place? As much as I dislike DNS-based service discovery for its total flakiness, it was designed to resist cases where an address is not advertised for a while due to round-robin responses and other such well-known cases. So I'm still wondering what causes the server to change address that often. Do you have enough servers configured in the backend, for example?

Well, it really depends. Serious issue or not, I should be able to suppress those logs. In my case, Akamai does send me different edges every 20s. Fine. I don't care, I don't know why they do that, and I have no power to change it. Their TTL is 20s, which is why I configured that in my HAProxy config.

wtarreau commented 8 hours ago

But do they send different ones or are they rotating among a pool, which is completely different ?

Because if they're rotating among a pool, the correct solution is to configure a few servers (or just have a server-template statement) and they'll be distributed to available servers, expiring the oldest ones.

Changing the address all the time is bad for plenty of things, including resource usage, ability to reuse existing connections etc. Closing and re-opening SSL connections for example is a waste of CPU and an increase of latency.

I can understand the intent to follow a constantly moving IP address (in this case there's no load balancing; it's just a hack for whatever service keeps moving), but I first want to be sure that it's not a misunderstanding. For example, there was another user above showing constantly alternating addresses, which was just the result of the DNS advertising those addresses in turn (or even together) and not having enough servers configured to learn them all, degrading the quality of service.

Warnings are not there to annoy you. They're there so that haproxy can tell you: "Nico, the environment you deployed me in does not seem to accurately match the config you gave me; I will work in a suboptimal way and your users might face a degraded experience every time you see this message. If it bothers you, please first make sure that it's intended and not an accident." That's why I'm still insisting.

Otherwise, if you're absolutely certain you don't care at all about sending traffic to random servers, then you can just add quiet to your global section and all warnings and alerts will magically be gone. It's just that I consider production servers should work properly.
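For reference, that last resort is literally a single keyword (and again, it hides every warning and alert, not just these):

global
    # per the remark above, this silences all warnings/alerts on the process
    # output, not only the "changed its IP" messages
    quiet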

ethanmdavidson commented 8 hours ago

A server changing its IP address is always a serious enough event to warrant at least a log

I agree with NicoAdrian, it depends. For most of my backends, I'm in control of their networking, and I expect their IPs not to change because that's how I've set things up. Other backends are not under my control, and can change their IP whenever they want.

In these cases, the backend isn't forwarding requests to a specific server, it's forwarding requests to a service, and I have no visibility into how that service handles the request. I don't know if they are rotating among a pool, and if I do know, it could change at any time without warning. For these services, I don't know/care which request goes to which server. There are rare cases where I will need to know the IP that haproxy is forwarding a request to, e.g. when I'm troubleshooting something, but in normal operation these log messages are just spam that adds noise to my logs.

Warnings are not there to annoy you. They're there so that haproxy can tell you: "Nico, the environment you deployed me in does not seem to accurately match the config you gave me; I will work in a suboptimal way and your users might face a degraded experience every time you see this message. If it bothers you, please first make sure that it's intended and not an accident." That's why I'm still insisting.

I totally agree that the warning should be on by default. But for the cases where I know this is happening, can't change it, and accept that it is suboptimal, I need a way to silence it. Putting quiet in the global section is not desirable because I still want to be notified about other problems as they come up. I have one backend where this message is constantly spammed, and I ignore it. If any of my other backends printed this message, it would be a big deal! But I would never see it, because it's drowned out in the spam. This is why there needs to be an option on the server, backend, or resolver, IMO.

wtarreau commented 8 hours ago

Then if you're using it as a service instead of a server, this is where the problem stems from. You shouldn't be using a server in load-balancing form with its automatic address resolution, health checks, weights, idle pools etc which are all reset upon every change, it would be way cleaner to just resolve on the fly when the request arrives and forward the request to the IP address that results from the resolution.

For example, I think that doing something approximately like this (adapted from the config manual) would be much cleaner (and wouldn't report address changes, since no server address ever changes):

resolvers mydns
    nameserver dns1 1.1.1.1:53
    nameserver dns8 8.8.8.8:53
    timeout retry 1s
    hold valid    10s
    hold nx       3s
    hold other    3s
    hold obsolete 0s

frontend frt
    bind ...
    use_backend bck if { path /blah }

backend bck
    # dedicated to service.provider.tld
    http-request do-resolve(txn.svcip,mydns,ipv4) str(service.provider.tld)
    http-request set-dst var(txn.svcip)
    server svc 0.0.0.0:80  # will go to the address chosen by set-dst above

It could also make sense to add the IP address used and the source port to the logs, so as to help troubleshoot connectivity issues (correlation with tcpdump, firewalls, etc.).

tmccombs commented 7 hours ago

You shouldn't be using a server in load-balancing form with its automatic address resolution, health checks, weights, idle pools etc which are all reset upon every change

I have a case where I have two s3 buckets in two different regions.

I want to prefer the bucket that is closer, but if health checks fail for that bucket, fall back to the bucket in the other region. I don't really need load balancing, but I do want a fallback.
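Something like this is the shape I have in mind (the bucket hostnames and resolvers section name are placeholders):

backend s3_failover
    mode http
    # the primary is the closer region; the backup only receives traffic
    # once the primary's health checks fail
    server s3_near mybucket.s3.us-west-2.amazonaws.com:80 resolvers mydns resolve-prefer ipv4 check
    server s3_far  mybucket.s3.us-east-1.amazonaws.com:80 resolvers mydns resolve-prefer ipv4 check backup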

I'd also like to be able to re-use connections between haproxy and backend servers.

And does do-resolve use a dns cache? It isn't really clear from the documentation.

Another possible use case for having health checks is to detect that a service has become unacceptably slow and then short-circuit requests with a 503 until it becomes healthy again. Although maybe that could be accomplished with do-resolve and a gpc or something.

But do they send different ones or are they rotating among a pool, which is completely different ?

Because if they're rotating among a pool, the correct solution is to configure a few servers (or just have a server-template statement) and they'll be distributed to available servers, expiring the oldest ones.

For S3, I think the way it works is that it effectively selects an IP address from a very large pool, probably thousands of possible IP addresses.

wtarreau commented 7 hours ago

Regarding checks, you realize that your checks will not even all go to the same server, and that the resulting status will be a combination of various checks sent to different servers ? I'm sorry, but all I'm reading sounds totally disgusting. We're trying to design a component that focuses on reliability and observability to help with troubleshooting and all I'm reading here is "well, let's send requests wherever, if they find their way it's probably not that bad after all".

I understand that there is probably a use case and real needs behind this, and I am not rejecting them. I'm just trying to gauge what exactly is needed so that we can figure out what has to be adapted (or even added) to handle this case correctly. For now I've only read about absolutely atrocious hacks, I'm sorry :-(

tmccombs commented 5 hours ago

Regarding checks, you realize that your checks will not even all go to the same server, and that the resulting status will be a combination of various checks sent to different servers ?

Sure. But I don't care about individual servers. I care about the service as a whole (for a specific region). And for S3 at least, I don't think there is a 1:1 correspondence between IP addresses and physical servers anyway.

But my bigger questions are if I use do-resolve:

wtarreau commented 1 hour ago

Regarding do-resolve, it uses the resolvers sections, hence the cache there (though I cannot be very specific as I'm not deep into the DNS stuff; some tests might be needed to verify which parameter exactly acts on this). I believe the hold obsolete 0s above precisely disables caching. Connection reuse will work based on the (source, destination, SNI) tuple. Right now, reuse breaks with the LB every time the DNS replaces a server's address.

Regarding the service vs. servers, that's exactly what I'm understanding from your use case as well, and I'm thinking that maybe we should have a way to deal with DNS advertisements (and LB) differently. We could, for example, have an option indicating that servers should not be seen as individual ones but as part of a pool delivering a service. In this case, we could enable a few servers in a backend and make sure that DNS records are learned in rotation (always replacing only the oldest server), start them down (via init-state down) and wait for health checks to succeed before sending traffic there, and have a new balance latest algorithm that would only send traffic to the latest valid server, so as to gracefully replace previous ones. This would ensure that newly learned servers are used as soon as possible, without needlessly making servers go up and down, and would likely improve connection reuse. Some changes might be needed, such as simply refreshing a server that already has the advertised address, etc. We would see them as a stack, of sorts.

That's probably the way DNS should be used for such services, instead of doing load balancing and trying to hide the dust under the carpet. I don't know if that makes sense to you as well, based on your use case.