haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/

HAProxy 2.2.2: stale IPs for backends discovered via Consul DNS #841

Closed maximshd closed 4 years ago

maximshd commented 4 years ago

Detailed description of the problem

We recently upgraded our HAProxy version to 2.2.2 and observed rather strange behavior when working with Consul DNS service discovery: some backends keep stale or cached IP addresses after backend scale-down/scale-up events.

Our setup initially had 6 servers. Then 2 servers went down because of a scale-down event; as a result, HAProxy put them in MAINT state for a while, which is the expected behavior. A while later, another scale-up event spun up 2 new instances, but these new instances were never able to reach the UP state because their httpchk kept failing:

Sep  3 07:33:57 haproxy-i-065d2ffe9f716cfe1 haproxy[3961]: Server gateway/gateway3 is DOWN, reason: Layer4 timeout, check duration: 502ms. 3 active and 0 backup servers left. 27 sessions active, 0 requeued, 0 remaining in queue.

After checking the HAProxy stats, we found that the backend server gateway3 from the log above has the IP of the old server that was registered under the gateway3 name a few hours earlier.

show servers state gateway
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord
4 gateway 1 gateway1 10.200.9.199 2 64 1 100 21202 15 3 4 6 0 0 0 gateway-i-0e17c6850ca84576e.node.consul 8080 gateway.service.consul
4 gateway 2 gateway2 10.200.2.239 2 64 1 100 20314 15 3 4 6 0 0 0 gateway-i-01eb04834ecc2e931.node.consul 8080 gateway.service.consul
4 gateway 3 gateway3 10.200.11.50 0 64 1 100 528 7 2 0 6 0 0 0 gateway-i-091e3a5113b1b4806.node.consul 8080 gateway.service.consul

The backend IP from the HAProxy stats is 10.200.11.50, while the real server IP according to the Consul reply is 10.200.1.168:

dig @127.0.0.1 -p 8600 gateway.service.consul SRV

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> @127.0.0.1 -p 8600 gateway.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42640
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 25

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;gateway.service.consul. IN SRV

;; ANSWER SECTION:
gateway.service.consul. 30 IN SRV 1 1 8080 gateway-i-01eb04834ecc2e931.node.consul.
gateway.service.consul. 30 IN SRV 1 1 8080 gateway-i-091e3a5113b1b4806.node.consul.

;; ADDITIONAL SECTION:
gateway-i-091e3a5113b1b4806.node.consul. 30 IN A 10.200.1.168
gateway-i-091e3a5113b1b4806.node.consul. 30 IN TXT "az=us-east-1d"
gateway-i-091e3a5113b1b4806.node.consul. 30 IN TXT "instance_type=c5.2xlarge"
gateway-i-091e3a5113b1b4806.node.consul. 30 IN TXT "consul-network-segment="

;; Query time: 5 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Sep 03 08:53:04 UTC 2020
;; MSG SIZE  rcvd: 1331

HAProxy cannot recover the affected backends from this state by itself; only a restart helps.
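For illustration only (not part of the original report): while waiting for a fix, the stale entry can be overwritten by hand through the runtime API on the stats socket. A minimal sketch, assuming the stats socket path from the configuration below and the addresses shown above:

echo "set server gateway/gateway3 addr 10.200.1.168 port 8080" | socat stdio /var/lib/haproxy/stats
echo "show servers state gateway" | socat stdio /var/lib/haproxy/stats

The second command re-reads the server state to confirm that srv_addr now matches the address returned by Consul.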

Expected behavior

HAProxy should use the correct, up-to-date IP address when communicating with the backends.

Steps to reproduce the behavior

  1. Configure HAProxy 2.2.2 with backend service discovery via Consul DNS.
  2. Scale the backend group down by a few instances.
  3. Scale the backend group back up and observe that some servers stay DOWN indefinitely (a rough reproduction sketch follows this list).
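For illustration only, a rough approximation of steps 2 and 3 using the Consul CLI against a local agent; it does not reproduce the full AWS scale-down/scale-up flow, and the replacement service ID below is hypothetical:

# step 2: simulate the scale-down by deregistering one instance
consul services deregister -id=gateway-i-091e3a5113b1b4806

# step 3: simulate the scale-up by registering a replacement with a new address
consul services register -id=gateway-i-0replacement0000001 -name=gateway -address=10.200.1.168 -port=8080

# compare what HAProxy resolved with what Consul answers
echo "show servers state gateway" | socat stdio /var/lib/haproxy/stats
dig @127.0.0.1 -p 8600 gateway.service.consul SRV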

Do you have any idea what may have caused this?

No

Do you have an idea how to solve the issue?

Checking

What is your configuration?

global
        daemon
        user            haproxy
        group           haproxy
        hard-stop-after 20s
        log             /dev/log len 4096 local2 info
        tune.http.logurilen 3100
        stats socket    /var/lib/haproxy/stats user haproxy group haproxy mode 0700 level admin expose-fd listeners
        chroot          /var/lib/haproxy
        pidfile         /var/run/haproxy.pid
        maxconn         200000

resolvers consul
        nameserver consul 127.0.0.1:8600
        resolve_retries       3
        timeout resolve       1s
        timeout retry         3s
        hold other           30s
        hold refused         30s
        hold nx              30s
        hold timeout         30s
        hold valid           30s

defaults
        log             global
        option          dontlognull
        no option       log-separate-errors
        no option       dontlog-normal

        errorfile 400   /etc/haproxy/errors/400.http
        errorfile 403   /etc/haproxy/errors/403.http
        errorfile 408   /etc/haproxy/errors/408.http
        errorfile 500   /etc/haproxy/errors/500.http
        errorfile 502   /etc/haproxy/errors/502.http
        errorfile 503   /etc/haproxy/errors/503.http
        errorfile 504   /etc/haproxy/errors/504.http

listen stats
        bind            0.0.0.0:81
        mode            http
        option          dontlog-normal
        option          http-server-close

        maxconn         10

        timeout client  10s
        timeout server  10s

        stats           enable
        stats           uri /
        stats           realm  Haproxy\ Statistics

frontend banner-gateway
        bind            0.0.0.0:80
        mode            http
        option          httplog
        option          splice-auto
        maxconn         20000

backend banner-gateway
        mode            http
        option          httpchk GET /healthcheck
        http-check send hdr Host www
        http-check      expect status 200
        option          http-server-close
        option          redispatch
        retries         1

        timeout queue   500ms
        timeout connect 500ms
        timeout server  2s
        timeout check   1s

        balance         leastconn

        server-template   backend 1-200 backend.service.consul check inter 1s fall 3 weight 100 maxconn 400 slowstart 15s resolvers consul resolve-prefer ipv4

Output of haproxy -vv and uname -a

HA-Proxy version 2.2.2 2020/07/31 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.2.html
Running on: Linux 5.3.0-1034-aws #36-Ubuntu SMP Tue Aug 18 08:58:43 UTC 2020 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
Running on OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.31 2018-02-12
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 7.5.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
              h2 : mode=HTTP       side=FE|BE     mux=H2
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services : none

Available filters :
    [SPOE] spoe
    [COMP] compression
    [TRACE] trace
    [CACHE] cache
    [FCGI] fcgi-app

Additional information (if helpful)

jmagnin commented 4 years ago

Thanks for the bug report. This is related to how haproxy handles SRV records starting with 2.2.

When a name is already known, we only look at weight changes, which is obviously a problem when an address changes. https://github.com/haproxy/haproxy/blob/master/src/dns.c#L578

wtarreau commented 4 years ago

You mean that lines 603 and 604 make no sense, if I understand right? That sounds plausible indeed. I really have no idea how SRV records work in detail, so I can't judge whether it's normal to only process the weight, but my intuition tells me it sounds fishy; at the very least, a comment explaining why would be needed.

@maximshd, could you please try commenting out or removing these two lines from dns.c?

603               if (srv)
604                        continue;

Be careful, don't do that in production if you only have one server, as I'm not certain about possible side effects. It's just to validate Jerome's idea.

jmagnin commented 4 years ago

I'm not sure exactly what the problem is, but when I tried debugging/fixing it earlier, this part of the code seemed likely to be close to the problem.

I've tried your suggestion of removing lines 603 and 604, but this leads to more servers being set than expected, and address changes are still ignored.

Here, only 4 servers should be set, as I have 4 entries for my SRV record:

[WARNING] 246/220556 (416028) : in/a1 changed its IP from  to 192.168.135.3 by DNS additional record.
[WARNING] 246/220556 (416028) : in/a2 changed its IP from  to 192.168.135.2 by DNS additional record.
[WARNING] 246/220556 (416028) : in/a3 changed its IP from  to 192.168.135.4 by DNS additional record.
[WARNING] 246/220556 (416028) : in/a4 changed its IP from  to 192.168.135.1 by DNS additional record.
[WARNING] 246/220557 (416028) : in/a5 changed its IP from  to 192.168.135.3 by DNS additional record.
[WARNING] 246/220557 (416028) : in/a6 changed its IP from  to 192.168.135.2 by DNS additional record.

Captain Obvious here thinks we should loop through all additional records, see whether one matches the hostname currently associated with a given server, and update the address when needed.
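For illustration only, a minimal standalone C sketch of that approach; the structures are hypothetical stand-ins for haproxy's internal DNS response and server objects, not the actual dns.c code:

/* Sketch: refresh server addresses from the additional section of an
 * SRV response. Hypothetical types, not haproxy internals. */
#include <string.h>
#include <netinet/in.h>

struct additional_record {
    char hostname[256];              /* owner name of the A record          */
    struct in_addr addr;             /* address from the additional section */
    struct additional_record *next;
};

struct srv_entry {
    char hostname[256];              /* FQDN taken from the SRV target      */
    struct in_addr addr;             /* address currently in use            */
    struct srv_entry *next;
};

/* For every server attached to the SRV record, walk the additional records
 * and refresh the address whenever the hostname still matches but the
 * advertised address has changed. */
static void refresh_srv_addresses(struct srv_entry *servers,
                                  struct additional_record *additional)
{
    struct srv_entry *srv;
    struct additional_record *ar;

    for (srv = servers; srv; srv = srv->next) {
        for (ar = additional; ar; ar = ar->next) {
            if (strcmp(srv->hostname, ar->hostname) != 0)
                continue;
            if (srv->addr.s_addr != ar->addr.s_addr)
                srv->addr = ar->addr;   /* drop the stale address */
            break;
        }
    }
}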

wtarreau commented 4 years ago

OK thanks. Let's CC @capflam in case he has any obvious idea on this given that he seems to have changed this code recently.

sodabrew commented 4 years ago

This sounds like the same issue as #793 with a fix that came in this mailing list posting here and commit 87138c3524bc4242dc48cfacba82d34504958e78 on the master branch.

I hope that an haproxy 2.2.3 release with this fix will come out soon?

wtarreau commented 4 years ago

It seems a bit more complicated. Without SRV records, there are complex searches over the addresses already in use to make sure the advertised addresses are properly reassigned regardless of their order, and it doesn't seem that the same is done for SRV records when they're received all at once, which could explain some of the issues. Christopher has started to look into this, but it seems more like a limitation of the current design than just a bug to fix, so it might take a bit more time than expected.

capflam commented 4 years ago

Just a small update: I think I fixed the bug, and @jmagnin validated my patch. I must clean it up before pushing it, but it should be ok soon.

capflam commented 4 years ago

I pushed a fix; I hope it does not break anything else. I'm not a DNS expert and this part of HAProxy is pretty fuzzy for me. Because @jmagnin already validated it, I consider the issue fixed and I will do the backport soon.

wtarreau commented 4 years ago

I think we can issue a 2.2 with it soon. Fortunately that one didn't touch older versions so the risk is limited here.

wtarreau commented 4 years ago

Backported where desired, now closing.