haproxytech / dataplaneapi

HAProxy Data Plane API
https://www.haproxy.com/documentation/dataplaneapi/
Apache License 2.0

Data Plane API is killed by SIGTERM in openshift #329

Open git001 opened 7 months ago

git001 commented 7 months ago

Introduction

I am trying to run HAProxy in front of Craft CMS and want to use the Data Plane API to manage it.

Data Plane API infos

This is the dataplane api version.

curl -vLo dataplaneapi-haproxy-v2.9.1.tar.gz \
  https://github.com/haproxytech/dataplaneapi/archive/refs/tags/v2.9.1.tar.gz

Because the trace output of the dataplaneapi binary was not very helpful, I added these lines to the code; they produce the output below.

original code https://github.com/haproxytech/dataplaneapi/blob/14ab8aa5d85a3f697d5a0a44ce10c11335cc92de/client-native/cn.go#L95-L103

my "patch"

# client-native/cn.go:96
            } else {
                // if nbproc is not set, use master socket with 1 process
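                // Debug only: list the master socket to confirm that it exists.
                out, myerr := exec.Command("ls", "-la", "/data/haproxy/run/master-socket").Output()

                if myerr != nil {
                    fmt.Printf("ls failed: %s\n", myerr)
                } else {
                    // Report success only when ls actually succeeded.
                    fmt.Println("Command Successfully Executed")
                    fmt.Println(string(out))
                }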

                ms := runtime_options.MasterSocket(masterSocket, 1)
                runtimeClient, err = runtime_api.New(ctx, mapsDir, ms)
                if err == nil {
                    return runtimeClient
                }
                log.Warningf("Error setting up runtime client with master socket (1): %s : %s", masterSocket, err.Error())
            }

This is the output when I run HAProxy with the Data Plane API:

alex@CPC-aleks-RW2GP on 02/04/2024 at 22:49:17_UTC /mnt/c/local_data/git-repos/craftcms_k8s$ oc -n craftcms logs craftcms-hap-b87b89874-ppssv
[NOTICE]   (1) : New program 'api' (8) forked
[NOTICE]   (1) : New worker (9) forked
[NOTICE]   (1) : Loading success.
[WARNING]  (9) : fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
[WARNING]  (9) : Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
[WARNING]  (9) : Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
configuration file /data/haproxy/etc/dataplaneapi.yaml does not exists, creating one

time="2024-04-02T22:49:16Z" level=info msg="Build from: "
time="2024-04-02T22:49:16Z" level=info msg="HAProxy Data Plane API  .dev.dirty"
time="2024-04-02T22:49:16Z" level=info msg="Build date: 2024-04-02T22:45:03Z"
time="2024-04-02T22:49:16Z" level=info msg="Reload strategy: custom"
Command Successfully Executed
srwxr-xr-x. 1 1000940000 root 0 Apr  2 22:49 /data/haproxy/run/master-socket

time="2024-04-02T22:49:16Z" level=warning msg="Error setting up runtime client with master socket (1): /data/haproxy/run/master-socket;sockpair@7 : dial unix /data/haproxy/run/master-socket;sockpair@7: connect: no such file or directory"

[NOTICE]   (1) : haproxy version is 2.9.6-9eafce5
[NOTICE]   (1) : path to executable is /usr/local/sbin/haproxy
[ALERT]    (1) : Current program 'api' (8) exited with code 1 (Exit)
[ALERT]    (1) : exit-on-failure: killing every processes with SIGTERM
[ALERT]    (1) : Current worker (9) exited with code 143 (Terminated)
[WARNING]  (1) : All workers exited. Exiting... (1)

My observations

This line confuses me

time="2024-04-02T22:49:16Z" level=warning \
msg="Error setting up runtime client with master socket (1): \
  /data/haproxy/run/master-socket;sockpair@7 : \
    dial unix /data/haproxy/run/master-socket;sockpair@7: \
      connect: no such file or directory"

because the ls that runs just before the runtime_api.New(...) call shows that the socket exists.

srwxr-xr-x. 1 1000940000 root 0 Apr  2 22:49 /data/haproxy/run/master-socket

and I can execute the help command on the master socket

alex@CPC-aleks-RW2GP on 02/04/2024 at 23:11:02_UTC /mnt/c/local_data/git-repos/craftcms_k8s$ oc -n craftcms rsh --shell /bin/bash craftcms-hap-b87b89874-ddxs8
groups: cannot find name for group ID 1000940000
1000940000@craftcms-hap-b87b89874-ddxs8:/$ echo "help"|socat /data/haproxy/run/master-socket -
The following commands are valid at this level:
  @!<pid>                                 : send a command to the <pid> process
  @<relative pid>                         : send a command to the <relative pid> process
  @master                                 : send a command to the master process
  hard-reload                             : achieve a hard-reload (-st) of haproxy
  operator                                : lower the level of the current CLI session to operator
  reload                                  : achieve a soft-reload (-sf) of haproxy
  show cli level                          : display the level of the current CLI session
  show cli sockets                        : dump list of cli sockets
  show proc                               : show processes status
  show startup-logs                       : report logs emitted during HAProxy startup
  show version                            : show version of the current process
  user                                    : lower the level of the current CLI session to user
  help [<command>]                        : list matching or all commands
  prompt [timed]                          : toggle interactive mode with prompt
  quit                                    : disconnect

My assumption is that dataplaneapi tries to connect to /data/haproxy/run/master-socket;sockpair@7, which of course does not exist.

[!IMPORTANT] When is the ;sockpair@7 added to the master-socket?

haproxy infos

haproxy run

This is how HAProxy is started.

oc -n craftcms exec craftcms-hap-b87b89874-ddxs8 -- ps axf
    PID TTY      STAT   TIME COMMAND
     24 ?        Rs     0:00 ps axf
      1 ?        Ss     0:00 haproxy -f /data/haproxy/etc/haproxy.cfg -db -W -S /data/haproxy/run/master-socket
      8 ?        Sl     0:01 haproxy -f /data/haproxy/etc/haproxy.cfg -db -W -S /data/haproxy/run/master-socket

haproxy config

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log stdout format raw daemon debug
    pidfile     /data/haproxy/run/haproxy.pid
    master-worker
    stats socket /data/haproxy/run/stats mode 660 level admin expose-fd listeners

resolvers kube-dns
  nameserver dns1 dns-default.openshift-dns.svc.cluster.local:53
  accepted_payload_size 4096
  resolve_retries       3
  timeout resolve       1s
  timeout retry         1s
  hold other           30s
  hold refused         30s
  hold nx              30s
  hold timeout         30s
  hold valid           10s
  hold obsolete        30s

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    balance                 leastconn
    log                     global
    option                  httplog
    option                  dontlognull
    option                  log-health-checks
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s

userlist haproxy-dataplaneapi
    user admin insecure-password mypassword
#
program api
   command /usr/bin/dataplaneapi -f=/data/haproxy/etc/dataplaneapi.yaml --log-to=stdout --log-level=trace --spoe-dir=/data/haproxy/spoe --maps-dir=/data/haproxy/maps --ssl-certs-dir=/data/haproxy/ssl --general-storage-dir=/data/haproxy/general --host 0.0.0.0 --port 5555 --haproxy-bin /usr/sbin/haproxy --config-file /data/haproxy/etc/haproxy.cfg --reload-cmd "kill -SIGUSR2 1" --restart-cmd "kill -SIGUSR2 1" --reload-delay 5 --userlist haproxy-dataplaneapi --socket-path=/data/haproxy/run/data-plane.sock
   no option start-on-reload

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend craft-cms
  bind *:8080

  tcp-request inspect-delay 5s
  tcp-request content accept if HTTP

  monitor-uri /health
  http-request deny if { path_sub -i %0a %0d }
  http-request deny if { hdr_len(content-length) 0 }
  http-request del-header Proxy
  http-request set-header Host %[req.hdr(Host),lower]

  acl exist-php-ext path_sub -i .php
  http-request set-path /index.php%[path] if !exist-php-ext !{ path_end .php }

  http-response set-header Strict-Transport-Security "max-age=16000000; includeSubDomains; preload;"

  default_backend fcgi-servers

listen stats
  bind *:1936
  monitor-uri /healthz
  http-request use-service prometheus-exporter if { path /metrics }
  stats enable
  stats uri /

backend fcgi-servers

  option httpchk
  http-check connect proto fcgi
  http-check send meth GET uri /fpm-ping

  use-fcgi-app php-fpm

  # https://www.haproxy.com/blog/circuit-breaking-haproxy
  server-template craftcms 5 craftcms-php.craftcms.svc.cluster.local:9000 proto fcgi check resolvers kube-dns init-addr none observe layer7  error-limit 5  on-error mark-down inter 10s  rise 30  slowstart 40s

fcgi-app php-fpm
    log-stderr global
    option keep-conn
    option mpxs-conns
    option max-reqs 10

    docroot /app/web
    index index.php
    path-info ^(/.+\.php)(/.*)?$

Output of haproxy -vv

haproxy -vv
HAProxy version 2.9.6-9eafce5 2024/02/26 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.9.6.html
Running on: Linux 5.14.0-284.52.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 30 08:35:38 EST 2024 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PTHREAD_EMULATION=1 USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_QUIC_OPENSSL_COMPAT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX +PTHREAD_EMULATION +QUIC +QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
Running on OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.39 2021-10-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Dockerfile

That's the Dockerfile of the image.

FROM haproxytech/haproxy-ubuntu:2.9

COPY container-files/ /

RUN set -x \
  && cp /usr/bin/dataplaneapi /usr/bin/dataplaneapi.orig \
  && cp /data/haproxy/bin/dataplaneapi /usr/bin/dataplaneapi \
  && mkdir -p /data/haproxy/etc \
    /data/haproxy/run \
    /data/haproxy/maps \
    /data/haproxy/ssl \
    /data/haproxy/general \
    /data/haproxy/spoe \
  && chown -R 1001:0 /data \
  && chmod -R g=u /data \
  && /usr/bin/dataplaneapi -v

USER 1001

mjuraga commented 7 months ago

Hi, can you paste the contents of /data/haproxy/etc/dataplaneapi.yaml?

git001 commented 7 months ago

that's it

$ cat  /data/haproxy/etc/dataplaneapi.yaml
config_version: 2
name: craftcms-hap-debug
mode: single
status: ""
dataplaneapi:
  advertised:
    api_address: ""
    api_port: 0
haproxy:
  reload:
    reload_strategy: custom

git001 commented 7 months ago

Maybe there is another issue and the message above hides the original problem. Because OpenShift restricts the Pod's runtime environment, the socket call itself could fail.

https://github.com/haproxytech/client-native/blob/e914b0d0f77265cc83cb61eea069dea75c38a706/runtime/runtime_single_client.go#L209-L223

Which, I think, is called from here.

https://github.com/haproxytech/client-native/blob/e914b0d0f77265cc83cb61eea069dea75c38a706/runtime/runtime_single_client.go#L62-L80

mjuraga commented 7 months ago

Hi @git001, thanks for your input. The bug is in the dataplaneapi code: it picks up the master socket location from an environment variable set by HAProxy (https://github.com/haproxytech/dataplaneapi/blob/master/configure_data_plane.go#L121) and doesn't properly sanitize it, that is, it doesn't remove the sockpair@ suffix added by the latest versions of HAProxy. We can fix this, or you could give it a shot if you are interested.

git001 commented 7 months ago

@mjuraga thanks for the tip. I will try to fix it with a PR

git001 commented 7 months ago

I have now fixed the sockpair@ bug with this code.

...
    // Override options with env variables
    if os.Getenv("HAPROXY_MWORKER") == "1" {
        mWorker = true
        masterRuntime := os.Getenv("HAPROXY_MASTER_CLI")
        if misc.IsUnixSocketAddr(masterRuntime) {

            fmt.Printf("before Replace masterRuntime :%v:\n", masterRuntime)

            if strings.HasPrefix(masterRuntime, "unix@") {
                haproxyOptions.MasterRuntime = strings.TrimPrefix(masterRuntime, "unix@")
                // Drop the ";sockpair@<fd>" suffix. Checking the index guards
                // against slicing with -1 when no such suffix is present.
                if semicolon := strings.Index(haproxyOptions.MasterRuntime, ";sockpair@"); semicolon >= 0 {
                    haproxyOptions.MasterRuntime = haproxyOptions.MasterRuntime[:semicolon]
                }
            }

            fmt.Printf("after Replace masterRuntime :%v:\n", haproxyOptions.MasterRuntime)
        }
    }
...

The output below shows that the sockpair suffix is gone.

before Replace masterRuntime :unix@/data/haproxy/run/master-socket;sockpair@7:
after Replace masterRuntime :/data/haproxy/run/master-socket:

The problem with the SIGTERM is still there.

 haproxy -f /data/haproxy/etc/haproxy.cfg -db -W -S /data/haproxy/run/master-socket
[NOTICE]   (9) : New program 'api' (11) forked
[NOTICE]   (9) : New worker (12) forked
[NOTICE]   (9) : Loading success.
[WARNING]  (12) : fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
[WARNING]  (12) : Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
[WARNING]  (12) : Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
configuration file /data/haproxy/etc/dataplaneapi.yaml does not exists, creating one
time="2024-04-03T11:26:23Z" level=info msg="HAProxy Data Plane API  .dev.dirty"
time="2024-04-03T11:26:23Z" level=info msg="Reload strategy: custom"
time="2024-04-03T11:26:23Z" level=info msg="Build from: "
time="2024-04-03T11:26:23Z" level=info msg="Build date: 2024-04-03T11:23:50Z"

before Replace masterRuntime :unix@/data/haproxy/run/master-socket;sockpair@7:
after Replace masterRuntime :/data/haproxy/run/master-socket:

Command Successfully Executed
srwxr-xr-x. 1 1000950000 root 0 Apr  3 11:26 /data/haproxy/run/master-socket

ms :{/data/haproxy/run/master-socket 1}: masterSocket :/data/haproxy/run/master-socket: mapsDir :{/data/haproxy/maps}:
[NOTICE]   (9) : haproxy version is 2.9.6-9eafce5
[NOTICE]   (9) : path to executable is /usr/local/sbin/haproxy
[ALERT]    (9) : Current program 'api' (11) exited with code 1 (Exit)
[ALERT]    (9) : exit-on-failure: killing every processes with SIGTERM
[ALERT]    (9) : Current worker (12) exited with code 143 (Terminated)
[WARNING]  (9) : All workers exited. Exiting... (1)

As mentioned in https://github.com/haproxytech/dataplaneapi/issues/329#issuecomment-2034014921, could the socket call in client-native be a separate issue?

mjuraga commented 7 months ago

As a test, could you try running dpapi standalone, rather than from the program section?

git001 commented 7 months ago

Any chance to merge the PR https://github.com/haproxytech/dataplaneapi/pull/330 ?