proxy: high memory usage

siggy commented 6 years ago

Observed conduit-proxy using 1GB memory:

$ docker stats $(docker ps --format={{.Names}}|grep conduit-proxy_prometheus)
CONTAINER                                                                                                CPU %               MEM USAGE / LIMIT       MEM %               NET I/O             BLOCK I/O           PIDS
k8s_conduit-proxy_prometheus-7f8f99c74d-tbd72_conduit-lifecycle_2c0a3b14-733e-11e8-b903-7cd30ab18fe8_0   6.85%               1.092 GiB / 251.8 GiB   0.43%               0 B / 0 B           29.3 MB / 0 B       3

Env

conduit v0.4.4
k8s v1.10.4
- 28 node cluster
- 1 master
- 2 256GB nodes (k8s_conduit-proxy_prometheus runs here)
- 25 8GB nodes
lifecycle example
- scaled to 20 namespaces x 15 replicas (bin/scale 20 15)
- ~2500 pods

metrics output

https://gist.github.com/siggy/8957be93203e9c9889161650b9d355d4

olix0r commented 6 years ago

This is a great find! I think we're close to merging https://github.com/runconduit/conduit/pull/1128 -- I'd be really curious to see the proxy's process_* stats in this scenario (especially over time).

siggy commented 6 years ago

Unable to reproduce with:

conduit v0.4.4
bin scale 10 15
bb-terminus pod deletion with SLEEP_TIME=10
bb-terminus service deletion every 150s

screen shot 2018-06-26 at 9 54 27 am

bin scale 20 15 had previously made the cluster unstable, but going to retry under those original conditions to attempt a reproduction. One change will be to pin Prometheus and Grafana to a higher memory node: https://gist.github.com/siggy/ad6e4715d168e8206683619b12ca2cc7

siggy commented 6 years ago

No noticeable increase in memory after 6 hours:

siggy commented 6 years ago

restarting the test on master (git-15718de2) with more aggressive service discovery changes:

conduit install --conduit-namespace conduit-lifecycle | kubectl apply -f -
bin/deploy 1
bin/scale 1 300

redeployer:

    #!/bin/sh

    # give deployment time to fully roll out
    sleep 60

    while true; do
      PODS=$(kubectl -n $LIFECYCLE_NS get po --field-selector=status.phase=Running --selector=app=bb-terminus -o jsonpath='{.items[*].metadata.name}')

      SPACES=$(echo "${PODS}" | awk -F" " '{print NF-1}')
      POD_COUNT=$(($SPACES+1))
      echo "found ${POD_COUNT} running pods"

      SLEEP_TIME=1

      # more aggressive restart interval, for smaller deployments
      # restart each pod every minute
      # SLEEP_TIME=$(( 60 / $POD_COUNT))
      # if [ $SLEEP_TIME = 0 ]; then
      #   SLEEP_TIME=1
      # fi

      counter=0
      for POD in ${PODS}; do
        kubectl -n $LIFECYCLE_NS delete po $POD
        echo "sleeping for ${SLEEP_TIME} seconds..."
        sleep $SLEEP_TIME

        # bounce service every 30 seconds
        if [ $(expr $counter % 30) = 0 ]; then
          svc=$(kubectl -n $LIFECYCLE_NS get svc/bb-terminus -o json)
          kubectl -n $LIFECYCLE_NS delete svc/bb-terminus
          echo $svc | kubectl -n $LIFECYCLE_NS apply -f -
        fi
        counter=`expr $counter + 1`
      done

      # bounce service
      # svc=$(kubectl -n $LIFECYCLE_NS get svc/bb-terminus -o json)
      # kubectl -n $LIFECYCLE_NS delete svc/bb-terminus
      # echo $svc | kubectl -n $LIFECYCLE_NS apply -f -
    done

siggy commented 6 years ago

With the more aggressive configuration above, two issues to note:

Memory usage

container_memory_working_set_bytes, as reported by Docker, is sometimes, but not always, higher than the process_resident_memory_bytes and process_virtual_memory_bytes metrics provided by conduit-proxy:

Highest container_memory_working_set_bytes conduit-proxy was in a bb-p2p pod, reporting:

96MB container_memory_working_set_bytes
90MB process_virtual_memory_bytes
59MB process_resident_memory_bytes

Highest process_virtual_memory_bytes conduit-proxy was in a bb-p2p pod, reporting:

45MB container_memory_working_set_bytes
101MB process_virtual_memory_bytes
43MB process_resident_memory_bytes

For comparison, the conduit-proxy in the Prometheus pod:

68MB container_memory_working_set_bytes
55MB process_virtual_memory_bytes
39MB process_resident_memory_bytes

screen shot 2018-06-27 at 11 00 41 am

Proxy capacity error

The conduit-proxy running in the Prometheus pod is logging these at a rate of about 50/s:

2018-06-27T18:12:18.005047944Z ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.233.65.37:51788} conduit_proxy router at capacity (100); returning a 503

conduit-proxy also reports these every 10s (with surprising regularity):

2018-06-27T18:17:49.498127182Z ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.233.65.37:42296} conduit_proxy turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500

Notable metrics from the Prometheus conduit-proxy:

process_open_fds 849
tcp_open_connections{direction="outbound",peer="src"} 818

Also note not a single metric with classification="failure" Full metrics dump: https://gist.github.com/siggy/db8719a26c732dfccd911729290a226f

Likely related, the Grafana instance is unable to connect to Prometheus, running curl from the Grafana pod yields:

$ curl -v http://prometheus.conduit-lifecycle.svc.cluster.local:9090
* Rebuilt URL to: http://prometheus.conduit-lifecycle.svc.cluster.local:9090/
*   Trying 10.233.14.166...
* TCP_NODELAY set
* Connected to prometheus.conduit-lifecycle.svc.cluster.local (10.233.14.166) port 9090 (#0)
> GET / HTTP/1.1
> Host: prometheus.conduit-lifecycle.svc.cluster.local:9090
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< content-length: 0
< date: Wed, 27 Jun 2018 18:09:01 GMT
<
* Curl_http_done: called premature == 0
* Connection #0 to host prometheus.conduit-lifecycle.svc.cluster.local left intact

hawkw commented 6 years ago

@siggy the router at capacity message comes from https://github.com/runconduit/conduit/blob/master/proxy/src/lib.rs#L417, it's because the outbound router cache has a capacity configured by the CONDUIT_PROXY_OUTBOUND_ROUTER_CAPACITY environment variable. This sets a limit on the maximum number of (protocol, authority) pairs the proxy's router cache can store at any given time.

I suspect you see this regularly because the Prometheus pod tries to scrape the 300 pods in the cluster every 10 seconds, but the outbound router capacity defaults to 100. We might want to change this default, or set a higher limit for the Prometheus proxy at inject-time.

olix0r commented 6 years ago

Before we investigate changing this setting, it would be helpful to understand how prometheus's scraper works. Does it try to issue as many requests as possible in parallel? Are there ways to constrain parallelism?

siggy commented 6 years ago

Prometheus' scraper does not appear to have any parallelism control, and does not explicitly try to perform scrapes in parallel, though in practice it may perform that way: https://github.com/prometheus/prometheus/blob/057a5ae2b147b6a1dfe0d3a667e13ed6535dbe20/scrape/scrape.go#L329

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

linkerd / linkerd2