This is a great find! I think we're close to merging https://github.com/runconduit/conduit/pull/1128 -- I'd be really curious to see the proxy's process_* stats in this scenario (especially over time).
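For reference, a minimal sketch of pulling those gauges straight off a proxy (the 4191 metrics port, the namespace, and the `app=bb-terminus` label are assumptions based on later comments in this thread):

```sh
# Sketch only: dump the proxy's process_* gauges for one injected pod.
# Port 4191 and the namespace/label values are assumptions, not from this thread.
POD=$(kubectl -n conduit-lifecycle get po -l app=bb-terminus \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n conduit-lifecycle port-forward "$POD" 4191:4191 >/dev/null &
PF_PID=$!
sleep 2
curl -s http://localhost:4191/metrics | grep '^process_'
kill "$PF_PID"
```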
Unable to reproduce with:
- v0.4.4
- `bin/scale 10 15`
- `bb-terminus` pod deletion with `SLEEP_TIME=10`
- `bb-terminus` service deletion every 150s

`bin/scale 20 15` had previously made the cluster unstable, so I'm going to retry under those original conditions to attempt a reproduction. One change will be to pin Prometheus and Grafana to a higher-memory node:
https://gist.github.com/siggy/ad6e4715d168e8206683619b12ca2cc7
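As a rough sketch of that pinning (the real manifests are in the gist above; the node hostname below is a placeholder):

```sh
# Rough sketch only -- see the gist above for the actual config.
# "high-mem-node" is a placeholder hostname; any suitable node label works.
for d in prometheus grafana; do
  kubectl -n conduit-lifecycle patch deploy "$d" --patch \
    '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"high-mem-node"}}}}}'
done
```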
No noticeable increase in memory after 6 hours.
Restarting the test on master (`git-15718de2`) with more aggressive service discovery changes:
```sh
conduit install --conduit-namespace conduit-lifecycle | kubectl apply -f -
bin/deploy 1
bin/scale 1 300
```
The `redeployer` script:
```sh
#!/bin/sh

# give the deployment time to fully roll out
sleep 60

while true; do
  PODS=$(kubectl -n $LIFECYCLE_NS get po --field-selector=status.phase=Running --selector=app=bb-terminus -o jsonpath='{.items[*].metadata.name}')
  # pod count = number of spaces in the space-separated pod list, plus one
  SPACES=$(echo "${PODS}" | awk -F" " '{print NF-1}')
  POD_COUNT=$(($SPACES+1))
  echo "found ${POD_COUNT} running pods"

  SLEEP_TIME=1
  # more aggressive restart interval, for smaller deployments:
  # restart each pod every minute
  # SLEEP_TIME=$(( 60 / $POD_COUNT))
  # if [ $SLEEP_TIME = 0 ]; then
  #   SLEEP_TIME=1
  # fi

  counter=0
  for POD in ${PODS}; do
    kubectl -n $LIFECYCLE_NS delete po $POD
    echo "sleeping for ${SLEEP_TIME} seconds..."
    sleep $SLEEP_TIME

    # bounce service every 30 pod deletions (~every 30s with SLEEP_TIME=1)
    if [ $(expr $counter % 30) = 0 ]; then
      svc=$(kubectl -n $LIFECYCLE_NS get svc/bb-terminus -o json)
      kubectl -n $LIFECYCLE_NS delete svc/bb-terminus
      echo "$svc" | kubectl -n $LIFECYCLE_NS apply -f -
    fi
    counter=`expr $counter + 1`
  done

  # bounce service once per full pass
  # svc=$(kubectl -n $LIFECYCLE_NS get svc/bb-terminus -o json)
  # kubectl -n $LIFECYCLE_NS delete svc/bb-terminus
  # echo $svc | kubectl -n $LIFECYCLE_NS apply -f -
done
```
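For reference, a minimal way to run it by hand (the filename and namespace below are assumptions; in the lifecycle test it presumably runs in-cluster):

```sh
# Sketch: run the redeployer against the test namespace from a workstation.
# LIFECYCLE_NS is the only input the script reads.
chmod +x redeployer.sh
LIFECYCLE_NS=conduit-lifecycle ./redeployer.sh
```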
With the more aggressive configuration above, two issues to note:
`container_memory_working_set_bytes`, as reported by Docker, is sometimes, but not always, higher than the `process_resident_memory_bytes` and `process_virtual_memory_bytes` metrics provided by `conduit-proxy`:
The `conduit-proxy` with the highest `container_memory_working_set_bytes` was in a `bb-p2p` pod, reporting:
- `container_memory_working_set_bytes`
- `process_virtual_memory_bytes`
- `process_resident_memory_bytes`
The `conduit-proxy` with the highest `process_virtual_memory_bytes` was in a `bb-p2p` pod, reporting:
- `container_memory_working_set_bytes`
- `process_virtual_memory_bytes`
- `process_resident_memory_bytes`
For comparison, the `conduit-proxy` in the Prometheus pod:
- `container_memory_working_set_bytes`
- `process_virtual_memory_bytes`
- `process_resident_memory_bytes`
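One way to pull the same comparison out of Prometheus directly, as a sketch (the `container_name` label is an assumption about how cAdvisor labels the sidecar in this cluster; run it from somewhere that can reach the Prometheus service):

```sh
# Sketch: fetch the top working-set and RSS series so the two can be compared.
# container_name="conduit-proxy" is an assumed cAdvisor label for the sidecar.
PROM=http://prometheus.conduit-lifecycle.svc.cluster.local:9090
curl -sG "$PROM/api/v1/query" --data-urlencode \
  'query=topk(5, container_memory_working_set_bytes{container_name="conduit-proxy"})'
curl -sG "$PROM/api/v1/query" --data-urlencode \
  'query=topk(5, process_resident_memory_bytes)'
```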
The `conduit-proxy` running in the Prometheus pod is logging these at a rate of about 50/s:
```
2018-06-27T18:12:18.005047944Z ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.233.65.37:51788} conduit_proxy router at capacity (100); returning a 503
```
`conduit-proxy` also reports these every 10s (with surprising regularity):
```
2018-06-27T18:17:49.498127182Z ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.233.65.37:42296} conduit_proxy turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
```
Notable metrics from the Prometheus `conduit-proxy`:
```
process_open_fds 849
tcp_open_connections{direction="outbound",peer="src"} 818
```
Also note there is not a single metric with `classification="failure"`.
Full metrics dump: https://gist.github.com/siggy/db8719a26c732dfccd911729290a226f
Likely related: the Grafana instance is unable to connect to Prometheus. Running `curl` from the Grafana pod yields:
```
$ curl -v http://prometheus.conduit-lifecycle.svc.cluster.local:9090
* Rebuilt URL to: http://prometheus.conduit-lifecycle.svc.cluster.local:9090/
*   Trying 10.233.14.166...
* TCP_NODELAY set
* Connected to prometheus.conduit-lifecycle.svc.cluster.local (10.233.14.166) port 9090 (#0)
> GET / HTTP/1.1
> Host: prometheus.conduit-lifecycle.svc.cluster.local:9090
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< content-length: 0
< date: Wed, 27 Jun 2018 18:09:01 GMT
<
* Curl_http_done: called premature == 0
* Connection #0 to host prometheus.conduit-lifecycle.svc.cluster.local left intact
```
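To rule out Prometheus itself, a sketch that skips the Grafana pod's outbound proxy by port-forwarding from outside the mesh (assumes a kubectl recent enough to port-forward a service, and Prometheus's standard `/-/healthy` endpoint):

```sh
# Sketch: check Prometheus health without going through the Grafana pod's
# outbound proxy. Expects HTTP 200 if Prometheus itself is fine.
kubectl -n conduit-lifecycle port-forward svc/prometheus 9090:9090 >/dev/null &
PF_PID=$!
sleep 2
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9090/-/healthy
kill "$PF_PID"
```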
@siggy the `router at capacity` message comes from https://github.com/runconduit/conduit/blob/master/proxy/src/lib.rs#L417. It occurs because the outbound router cache has a capacity configured by the `CONDUIT_PROXY_OUTBOUND_ROUTER_CAPACITY` environment variable, which limits the maximum number of (protocol, authority) pairs the proxy's router cache can store at any given time.

I suspect you see this regularly because the Prometheus pod tries to scrape the 300 pods in the cluster every 10 seconds, but the outbound router capacity defaults to 100. We might want to change this default, or set a higher limit for the Prometheus proxy at inject time.
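If we want to experiment before changing the default, a hedged sketch of bumping the limit on just the Prometheus pod's sidecar (the container name and the value 1000 are assumptions; only the env var name comes from the code linked above):

```sh
# Sketch: raise the outbound router capacity for the Prometheus proxy only.
# Container name "conduit-proxy" and the value 1000 are assumptions.
kubectl -n conduit-lifecycle set env deploy/prometheus \
  -c conduit-proxy CONDUIT_PROXY_OUTBOUND_ROUTER_CAPACITY=1000
```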
Before we investigate changing this setting, it would be helpful to understand how Prometheus's scraper works. Does it try to issue as many requests as possible in parallel? Are there ways to constrain parallelism?
Prometheus' scraper does not appear to have any parallelism control, and does not explicitly try to perform scrapes in parallel, though in practice it may perform that way: https://github.com/prometheus/prometheus/blob/057a5ae2b147b6a1dfe0d3a667e13ed6535dbe20/scrape/scrape.go#L329
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Observed `conduit-proxy` using 1GB of memory:

Env:
- `k8s_conduit-proxy_prometheus` runs here
- `bin/scale 20 15`

Metrics output: https://gist.github.com/siggy/8957be93203e9c9889161650b9d355d4