knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0
5.55k stars 1.15k forks source link

10 second probe delay on activation #4040

Closed greghaynes closed 5 years ago

greghaynes commented 5 years ago

In what area(s)?

/area networking

What version of Knative?

HEAD

Expected Behavior

I am seeing a fairly consistent 10s delay during pod probing. The probe request leaves the activator but never reaches the queue-proxy and remains open for almost exactly 10 seconds. Right before the request is forwarded to queue-proxy kubelet probes queue-proxy.

Best guess is this is something in k8s, possibly a new regression in 1.14.1, as the request seems stuck behind the private service clusterIP routing.

greghaynes commented 5 years ago

Some relevant logs from queue-proxy with added dumping of requests as they come in.

{"level":"info","ts":"2019-05-08T04:12:19.318Z","caller":"logging/config.go:96","msg":"Successfully created the logger.","knative.dev/jsonconfig":"{\n  \"level\": \"info\",\n  \"development\": false,\n  \"out
putPaths\": [\"stdout\"],\n  \"errorOutputPaths\": [\"stderr\"],\n  \"encoding\": \"json\",\n  \"encoderConfig\": {\n    \"timeKey\": \"ts\",\n    \"levelKey\": \"level\",\n    \"nameKey\": \"logger\",\n    \
"callerKey\": \"caller\",\n    \"messageKey\": \"msg\",\n    \"stacktraceKey\": \"stacktrace\",\n    \"lineEnding\": \"\",\n    \"levelEncoder\": \"\",\n    \"timeEncoder\": \"iso8601\",\n    \"durationEncode
r\": \"\",\n    \"callerEncoder\": \"\"\n  }\n}"}                                                                                                                                                               
{"level":"info","ts":"2019-05-08T04:12:19.318Z","caller":"logging/config.go:97","msg":"Logging level set to info"}                                                                                              
{"level":"warn","ts":"2019-05-08T04:12:19.319Z","caller":"logging/config.go:65","msg":"Fetch GitHub commit ID from kodata failed: \"ref: refs/heads/master\" is not a valid GitHub commit ID"}                  
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"CONTAINER_CONCURRENCY=0"}                                                                                
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"CONTAINER_CONCURRENCY=0"}                                                                                
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"QUEUE_SERVING_PORT=8012"}                                                                                
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"QUEUE_SERVING_PORT=8012"}                                                                                {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"REVISION_TIMEOUT_SECONDS=300"}                                                                           
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"REVISION_TIMEOUT_SECONDS=300"}                                                                           {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_CONFIGURATION=helloworld-go"}                                                                    
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_NAMESPACE=default"}                                                                              {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_POD_IP=192.168.1.11"}                                                                            
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_POD=helloworld-go-5jw9n-deployment-8795578d8-m6rdp"}                                             {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_REVISION=helloworld-go-5jw9n"}                                                                   
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"USER_PORT=8080"}                                                                                         {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"USER_PORT=8080"}                                                                                         
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"queue/main.go:353","msg":"SERVING_REQUEST_METRICS_BACKEND=","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"}                                                                                                                                                                 
{"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"queue/main.go:282","msg":"Queue-proxy will listen on port 8012","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"}                                                                                                                                                             
{"level":"info","ts":"2019-05-08T04:12:20.972Z","logger":"queueproxy","caller":"queue/main.go:143","msg":"User-container successfully probed.","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"}                                                                                                                                                             
{"level":"info","ts":"2019-05-08T04:12:31.050Z","logger":"queueproxy","caller":"queue/main.go:278","msg":"Got request: &{GET / HTTP/1.1 1 1 map[Accept-Encoding:[gzip] Content-Length:[0] K-Network-Probe:[queue] User-Agent:[Go-http-client/1.1] X-B3-Parentspanid:[68057e4006de65f2] X-B3-Sampled:[1] X-B3-Spanid:[0db758ade05726b0] X-B3-Traceid:[9b7955accd5d160ce730c0ddad6d609d] X-Envoy-Decorator-Operation:[helloworld-g
o-5jw9n-priv.default.svc.cluster.local:80/*] X-Forwarded-Proto:[http] X-Istio-Attributes:[CjYKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIaEhhoZWxsb3dvcmxkLWdvLTVqdzluLXByaXYKKgodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USCRIHZGVmYXVsdApHCgpzb3VyY2UudWlkEjkSN2t1YmVybmV0ZXM6Ly9hY3RpdmF0b3ItN2M1ZDRjNGRmNi01bDRqZy5rbmF0aXZlLXNlcnZpbmcKTgoXZGVzdGluYXRpb24uc2VydmljZS51aWQSMxIxaXN0aW86Ly9kZWZhdWx0L3NlcnZpY2VzL2hlbGxvd29ybGQtZ28tNW
p3OW4tcHJpdgpQChhkZXN0aW5hdGlvbi5zZXJ2aWNlLmhvc3QSNBIyaGVsbG93b3JsZC1nby01anc5bi1wcml2LmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWw=] X-Request-Id:[bb609f10-001f-9556-9bdb-4879ac0eb57d]] {} <nil> 0 [] false helloworld-go-5jw9n-priv.default.svc.cluster.local:80 map[] map[] <nil> map[] 192.168.1.252:49386 / <nil> <nil> <nil> 0xc0001caa80}","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n
-deployment-8795578d8-m6rdp"}                                                                                                                                                                                   {"level":"error","ts":"2019-05-08T04:12:31.050Z","logger":"queueproxy","caller":"queue/main.go:155","msg":"Got probe header \"queue\"","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"hellow
orld-go-5jw9n-deployment-8795578d8-m6rdp","stacktrace":"main.handler.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/cmd/queue/main.go:155\nnet/http.HandlerFunc.ServeHTTP\n\t/home/greghaynes/golang/src/net/http/server.go:1995\ngithub.com/knative/serving/pkg/queue.(*timeoutHandler).ServeHTTP.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/pkg/queue/timeout.go:89"}                           
{"level":"info","ts":"2019-05-08T04:12:31.051Z","logger":"queueproxy","caller":"queue/main.go:143","msg":"User-container successfully probed.","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"}                                                                                                                                                              
{"level":"info","ts":"2019-05-08T04:12:31.054Z","logger":"queueproxy","caller":"queue/main.go:278","msg":"Got request: &{GET / HTTP/1.1 1 1 map[Accept:[*/*] Accept-Encoding:[gzip] Content-Length:[0] K-Original-Host:[helloworld-go.default.example.com] K-Proxy-Request:[activator] User-Agent:[curl/7.52.1] X-B3-Parentspanid:[52877f137c95a65a] X-B3-Sampled:[1] X-B3-Spanid:[73adcd37b1163308] X-B3-Traceid:[9b7955accd5d1
60ce730c0ddad6d609d] X-Envoy-Decorator-Operation:[helloworld-go-5jw9n-priv.default.svc.cluster.local:80/*] X-Forwarded-For:[192.168.0.1, 127.0.0.1] X-Forwarded-Proto:[http] X-Istio-Attributes:[CjYKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIaEhhoZWxsb3dvcmxkLWdvLTVqdzluLXByaXYKKgodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USCRIHZGVmYXVsdApHCgpzb3VyY2UudWlkEjkSN2t1YmVybmV0ZXM6Ly9hY3RpdmF0b3ItN2M1ZDRjNGRmNi01bDRqZy5rbmF0aXZlLXN
lcnZpbmcKTgoXZGVzdGluYXRpb24uc2VydmljZS51aWQSMxIxaXN0aW86Ly9kZWZhdWx0L3NlcnZpY2VzL2hlbGxvd29ybGQtZ28tNWp3OW4tcHJpdgpQChhkZXN0aW5hdGlvbi5zZXJ2aWNlLmhvc3QSNBIyaGVsbG93b3JsZC1nby01anc5bi1wcml2LmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWw=] X-Request-Id:[eca0b5fa-2eb8-99a0-a80c-ba39ab4582ba]] {} <nil> 0 [] false helloworld-go-5jw9n-priv.default.svc.cluster.local:80 map[] map[] <nil> map[] 192.168.1.252:49386 / <nil> <nil> <nil
> 0xc0004167c0}","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"}                                                                            {"level":"error","ts":"2019-05-08T04:12:31.054Z","logger":"queueproxy","caller":"queue/main.go:155","msg":"Got probe header \"\"","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-
go-5jw9n-deployment-8795578d8-m6rdp","stacktrace":"main.handler.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/cmd/queue/main.go:155\nnet/http.HandlerFunc.ServeHTTP\n\t/home/greghaynes/golang/src/net/http/server.go:1995\ngithub.com/knative/serving/pkg/queue.(*timeoutHandler).ServeHTTP.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/pkg/queue/timeout.go:89"}
vagababov commented 5 years ago

Do you have any logs from the activator for the same period of time?

On Wed, May 8, 2019 at 9:21 AM Gregory Haynes notifications@github.com wrote:

Some relevant logs from queue-proxy

{"level":"info","ts":"2019-05-08T04:12:19.318Z","caller":"logging/config.go:96","msg":"Successfully created the logger.","knative.dev/jsonconfig":"{\n \"level\": \"info\",\n \"development\": false,\n \"out putPaths\": [\"stdout\"],\n \"errorOutputPaths\": [\"stderr\"],\n \"encoding\": \"json\",\n \"encoderConfig\": {\n \"timeKey\": \"ts\",\n \"levelKey\": \"level\",\n \"nameKey\": \"logger\",\n \ "callerKey\": \"caller\",\n \"messageKey\": \"msg\",\n \"stacktraceKey\": \"stacktrace\",\n \"lineEnding\": \"\",\n \"levelEncoder\": \"\",\n \"timeEncoder\": \"iso8601\",\n \"durationEncode r\": \"\",\n \"callerEncoder\": \"\"\n }\n}"} {"level":"info","ts":"2019-05-08T04:12:19.318Z","caller":"logging/config.go:97","msg":"Logging level set to info"} {"level":"warn","ts":"2019-05-08T04:12:19.319Z","caller":"logging/config.go:65","msg":"Fetch GitHub commit ID from kodata failed: \"ref: refs/heads/master\" is not a valid GitHub commit ID"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"CONTAINER_CONCURRENCY=0"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"CONTAINER_CONCURRENCY=0"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"QUEUE_SERVING_PORT=8012"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"QUEUE_SERVING_PORT=8012"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"REVISION_TIMEOUT_SECONDS=300"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"REVISION_TIMEOUT_SECONDS=300"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_CONFIGURATION=helloworld-go"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_NAMESPACE=default"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_POD_IP=192.168.1.11"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_POD=helloworld-go-5jw9n-deployment-8795578d8-m6rdp"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"SERVING_REVISION=helloworld-go-5jw9n"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:33","msg":"USER_PORT=8080"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"util/env.go:43","msg":"USER_PORT=8080"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"queue/main.go:353","msg":"SERVING_REQUEST_METRICS_BACKEND=","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"} {"level":"info","ts":"2019-05-08T04:12:19.319Z","logger":"queueproxy","caller":"queue/main.go:282","msg":"Queue-proxy will listen on port 8012","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"} {"level":"info","ts":"2019-05-08T04:12:20.972Z","logger":"queueproxy","caller":"queue/main.go:143","msg":"User-container successfully probed.","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"} {"level":"info","ts":"2019-05-08T04:12:31.050Z","logger":"queueproxy","caller":"queue/main.go:278","msg":"Got request: &{GET / HTTP/1.1 1 1 map[Accept-Encoding:[gzip] Content-Length:[0] K-Network-Probe:[queue] User-Agent:[Go-http-client/1.1] X-B3-Parentspanid:[68057e4006de65f2] X-B3-Sampled:[1] X-B3-Spanid:[0db758ade05726b0] X-B3-Traceid:[9b7955accd5d160ce730c0ddad6d609d] X-Envoy-Decorator-Operation:[helloworld-g o-5jw9n-priv.default.svc.cluster.local:80/] X-Forwarded-Proto:[http] X-Istio-Attributes:[CjYKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIaEhhoZWxsb3dvcmxkLWdvLTVqdzluLXByaXYKKgodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USCRIHZGVmYXVsdApHCgpzb3VyY2UudWlkEjkSN2t1YmVybmV0ZXM6Ly9hY3RpdmF0b3ItN2M1ZDRjNGRmNi01bDRqZy5rbmF0aXZlLXNlcnZpbmcKTgoXZGVzdGluYXRpb24uc2VydmljZS51aWQSMxIxaXN0aW86Ly9kZWZhdWx0L3NlcnZpY2VzL2hlbGxvd29ybGQtZ28tNW p3OW4tcHJpdgpQChhkZXN0aW5hdGlvbi5zZXJ2aWNlLmhvc3QSNBIyaGVsbG93b3JsZC1nby01anc5bi1wcml2LmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWw=] X-Request-Id:[bb609f10-001f-9556-9bdb-4879ac0eb57d]] {} 0 [] false helloworld-go-5jw9n-priv.default.svc.cluster.local:80 map[] map[] map[] 192.168.1.252:49386 / 0xc0001caa80}","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n -deployment-8795578d8-m6rdp"} {"level":"error","ts":"2019-05-08T04:12:31.050Z","logger":"queueproxy","caller":"queue/main.go:155","msg":"Got probe header \"queue\"","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"hellow orld-go-5jw9n-deployment-8795578d8-m6rdp","stacktrace":"main.handler.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/cmd/queue/main.go:155\nnet/http.HandlerFunc.ServeHTTP\n\t/home/greghaynes/golang/src/net/http/server.go:1995\ngithub.com/knative/serving/pkg/queue.(timeoutHandler).ServeHTTP.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/pkg/queue/timeout.go:89 http://github.com/knative/serving/cmd/queue/main.go:155%5Cnnet/http.HandlerFunc.ServeHTTP%5Cn%5Ct/home/greghaynes/golang/src/net/http/server.go:1995%5Cngithub.com/knative/serving/pkg/queue.(*timeoutHandler).ServeHTTP.func1%5Cn%5Ct/home/greghaynes/go/src/github.com/knative/serving/pkg/queue/timeout.go:89"} {"level":"info","ts":"2019-05-08T04:12:31.051Z","logger":"queueproxy","caller":"queue/main.go:143","msg":"User-container successfully probed.","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"} {"level":"info","ts":"2019-05-08T04:12:31.054Z","logger":"queueproxy","caller":"queue/main.go:278","msg":"Got request: &{GET / HTTP/1.1 1 1 map[Accept:[/] Accept-Encoding:[gzip] Content-Length:[0] K-Original-Host:[helloworld-go.default.example.com] K-Proxy-Request:[activator] User-Agent:[curl/7.52.1] X-B3-Parentspanid:[52877f137c95a65a] X-B3-Sampled:[1] X-B3-Spanid:[73adcd37b1163308] X-B3-Traceid:[9b7955accd5d1 60ce730c0ddad6d609d] X-Envoy-Decorator-Operation:[helloworld-go-5jw9n-priv.default.svc.cluster.local:80/*] X-Forwarded-For:[192.168.0.1, 127.0.0.1] X-Forwarded-Proto:[http] X-Istio-Attributes:[CjYKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIaEhhoZWxsb3dvcmxkLWdvLTVqdzluLXByaXYKKgodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USCRIHZGVmYXVsdApHCgpzb3VyY2UudWlkEjkSN2t1YmVybmV0ZXM6Ly9hY3RpdmF0b3ItN2M1ZDRjNGRmNi01bDRqZy5rbmF0aXZlLXN lcnZpbmcKTgoXZGVzdGluYXRpb24uc2VydmljZS51aWQSMxIxaXN0aW86Ly9kZWZhdWx0L3NlcnZpY2VzL2hlbGxvd29ybGQtZ28tNWp3OW4tcHJpdgpQChhkZXN0aW5hdGlvbi5zZXJ2aWNlLmhvc3QSNBIyaGVsbG93b3JsZC1nby01anc5bi1wcml2LmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWw=] X-Request-Id:[eca0b5fa-2eb8-99a0-a80c-ba39ab4582ba]] {} 0 [] false helloworld-go-5jw9n-priv.default.svc.cluster.local:80 map[] map[] map[] 192.168.1.252:49386 / <nil

0xc0004167c0}","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld-go-5jw9n-deployment-8795578d8-m6rdp"} {"level":"error","ts":"2019-05-08T04:12:31.054Z","logger":"queueproxy","caller":"queue/main.go:155","msg":"Got probe header \"\"","knative.dev/key":"default/helloworld-go-5jw9n","knative.dev/pod":"helloworld- go-5jw9n-deployment-8795578d8-m6rdp","stacktrace":"main.handler.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/cmd/queue/main.go:155\nnet/http.HandlerFunc.ServeHTTP\n\t/home/greghaynes/golang/src/net/http/server.go:1995\ngithub.com/knative/serving/pkg/queue.(timeoutHandler).ServeHTTP.func1\n\t/home/greghaynes/go/src/github.com/knative/serving/pkg/queue/timeout.go:89 <http://github.com/knative/serving/cmd/queue/main.go:155%5Cnnet/http.HandlerFunc.ServeHTTP%5Cn%5Ct/home/greghaynes/golang/src/net/http/server.go:1995%5Cngithub.com/knative/serving/pkg/queue.(timeoutHandler).ServeHTTP.func1%5Cn%5Ct/home/greghaynes/go/src/github.com/knative/serving/pkg/queue/timeout.go:89>"}

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/knative/serving/issues/4040#issuecomment-490553661, or mute the thread https://github.com/notifications/unsubscribe-auth/AAF2WXY5XIWDWTGTAV2DUH3PUL4WZANCNFSM4HLTGXWA .

greghaynes commented 5 years ago

This is from a different test but the only relevant line which is output is:

{"level":"info","ts":"2019-05-08T16:36:27.839Z","logger":"activator","caller":"handler/handler.go:74","msg":"Probing http://helloworld-go-944wt-priv.default.svc.cluster.local:80 took 1 attempts and 10.024953429s time","knative.dev/controller":"activator"}
greghaynes commented 5 years ago

Very interestingly - I changed the queue-proxy probe timeout to 1s instead of 10s and it didnt have any effect...

vagababov commented 5 years ago

Perhaps: https://github.com/knative/serving/blob/784b4fbb0e2960677a3cdde21f4b66583eace844/pkg/network/transports.go#L48?

vagababov commented 5 years ago

we don't have any other transport timeouts which are 10s.

greghaynes commented 5 years ago

Same thing - not surprising as I think im not using TLS here...

greghaynes commented 5 years ago

I investigated further. What happens is if I make a connection to the ClusterIP of my service before any endpoints are available then the connection remains open in SYN_SENT state indefinitely:

Client:

tcp      SYN-SENT     0          1               192.168.55.164:47114           10.97.221.209:80

Compute node

tcp      6 108 SYN_SENT src=192.168.55.164 dst=10.97.221.209 sport=47114 dport=80 [UNREPLIED] src=10.97.221.209 dst=169.62.93.166 sport=80 dport=41995 mark=0 use=1

The end result is both our activator and kube-proxy are watching the endpoint object and therefore can race. If our activator wins and the forwarded request arrives before kube-proxy reconfigures iptables then the request ends up in this state and will wait until some kind of TCP timeout.

My thinking is this behavior for connections that are opened on a ClusterIP with no endpoints available is either a bug or poorly designed - we should be getting an immediate RST if the connections aren't going to be forwarded on when an endpoint becomes available.

Details on my setup:

Kubeadm version 1.14.3-00 Kuberenetes version v1.14.3 Calico v3.7

dcbw commented 5 years ago

If no endpoints are available for a service, kube-proxy installs REJECT rules in iptables that will terminate the request:

            // No endpoints.
            writeLine(proxier.filterRules,
                "-A", string(kubeServicesChain),
                "-m", "comment", "--comment", fmt.Sprintf(`"%s has no endpoints"`, svcNameString),
                "-m", protocol, "-p", protocol,
                "-d", utilproxy.ToCIDR(svcInfo.ClusterIP),
                "--dport", strconv.Itoa(svcInfo.Port),
                "-j", "REJECT",
            )

Obviously this is subject to lag between the apiserver and the proxy, internally in the proxy building the new iptables-restore data, and installing that data in the kernel. But that has always been the case.

In general Kubernetes and its components are "eventually" consistent where "eventually" does not mean immediately, but also does not usually mean "minutes".

vagababov commented 5 years ago

So should dial timeout set to a very short interval fix this?

greghaynes commented 5 years ago

Created a k8s issue to track this: https://github.com/kubernetes/kubernetes/issues/78994 - Thanks for the help in k8s networking slack!

greghaynes commented 5 years ago

@vagababov Yep, ill throw up a patch for us to work around it that way

greghaynes commented 5 years ago

Interestingly, we fixed this in https://github.com/knative/serving/pull/3794 - what I am noticing is this now shows up when tracing is enabled. I am guessing the tracing transport overwrites our dialer here.

greghaynes commented 5 years ago

Did some more debugging and the tracing comment I made above is a red herring. Heres what actually goes on:

In order to hit this Istio mesh needs to be disabled by setting the injection label to disabled on the namespace of applications but activator needs to be still running with istio sidecar. In this case there is a race where we can observe the endpoint become ready and send a request to it from activator before k8s networking has reconfigured. There is then a second, identical, race with envoy/istio noticing the endpoint is available. If istio notices before k8s but k8s networking has not reconfigured yet then we are able to open a connection to envoy sidecar without error (therefore our Dial timeout is not triggered) but the envoy encounters the bug https://github.com/kubernetes/kubernetes/issues/78994. Activator then waits a full 10s for a response from envoy without knowing it has failed to initiate a connection.

tcnghia commented 5 years ago

@greghaynes this 10 seconds may be an Istio setting https://github.com/knative/serving/blob/master/third_party/istio-1.1.7/istio.yaml#L576 https://github.com/knative/serving/blob/master/third_party/istio-1.1.7/istio.yaml#L429

tcnghia commented 5 years ago

We can reduce this setting somewhat. The bug is really https://github.com/kubernetes/kubernetes/issues/78994 so I am not sure if there is anything to be done from Envoy/Istio beside changing the setting.

If we can change this only for activator we could change this to sub-second like what we had in our Dial configurations. But I am not sure that is possible.

tcnghia commented 5 years ago

/close

knative-prow-robot commented 5 years ago

@tcnghia: Closing this issue.

In response to [this](https://github.com/knative/serving/issues/4040#issuecomment-533406093): >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.