knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0
5.46k stars 1.14k forks source link

Context Canceled in Long-Running Knative Serving Process #15347

Closed nkreiger closed 4 days ago

nkreiger commented 6 days ago

Ask your question here:

Hello,

Been working an issue for a while, and still can't seem to figure it out. I'm seeing this error:

{"severity":"ERROR","timestamp":"2024-06-24T22:35:24.073475903Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 120\nTCP: inuse 38 orphan 3 tw 12 alloc 2548 mem 360\nUDP: inuse 0 mem 8\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"41769de","knative.dev/key":"dxcm/plugin-github-00025","knative.dev/pod":"plugin-github-00025-deployment-5d95588889-jj77z","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20240116073220-b488e7be5902/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}

Which indicates context canceled

I've extended the revision timeout:

  enable-service-links: "false"
  max-revision-timeout-seconds: "604800"
  revision-timeout-seconds: "18500"

I've checked that it is being mapped into the knative service, and respective pod via environment variables.

I've also extended the client timeout on the golang sdk-go. But it is still timing out at exactly 5 minutes.

Am I missing something?

nkreiger commented 6 days ago

I double checked its not client side. I sent a 15 minute request locally, this worked fine:

time="2024-06-24T23:54:59Z"


{"severity":"ERROR","timestamp":"2024-06-24T23:59:59.992829527Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 42\nTCP: inuse 18 orphan 3 tw 19 alloc 2575 mem 329\nUDP: inuse 0 mem 8\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","commit":"41769de","knative.dev/key":"dxcm/plugin-github-00029","knative.dev/pod":"plugin-github-00029-deployment-5cccbb8bf6-dcz55","error":"context canceled","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20240116073220-b488e7be5902/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:475\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:76\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2136\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"}```

5 minutes on the dot.

Does anyone else have an idea where to check? I'm not sure where the timeout is coming from if I've extended the revision, max, and the client side. What else would I be missing?
nkreiger commented 6 days ago

I've added logging to the queue proxy:

time="2024-06-25T02:18:34Z" level=info msg="5h8m20s 5m0s 0s"
time="2024-06-25T02:18:34Z" level=info msg="revTimeout: 5h8m20s revResponseStartTimeout: 5m0s revIdleTimeout: 0s"
time="2024-06-25T02:18:34Z" level=info msg="test: for=10.1.0.136;proto=http, for=10.1.2.15"
time="2024-06-25T02:18:34Z" level=info msg="serving next http"

It looks like the revision timeout is being set properly.

nkreiger commented 4 days ago

Closing, much more detail and converstion in here: https://github.com/knative/serving/issues/15352#issuecomment-2192682382