telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io
Other
6.53k stars 513 forks source link

GET /telepresence-http2-check does not detect http2 service #2842

Closed onmomo closed 8 months ago

onmomo commented 1 year ago

Describe the bug The port 3000 is actually an akka-grpc service that is configured to use http2 but it seems that traffic agent is unable to detect it. As soon as the traffic agent sidecar is running, grpc calls to the service running in the cluster are failing.

There is no active intercept

ulf-dep                        : ready to intercept (traffic-agent already installed)

requests to our akka-grpc service in the cluster fail due to

io.grpc.StatusRuntimeException: UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: connection termination
    at io.grpc.Status.asRuntimeException(Status.java:533)
    at akka.grpc.internal.UnaryCallAdapter.onClose(UnaryCallAdapter.scala:40)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)

traffic agent logs

Started with an intercept of 8080:http and then removed the intercept leaving the traffic agent container running

kubectl logs -f ulf-dep-854d8cc776-qsmqv traffic-agent 
2022-10-28 13:33:04.3563 info    Enhanced Traffic Agent v1.13.1
2022-10-28 13:33:04.3567 info    Adding exported mounts for container ulf
2022-10-28 13:33:04.3569 info    new agent secrets mount path: /var/run/secrets/kubernetes.io
2022-10-28 13:33:04.3569 info    LoadConfig: processing environment...
2022-10-28 13:33:04.3569 info    LoadConfig: processing annotations...
2022-10-28 13:33:04.3569 info    LoadConfig: done
2022-10-28 13:33:04.3600 info    LoadConfig: asking traffic-manager for license information...
2022-10-28 13:33:04.3654 error   error(s) getting license: license not found
2022-10-28 13:33:04.3654 info    LoadConfig: asking traffic-manager for Ambassador Cloud configuration...
2022-10-28 13:33:04.3737 info    trying to connect to ambassador cloud at app.getambassador.io:443
2022-10-28 13:33:04.4845 info    connected to ambassador cloud at app.getambassador.io:443
2022-10-28 13:33:04.4853 info    envoy-server : Management server listening on 18000

2022-10-28 13:33:04.4855 info    envoy : started command ["envoy" "-l" "warn" "-c" "bootstrap-ads.pb" "--base-id" "1"] : dexec.pid="17"
2022-10-28 13:33:04.4856 info    envoy :  : dexec.pid="17" dexec.stream="stdin" dexec.err="EOF"
2022-10-28 13:33:04.5169 info    sidecar : supportsHTTP2: probing for HTTP/2 support... : base-url="http://127.0.0.1:3000"
2022-10-28 13:33:06.0192 info    sidecar : supportsHTTP2: result =  false : base-url="http://127.0.0.1:3000"
2022-10-28 13:33:06.0193 info    sidecar : supportsHTTP2: probing for HTTP/2 support... : base-url="http://127.0.0.1:8080"
2022-10-28 13:33:07.5212 info    sidecar : supportsHTTP2: result =  false : base-url="http://127.0.0.1:8080"
2022-10-28 13:33:07.5213 info    sidecar : supportsHTTP2: probing for HTTP/2 support... : base-url="http://127.0.0.1:5005"
2022-10-28 13:33:09.0236 info    sidecar : supportsHTTP2: result =  false : base-url="http://127.0.0.1:5005"
2022-10-28 13:33:09.0282 info    sidecar : Connected to Manager 2.7.2
2022-10-28 13:33:09.0303 debug   Logging at this level "debug"
2022-10-28 13:33:09.0305 debug   sidecar/handleIntercept : HandleIntercepts []
2022-10-28 13:33:09.0305 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.0313 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.0313 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.0314 debug   envoy-server : respond open watch 1 type.googleapis.com/envoy.config.cluster.v3.Cluster[] with new version "v1"
2022-10-28 13:33:09.0314 debug   envoy-server : respond type.googleapis.com/envoy.config.cluster.v3.Cluster[] version "" with version "v1"
2022-10-28 13:33:09.0335 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.listener.v3.Listener[] and known map[]. Diff []
2022-10-28 13:33:09.0336 debug   envoy-server : respond type.googleapis.com/envoy.config.listener.v3.Listener[] version "" with version "v1"
2022-10-28 13:33:09.0339 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[] and known map[]. Diff []
2022-10-28 13:33:09.0339 debug   envoy-server : open watch 2 for type.googleapis.com/envoy.config.cluster.v3.Cluster[] from nodeID "tel-id", version "v1"
2022-10-28 13:33:09.0693 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] and known map[]. Diff [application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000]
2022-10-28 13:33:09.0694 debug   envoy-server : respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] version "" with version "v1"
2022-10-28 13:33:09.0694 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.listener.v3.Listener[] and known map[]. Diff []
2022-10-28 13:33:09.0694 debug   envoy-server : open watch 3 for type.googleapis.com/envoy.config.listener.v3.Listener[] from nodeID "tel-id", version "v1"
2022-10-28 13:33:09.0730 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] and known map[application_route_TCP_3000:{} application_route_TCP_5005:{} application_route_TCP_8080:{}]. Diff []
2022-10-28 13:33:09.0730 debug   envoy-server : open watch 4 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] from nodeID "tel-id", version "v1"
2022-10-28 13:33:09.1149 debug   sidecar/handleIntercept : HandleIntercepts [3f89919e-e89b-4c12-b820-5e319d5b12e7:ulf-dep-cmos-8080:http WAITING]
2022-10-28 13:33:09.1149 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.1149 debug   sidecar/handleIntercept : intercept id 3f89919e-e89b-4c12-b820-5e319d5b12e7:ulf-dep-cmos-8080:http svc="ulf-svc", svcPortId="http" matches config svc="ulf-svc", svcPort=8080, protocol=TCP
2022-10-28 13:33:09.1150 debug   sidecar/handleIntercept : HandleIntercepts(1)
2022-10-28 13:33:09.1150 debug   sidecar/handleIntercept : disposition is WAITING
2022-10-28 13:33:09.1150 debug   sidecar/handleIntercept : Forwarding from :0
2022-10-28 13:33:09.1151 debug   sidecar/handleIntercept : Forward target changed from 127.0.0.1:0 to intercept 'ulf-dep-cmos-8080:http' ($host:8080) : lis=":0"
2022-10-28 13:33:09.1153 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.1154 debug   envoy-server : respond open watch 2 type.googleapis.com/envoy.config.cluster.v3.Cluster[] with new version "v2"
2022-10-28 13:33:09.1154 debug   envoy-server : respond type.googleapis.com/envoy.config.cluster.v3.Cluster[] version "v1" with version "v2"
2022-10-28 13:33:09.1154 debug   envoy-server : respond open watch 3 type.googleapis.com/envoy.config.listener.v3.Listener[] with new version "v2"
2022-10-28 13:33:09.1154 debug   envoy-server : respond type.googleapis.com/envoy.config.listener.v3.Listener[] version "v1" with version "v2"
2022-10-28 13:33:09.1154 debug   envoy-server : respond open watch 4 type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] with new version "v2"
2022-10-28 13:33:09.1154 debug   envoy-server : respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] version "v1" with version "v2"
2022-10-28 13:33:09.1166 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[] and known map[]. Diff []
2022-10-28 13:33:09.1166 debug   envoy-server : open watch 5 for type.googleapis.com/envoy.config.cluster.v3.Cluster[] from nodeID "tel-id", version "v2"
2022-10-28 13:33:09.1167 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] and known map[application_route_TCP_3000:{} application_route_TCP_5005:{} application_route_TCP_8080:{}]. Diff []
2022-10-28 13:33:09.1167 debug   envoy-server : open watch 6 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] from nodeID "tel-id", version "v2"
2022-10-28 13:33:09.1173 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.listener.v3.Listener[] and known map[]. Diff []
2022-10-28 13:33:09.1173 debug   envoy-server : open watch 7 for type.googleapis.com/envoy.config.listener.v3.Listener[] from nodeID "tel-id", version "v2"
2022-10-28 13:33:09.1200 debug   sidecar/handleIntercept : HandleIntercepts [3f89919e-e89b-4c12-b820-5e319d5b12e7:ulf-dep-cmos-8080:http ACTIVE]
2022-10-28 13:33:09.1200 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.1201 debug   sidecar/handleIntercept : intercept id 3f89919e-e89b-4c12-b820-5e319d5b12e7:ulf-dep-cmos-8080:http svc="ulf-svc", svcPortId="http" matches config svc="ulf-svc", svcPort=8080, protocol=TCP
2022-10-28 13:33:09.1201 debug   sidecar/handleIntercept : HandleIntercepts(1)
2022-10-28 13:33:09.1201 debug   sidecar/handleIntercept : disposition is ACTIVE
2022-10-28 13:33:09.1202 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:33:09.1204 debug   envoy-server : respond open watch 6 type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] with new version "v3"
2022-10-28 13:33:09.1204 debug   envoy-server : respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] version "v2" with version "v3"
2022-10-28 13:33:09.1205 debug   envoy-server : respond open watch 7 type.googleapis.com/envoy.config.listener.v3.Listener[] with new version "v3"
2022-10-28 13:33:09.1205 debug   envoy-server : respond type.googleapis.com/envoy.config.listener.v3.Listener[] version "v2" with version "v3"
2022-10-28 13:33:09.1205 debug   envoy-server : respond open watch 5 type.googleapis.com/envoy.config.cluster.v3.Cluster[] with new version "v3"
2022-10-28 13:33:09.1205 debug   envoy-server : respond type.googleapis.com/envoy.config.cluster.v3.Cluster[] version "v2" with version "v3"
2022-10-28 13:33:09.1209 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] and known map[application_route_TCP_3000:{} application_route_TCP_5005:{} application_route_TCP_8080:{}]. Diff []
2022-10-28 13:33:09.1209 debug   envoy-server : open watch 8 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] from nodeID "tel-id", version "v3"
2022-10-28 13:33:09.1209 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.listener.v3.Listener[] and known map[]. Diff []
2022-10-28 13:33:09.1209 debug   envoy-server : open watch 9 for type.googleapis.com/envoy.config.listener.v3.Listener[] from nodeID "tel-id", version "v3"
2022-10-28 13:33:09.1210 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[] and known map[]. Diff []
2022-10-28 13:33:09.1211 debug   envoy-server : open watch 10 for type.googleapis.com/envoy.config.cluster.v3.Cluster[] from nodeID "tel-id", version "v3"
2022-10-28 13:33:44.7157 debug   sidecar/handleIntercept : Accept got connection from 127.0.0.1:54946 : lis=":0"
2022-10-28 13:33:44.7901 debug   sidecar/handleIntercept : Done serving connection from 127.0.0.1:54946 : lis=":0"
2022-10-28 13:34:04.7158 debug   sidecar/handleIntercept : Accept got connection from 127.0.0.1:55280 : lis=":0"
2022-10-28 13:34:04.8525 debug   sidecar/handleIntercept : Done serving connection from 127.0.0.1:55280 : lis=":0"
2022-10-28 13:34:24.7158 debug   sidecar/handleIntercept : Accept got connection from 127.0.0.1:55600 : lis=":0"
2022-10-28 13:34:24.8555 debug   sidecar/handleIntercept : Done serving connection from 127.0.0.1:55600 : lis=":0"
2022-10-28 13:34:44.7157 debug   sidecar/handleIntercept : Accept got connection from 127.0.0.1:55958 : lis=":0"
2022-10-28 13:34:44.7872 debug   sidecar/handleIntercept : Done serving connection from 127.0.0.1:55958 : lis=":0"
2022-10-28 13:34:59.2712 debug   sidecar/handleIntercept : HandleIntercepts []
2022-10-28 13:34:59.2712 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:34:59.2713 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:34:59.2713 debug   sidecar/handleIntercept : Closing forwarder for 3f89919e-e89b-4c12-b820-5e319d5b12e7:ulf-dep-cmos-8080:http
2022-10-28 13:34:59.2713 debug   sidecar/handleIntercept : HandleIntercepts(0)
2022-10-28 13:34:59.2714 debug   envoy-server : respond open watch 8 type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] with new version "v4"
2022-10-28 13:34:59.2714 debug   envoy-server : respond type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] version "v3" with version "v4"
2022-10-28 13:34:59.2714 debug   envoy-server : respond open watch 9 type.googleapis.com/envoy.config.listener.v3.Listener[] with new version "v4"
2022-10-28 13:34:59.2714 debug   envoy-server : respond type.googleapis.com/envoy.config.listener.v3.Listener[] version "v3" with version "v4"
2022-10-28 13:34:59.2714 debug   envoy-server : respond open watch 10 type.googleapis.com/envoy.config.cluster.v3.Cluster[] with new version "v4"
2022-10-28 13:34:59.2714 debug   envoy-server : respond type.googleapis.com/envoy.config.cluster.v3.Cluster[] version "v3" with version "v4"
2022-10-28 13:34:59.2737 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] and known map[application_route_TCP_3000:{} application_route_TCP_5005:{} application_route_TCP_8080:{}]. Diff []
2022-10-28 13:34:59.2737 debug   envoy-server : open watch 11 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration[application_route_TCP_5005 application_route_TCP_8080 application_route_TCP_3000] from nodeID "tel-id", version "v4"
2022-10-28 13:34:59.2737 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.cluster.v3.Cluster[] and known map[]. Diff []
2022-10-28 13:34:59.2737 debug   envoy-server : open watch 12 for type.googleapis.com/envoy.config.cluster.v3.Cluster[] from nodeID "tel-id", version "v4"
2022-10-28 13:34:59.2737 debug   envoy-server : nodeID "tel-id" requested type.googleapis.com/envoy.config.listener.v3.Listener[] and known map[]. Diff []
2022-10-28 13:34:59.2737 debug   envoy-server : open watch 13 for type.googleapis.com/envoy.config.listener.v3.Listener[] from nodeID "tel-id", version "v4"

note, traffic agent does not detect http2 service while doing GET /telepresence-http2-check

2022-10-28 13:33:04.5169 info    sidecar : supportsHTTP2: probing for HTTP/2 support... : base-url="http://127.0.0.1:3000"
2022-10-28 13:33:06.0192 info    sidecar : supportsHTTP2: result =  false : base-url="http://127.0.0.1:3000"

This seems misleading since the service does not really listen to port 3000 while the traffic agent gets added since the main pod gets restarted in the process? Will traffic agent repeat this check from time to time?

Expected behavior If traffic agent is installed for the workload, grpc request are served via http2 to cluster service correctly

Versions (please complete the following information):

telepresence version
W1028 15:43:45.730723   96005 native_arm64.go:52] Could not read /proc/cpuinfo: open /proc/cpuinfo: no such file or directory
W1028 15:43:45.730759   96005 native_arm64.go:177] Could not read /proc/self/auxv: open /proc/self/auxv: no such file or directory
An update of telepresence from version 2.7.2 to 2.8.3 is available. Please visit https://www.getambassador.io/docs/telepresence/latest/install/upgrade/ for more info.
Client: v2.7.2 (api v3)
Root Daemon: v2.7.2 (api v3)
User Daemon: v2.7.2 (api v3)
dpiddock commented 1 year ago

I'm trialing telepresence and hit the same issue with a demo gRPC service.

I found a comment buried in a thread on the community Slack. You can set the appProtocol to grpc or http2 on the service's port to hint to Telepresence that the port is HTTP/2 based. It works around the autodetection failure for me.

Example:

apiVersion: v1
kind: Service
metadata:
  name: api-svc
spec:
  ports:
  - name: api
    port: 8080
    targetPort: 8080
    appProtocol: grpc # or http2
  selector:
    app: api-svc
CleverUnderDog commented 1 year ago

Would be very nice if this could be added to documentation somehow. It took me multiple days to find this issue on github.

thallgren commented 1 year ago

@CleverUnderDog it is present in the documentation, under App Protocol Selection. If you have suggestions on how to make this easier to find, please let us know and we'll consider it.

CleverUnderDog commented 1 year ago

@CleverUnderDog it is present in the documentation, under App Protocol Selection. If you have suggestions on how to make this easier to find, please let us know and we'll consider it.

So when I read about this whole topic and tried to find information on how to use intercepts with grpc/http2 I generally landed here a couple of times: Intercepts I would personally find it helpful to have some kind of reference to additional information/configuration parameters when I first read about a topic.

thallgren commented 1 year ago

@JWKartheiser ^^ is perhaps something to consider for the docs improvements?

nicks commented 1 year ago

We hit this bug too. Part of the reason it's so insidious is that it doesn't show up until you create an intercept, then leave the intercept:

telepresence intercept test --mechanism=tcp --mount=false --workload buildkitd --namespace cloud-build-proxy --port 1235
telepresence leave test

# after this, the grpc deployment stops working

our mental model of K8s v1.Service is that it's usually a TCP load balancer and doesn't know much about L7. Maybe the fix is that when you telepresence leave the intercept, you should get a warning like:

$ telepresence leave test
WARNING: Switching the traffic agent from TCP to HTTP. To get rid of this warning, set the Service's `appProtocol` to match whether your pod uses http or http2
thallgren commented 8 months ago

Closing this, because I believe the original problem is fixed in traffic-agent version 1.14.4.