istio / old_issues_repo

Deprecated issue-tracking repo, please post new issues or feature requests to istio/istio instead.

External gRPC requests sometimes yield 404 from destination proxy #351

Closed rfevang closed 6 years ago

rfevang commented 6 years ago

Is this a BUG or FEATURE REQUEST?:

BUG

Did you review https://istio.io/help/ and existing issues to identify if this is already solved or being worked on?: YES

What Version of Istio and Kubernetes are you using, where did you get Istio from, Installation details

istioctl version: 0.5.1, 0.7.1 and 0.8-20180515-17-26
kubectl version: 1.9.6-gke.1

Is Istio Auth enabled or not ?

Not enabled (used istio.yaml).

What happened:

External gRPC requests sometimes yield 404 from the destination envoy.

What you expected to happen:

I expect the request to reach the destination container instead of being stopped by the proxy.

How to reproduce it:

This bug doesn't show up consistently; in our setup it manifests on somewhere between 30% and 99% of pod starts (some days it rarely happens, other days it's impossible to get rid of). It also seems to manifest only under certain conditions that I haven't been able to pin down. Having readiness probes seems to increase the frequency of the issue.

The setup where this happens has a pod with two containers for receiving gRPC requests. One is a gRPC server that accepts all requests. The other is a Cloud Endpoints container that verifies an API key before passing the request on to the gRPC server. Only requests to the Cloud Endpoints container fail to reach their destination; if I route requests directly to the gRPC server container (on port 21212), I have not seen this bug.
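Roughly, the pod looks like the sketch below. The names, image references, and ESP flags are placeholders/assumptions; only the two port numbers (31312 for the endpoints container, 21212 for the gRPC server) come from our actual setup.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: timeseries                        # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: timeseries
  template:
    metadata:
      labels:
        app: timeseries
    spec:
      containers:
      - name: grpc-server                 # accepts all gRPC requests directly
        image: example/grpc-server        # placeholder image
        ports:
        - containerPort: 21212
      - name: endpoints                   # Cloud Endpoints proxy (ESP), checks the API key
        image: gcr.io/endpoints-release/endpoints-runtime:1
        args:                             # assumed ESP flags
        - --service=timeseries.endpoints.example.cloud.goog   # placeholder service name
        - --http2_port=31312
        - --backend=grpc://127.0.0.1:21212
        ports:
        - containerPort: 31312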

Both working and non-working pods get their requests routed correctly from ingress:

Working:

[2018-05-23T12:06:47.774Z] "POST /exabel.ts.TimeSeriesService/GetFeed HTTP/2" 200 - 380 61 4 3 "10.132.0.28" "grpc-java-netty/1.11.0" "8cf79460-aa7a-9edc-8218-f52743228b4d" "35.187.111.66:80" "10.112.7.36:31312"

Not working:

[2018-05-23T12:06:47.780Z] "POST /exabel.ts.TimeSeriesService/GetFeed HTTP/2" 404 - 360 0 1 0 "10.132.0.28" "grpc-java-netty/1.11.0" "2c785230-14ad-9f8d-bcd6-f737a3df6a3b" "35.187.111.66:80" "10.112.6.32:31312"

In both cases, I can see the request in the target pod's proxy. When working, the proxy routes the request to the correct localhost port. When not working, it doesn't, and just returns a 404; the request never makes it to the intended endpoints container.

Working:

[2018-05-23T12:06:47.774Z] "POST /exabel.ts.TimeSeriesService/GetFeed HTTP/2" 200 - 380 61 3 2 "10.132.0.28" "grpc-java-netty/1.11.0" "8cf79460-aa7a-9edc-8218-f52743228b4d" "35.187.111.66:80" "127.0.0.1:31312"

Not working:

[2018-05-23T12:06:47.780Z] "POST /exabel.ts.TimeSeriesService/GetFeed HTTP/2" 404 NR 0 0 0 - "10.132.0.28" "grpc-java-netty/1.11.0" "2c785230-14ad-9f8d-bcd6-f737a3df6a3b" "35.187.111.66:80" "-"

Note the last part of the two lines above: one routes locally, the other just gives up (Envoy's NR flag, meaning no route was found). All examples above are from the same cluster at the same time; there are two destination pods, one of which displays the bug and the other does not.

Plain HTTP requests work even when routed through the endpoints container, though in that case the request isn't logged by the destination envoy at all (I guess it bypasses the proxy?).

I've had some success "force rebooting" just the istio-proxy container, using the following command: kubectl -n $NAMESPACE exec $POD_NAME -c istio-proxy /sbin/killall5. This usually (but not always) fixes the problem, but it has to be repeated every time the pod is redeployed.

This problem was present in 0.5.1 and 0.7.1, and is still present in 0.8 (as of the May 15 daily build). It also still manifests when using Gateway + VirtualService for routing the ingress traffic.
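For reference, the Gateway + VirtualService routing we tried on 0.8 looks roughly like the sketch below (resource, host, and service names are placeholders; only the port numbers match our setup).

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: timeseries-gateway                # placeholder name
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http2
      protocol: HTTP2                     # gRPC arrives over HTTP/2
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: timeseries                        # placeholder name
spec:
  hosts:
  - "*"
  gateways:
  - timeseries-gateway
  http:
  - route:
    - destination:
        host: timeseries                  # placeholder service name
        port:
          number: 31312                   # the endpoints container port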

rfevang commented 6 years ago

Haven't seen this issue since updating to Istio 0.8.0 proper, so likely something changed between the May 15th version and the final release that solved whatever the problem was. Closing.