rshriram / istio_federation_demo

Cross Region routing between two Istio clusters on K8S

client curl returned 503 #7

Open. gyliu513 opened this issue 6 years ago

gyliu513 commented 6 years ago

Hi @rshriram, I ran the multicluster demo again in a new Kubernetes cluster and the case failed.

I can deploy all of the components, and all related resources appear to be working well; both the client and the server are created and running.

root@gyliu-icp-3:~/istio_federation_demo-ca# SCRIPTDIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
root@gyliu-icp-3:~/istio_federation_demo-ca# CLUSTER1_ID="cluster77"
root@gyliu-icp-3:~/istio_federation_demo-ca# CLUSTER2_ID="cluster121"
root@gyliu-icp-3:~/istio_federation_demo-ca# ROOTCA_ID="cluster29"
root@gyliu-icp-3:~/istio_federation_demo-ca#
root@gyliu-icp-3:~/istio_federation_demo-ca# CLUSTER1_NAME="${CLUSTER1_ID}.k8s.local"
root@gyliu-icp-3:~/istio_federation_demo-ca# CLUSTER2_NAME="${CLUSTER2_ID}.k8s.local"
root@gyliu-icp-3:~/istio_federation_demo-ca# ROOTCA_NAME="${ROOTCA_ID}.k8s.local"
root@gyliu-icp-3:~/istio_federation_demo-ca#
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl --context ${ROOTCA_NAME} get nodes

NAME            STATUS    ROLES     AGE       VERSION
9.111.255.152   Ready     <none>    2h        v1.10.0+icp-ee
9.111.255.29    Ready     <none>    2h        v1.10.0+icp-ee
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl --context ${CLUSTER1_NAME} get nodes
NAME            STATUS    ROLES     AGE       VERSION
9.111.255.155   Ready     <none>    3h        v1.10.0+icp-ee
9.111.255.77    Ready     <none>    3h        v1.10.0+icp-ee
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl --context ${CLUSTER2_NAME} get nodes
NAME            STATUS    ROLES     AGE       VERSION
9.111.255.121   Ready     <none>    2h        v1.10.0+icp-ee
9.111.255.216   Ready     <none>    2h        v1.10.0+icp-ee

The CA cluster:

root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl get pods -n istio-system
NAME                                        READY     STATUS    RESTARTS   AGE
istio-standalone-citadel-67d5557fd5-zgv9g   1/1       Running   0          1h
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl get svc -n istio-system
NAME                 TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
standalone-citadel   LoadBalancer   90.0.0.138   9.111.255.9   8060:32040/TCP   1h

cluster1, which is named cluster77 in my env:

root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl -n istio-system --context=cluster77.k8s.local get pods -owide
NAME                                       READY     STATUS      RESTARTS   AGE       IP            NODE
istio-citadel-6dccd8f47c-67xvd             1/1       Running     0          1h        20.1.35.200   9.111.255.155
istio-cleanup-old-ca-lkkgm                 0/1       Completed   0          1h        20.1.35.198   9.111.255.155
istio-egressgateway-7785656b5-7hxh4        1/1       Running     0          1h        20.1.35.196   9.111.255.155
istio-ingress-6464f65df9-dtkvl             1/1       Running     0          1h        20.1.35.204   9.111.255.155
istio-ingressgateway-56d99c76f9-8hnml      1/1       Running     0          1h        20.1.35.201   9.111.255.155
istio-mixer-create-cr-dwkdf                0/1       Completed   0          1h        20.1.35.197   9.111.255.155
istio-pilot-66f4dd866c-fx7qx               2/2       Running     0          1h        20.1.35.207   9.111.255.155
istio-policy-76c8896799-96ztb              2/2       Running     0          1h        20.1.35.205   9.111.255.155
istio-sidecar-injector-645c89bc64-ct86n    1/1       Running     0          1h        20.1.35.206   9.111.255.155
istio-statsd-prom-bridge-949999c4c-rnppn   1/1       Running     0          1h        20.1.35.195   9.111.255.155
istio-telemetry-6554768879-twbpg           2/2       Running     0          1h        20.1.35.203   9.111.255.155
prometheus-86cb6dd77c-b7qbw                1/1       Running     0          1h        20.1.35.199   9.111.255.155
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl -n istio-system --context=cluster77.k8s.local get svc
NAME                       TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                                                               AGE
istio-citadel              ClusterIP      20.0.0.171   <none>        8060/TCP,9093/TCP                                                     1h
istio-egressgateway        ClusterIP      20.0.0.240   <none>        443/TCP                                                               1h
istio-ingress              NodePort       20.0.0.169   <none>        80:32000/TCP,443:31014/TCP                                            1h
istio-ingressgateway       LoadBalancer   20.0.0.244   9.111.255.6   443:32513/TCP                                                         1h
istio-pilot                ClusterIP      20.0.0.87    <none>        15003/TCP,15005/TCP,15007/TCP,15010/TCP,15011/TCP,8080/TCP,9093/TCP   1h
istio-policy               ClusterIP      20.0.0.136   <none>        9091/TCP,15004/TCP,9093/TCP                                           1h
istio-sidecar-injector     ClusterIP      20.0.0.138   <none>        443/TCP                                                               1h
istio-standalone-citadel   ExternalName   <none>       9.111.255.9   <none>                                                                1h
istio-statsd-prom-bridge   ClusterIP      20.0.0.222   <none>        9102/TCP,9125/UDP                                                     1h
istio-telemetry            ClusterIP      20.0.0.22    <none>        9091/TCP,15004/TCP,9093/TCP,42422/TCP                                 1h
prometheus                 ClusterIP      20.0.0.238   <none>        9090/TCP                                                              1h

cluster2, which is named cluster121 in my env:

root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl -n istio-system --context=cluster121.k8s.local get pods -owide
NAME                                       READY     STATUS      RESTARTS   AGE       IP            NODE
istio-citadel-6fd7fb4f4c-j42zw             1/1       Running     0          1h        40.1.211.74   9.111.255.216
istio-cleanup-old-ca-br4rz                 0/1       Completed   0          1h        40.1.211.75   9.111.255.216
istio-egressgateway-7785656b5-bprdv        1/1       Running     0          1h        40.1.211.69   9.111.255.216
istio-ingress-6464f65df9-6mwln             1/1       Running     0          1h        40.1.211.68   9.111.255.216
istio-ingressgateway-56d99c76f9-5p5nq      1/1       Running     0          1h        40.1.211.70   9.111.255.216
istio-mixer-create-cr-hhfcg                0/1       Completed   0          1h        40.1.211.76   9.111.255.216
istio-pilot-66f4dd866c-8jxjc               2/2       Running     0          1h        40.1.211.78   9.111.255.216
istio-policy-76c8896799-66q2v              2/2       Running     0          1h        40.1.211.71   9.111.255.216
istio-sidecar-injector-645c89bc64-xrwpw    1/1       Running     0          1h        40.1.211.77   9.111.255.216
istio-statsd-prom-bridge-949999c4c-q5b6x   1/1       Running     0          1h        40.1.211.67   9.111.255.216
istio-telemetry-6554768879-5nm8s           2/2       Running     0          1h        40.1.211.73   9.111.255.216
prometheus-86cb6dd77c-89cwt                1/1       Running     0          1h        40.1.211.72   9.111.255.216
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl -n istio-system --context=cluster121.k8s.local get svc
NAME                       TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                                                               AGE
istio-citadel              ClusterIP      40.0.0.179   <none>        8060/TCP,9093/TCP                                                     1h
istio-egressgateway        ClusterIP      40.0.0.225   <none>        443/TCP                                                               1h
istio-ingress              NodePort       40.0.0.20    <none>        80:32000/TCP,443:31860/TCP                                            1h
istio-ingressgateway       LoadBalancer   40.0.0.180   9.111.255.3   443:32062/TCP                                                         1h
istio-pilot                ClusterIP      40.0.0.91    <none>        15003/TCP,15005/TCP,15007/TCP,15010/TCP,15011/TCP,8080/TCP,9093/TCP   1h
istio-policy               ClusterIP      40.0.0.154   <none>        9091/TCP,15004/TCP,9093/TCP                                           1h
istio-sidecar-injector     ClusterIP      40.0.0.193   <none>        443/TCP                                                               1h
istio-standalone-citadel   ExternalName   <none>       9.111.255.9   <none>                                                                1h
istio-statsd-prom-bridge   ClusterIP      40.0.0.155   <none>        9102/TCP,9125/UDP                                                     1h
istio-telemetry            ClusterIP      40.0.0.37    <none>        9091/TCP,15004/TCP,9093/TCP,42422/TCP                                 1h
prometheus                 ClusterIP      40.0.0.249   <none>        9090/TCP                                                              1h
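
Note that in both clusters the istio-standalone-citadel ExternalName service points at the root CA cluster's LoadBalancer (9.111.255.9). One way to sanity-check that both clusters really obtained their certificates from that shared root is to compare the root certs Citadel wrote into the istio.default secrets (a sketch; assumes openssl is available wherever kubectl runs):

# Compare the root CA fingerprints in each cluster; they should match if both clusters
# chain up to the shared standalone Citadel:
kubectl --context=cluster77.k8s.local get secret istio.default -o jsonpath='{.data.root-cert\.pem}' \
  | base64 -d | openssl x509 -noout -subject -fingerprint
kubectl --context=cluster121.k8s.local get secret istio.default -o jsonpath='{.data.root-cert\.pem}' \
  | base64 -d | openssl x509 -noout -subject -fingerprint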

From cluster2, where the server is running:

/ # nslookup server.cluster2.global
Server:    40.0.0.10
Address 1: 40.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      server.cluster2.global
Address 1: 1.1.1.1 1dot1dot1dot1.cloudflare-dns.com
/ #

From cluster1, where the client is running:

root@gyliu-ubuntu-1:~/cases/pod# kubectl exec -it busybox sh
/ # nslookup server.cluster2.global
Server:    20.0.0.10
Address 1: 20.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      server.cluster2.global
Address 1: 1.1.1.1 1dot1dot1dot1.cloudflare-dns.com

We can see that CoreDNS is working well in both cluster1 and cluster2.
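
Both clusters answer *.global queries with the dummy VIP 1.1.1.1, which is expected: the name only has to resolve to something so the request leaves the application and the sidecar can route it by host. If the answer ever looks wrong, the stub-zone config can be inspected directly (assuming CoreDNS is installed with the usual coredns ConfigMap in kube-system):

# Print the Corefile in each cluster and look for the server block / stub domain that
# handles the .global zone:
kubectl --context=cluster77.k8s.local -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
kubectl --context=cluster121.k8s.local -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'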

But when I run the following command from the client to the server, the client istio-proxy reports errors and curl returns 503.

curl -s -o /dev/null -I -w "%{http_code}" http://server.cluster2.global/helloworld

Curl the server from the client:

root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl --context=cluster77.k8s.local get pods
NAME                     READY     STATUS    RESTARTS   AGE
busybox                  1/1       Running   0          50m
client-f6d466d66-xxmcj   2/2       Running   0          12m
root@gyliu-icp-3:~/istio_federation_demo-ca# kubectl --context=cluster77.k8s.local exec -it  client-f6d466d66-xxmcj sh
Defaulting container name to client.
Use 'kubectl describe pod/client-f6d466d66-xxmcj -n default' to see all of the containers in this pod.
# curl -s -o /dev/null -I -w "%{http_code}" http://server.cluster2.global/helloworld
503#

Checking the client istio-proxy log, it reports the following errors:

[2018-07-08 15:15:27.127][20][info][main] external/envoy/source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-07-08 15:16:14.571][37][info][client] external/envoy/source/common/http/codec_client.cc:117] [C4] protocol error: http/1.1 protocol error: HPE_INVALID_CONSTANT
[2018-07-08T15:16:14.559Z] "HEAD /helloworld HTTP/1.1" 503 - 0 0 12 11 "-" "curl/7.35.0" "5259f54b-c029-975b-99e8-54998c38fabc" "server.cluster2.global" "20.1.35.196:443"
[2018-07-08 15:27:08.553][32][info][client] external/envoy/source/common/http/codec_client.cc:117] [C6] protocol error: http/1.1 protocol error: HPE_INVALID_CONSTANT
[2018-07-08T15:27:08.535Z] "HEAD /helloworld HTTP/1.1" 503 - 0 0 17 17 "-" "curl/7.35.0" "38e51c5b-715f-9c68-9004-38eee0088c0b" "server.cluster2.global" "20.1.35.196:443"

Please note that 20.1.35.196 is the egressgateway pod IP in the cluster where the client is running.

root@gyliu-ubuntu-1:~/cases/pod# kubectl get pods -n istio-system -owide | grep egressgateway
NAME                                       READY     STATUS      RESTARTS   AGE       IP            NODE
istio-egressgateway-7785656b5-7hxh4        1/1       Running     0          16m       20.1.35.196   9.111.255.155
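
The HPE_INVALID_CONSTANT protocol error above usually means Envoy's HTTP/1.1 codec received bytes from the upstream that are not valid HTTP/1.1, which commonly points at a plaintext-vs-TLS mismatch on the hop to the egress gateway. One way to see how the client sidecar has configured that upstream is its Envoy admin interface (a sketch; assumes curl is available in the istio-proxy image, pod name taken from the output above):

# The admin interface on localhost:15000 inside the istio-proxy container lists the
# upstream clusters and the endpoints the sidecar resolved for server.cluster2.global;
# /config_dump (if available in this proxy version) additionally shows TLS settings:
kubectl --context=cluster77.k8s.local exec -it client-f6d466d66-xxmcj -c istio-proxy -- \
  curl -s localhost:15000/clusters | grep cluster2.global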

The egressgateway pod reports the following:

[libprotobuf INFO src/istio/mixerclient/check_cache.cc:155] Add a new Referenced for check cache: Absence-keys: Exact-keys: context.protocol, destination.service, destination.uid, source.ip, source.uid,
[2018-07-08T15:28:44.231Z] "HEAD /helloworld HTTP/1.1" 503 UF 0 57 7 - "20.1.35.211" "curl/7.35.0" "7a9bb993-71f4-97f5-bd15-fdf2ed8ddc3a" "server.cluster2.global" "9.111.255.3:443"

We can see that the egress gateway in cluster1 (cluster77 in my env) is trying to access the ingressgateway LoadBalancer IP in cluster2 (cluster121 in my env).
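
Since this 503 carries the UF flag (upstream connection failure), it is worth checking whether 9.111.255.3:443 is reachable from inside cluster1 at all. A rough probe, assuming curl is present in the egress gateway image (pod name from the output above):

# The TLS handshake may well be rejected (the gateway expects SNI/mTLS), but the
# verbose output shows whether a TCP connection to cluster2's ingressgateway
# LoadBalancer IP is accepted at all:
kubectl -n istio-system --context=cluster77.k8s.local exec -it istio-egressgateway-7785656b5-7hxh4 -- \
  curl -kv https://9.111.255.3:443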

But checking the log of the ingressgateway in cluster2 (cluster121 in my env), there is no incoming request from cluster1 (cluster77 in my env):

[2018-07-08 15:01:10.767][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|80||*.cluster1.global starting warming
[2018-07-08 15:01:10.767][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|80||*.cluster1.global complete
[2018-07-08 15:01:10.768][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||*.cluster1.global starting warming
[2018-07-08 15:01:10.769][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||*.cluster1.global complete
[2018-07-08 15:01:21.304][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|80||*.cluster2.global starting warming
[2018-07-08 15:01:21.304][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|80||*.cluster2.global complete
[2018-07-08 15:01:21.305][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||*.cluster2.global starting warming
[2018-07-08 15:01:21.306][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||*.cluster2.global complete
[2018-07-08 15:02:36.561][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:442] removing cluster outbound|80||server.default.svc.cluster.local
[2018-07-08 15:03:20.367][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|80||server.default.svc.cluster.local starting warming
[2018-07-08 15:03:20.367][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|80||server.default.svc.cluster.local complete
[2018-07-08 15:12:57.716][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:442] removing cluster outbound|443||*.cluster1.global
[2018-07-08 15:12:57.718][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:442] removing cluster outbound|80||*.cluster1.global
[2018-07-08 15:13:02.626][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||istio-egressgateway.istio-system.svc.cluster.local starting warming
[2018-07-08 15:13:02.626][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:442] removing cluster outbound|80||server.default.svc.cluster.local
[2018-07-08 15:13:02.626][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:442] removing cluster outbound|443||*.cluster2.global
[2018-07-08 15:13:02.626][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:442] removing cluster outbound|80||*.cluster2.global
[2018-07-08 15:13:02.629][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||istio-egressgateway.istio-system.svc.cluster.local complete
[2018-07-08 15:13:02.631][24][info][upstream] external/envoy/source/server/lds_api.cc:74] lds: remove listener '0.0.0.0_443'
[2018-07-08 15:14:18.904][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|80||*.cluster1.global starting warming
[2018-07-08 15:14:18.904][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|80||*.cluster1.global complete
[2018-07-08 15:14:18.905][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||*.cluster1.global starting warming
[2018-07-08 15:14:18.906][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||*.cluster1.global complete
[2018-07-08 15:14:24.037][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|80||server.default.svc.cluster.local starting warming
[2018-07-08 15:14:24.038][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||istio-egressgateway.istio-system.svc.cluster.local starting warming
[2018-07-08 15:14:24.039][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||*.cluster1.global starting warming
[2018-07-08 15:14:24.039][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||*.cluster1.global complete
[2018-07-08 15:14:24.040][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|80||*.cluster2.global starting warming
[2018-07-08 15:14:24.040][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|80||*.cluster2.global complete
[2018-07-08 15:14:24.041][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:390] add/update cluster outbound|443||*.cluster2.global starting warming
[2018-07-08 15:14:24.041][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||*.cluster2.global complete
[2018-07-08 15:14:24.042][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|443||istio-egressgateway.istio-system.svc.cluster.local complete
[2018-07-08 15:14:24.042][24][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:397] warming cluster outbound|80||server.default.svc.cluster.local complete
[2018-07-08 15:14:24.045][24][warning][config] external/envoy/source/server/listener_manager_impl.cc:254] adding listener '0.0.0.0:443': filter chain match rules require TLS Inspector listener filter, but it isn't configured, trying to inject it (this might fail if Envoy is compiled without it)
[2018-07-08 15:14:24.045][24][info][upstream] external/envoy/source/server/lds_api.cc:62] lds: add/update listener '0.0.0.0_443'
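
To confirm whether any connection from cluster1 ever reaches this gateway, the 0.0.0.0_443 listener stats can be watched on the cluster2 ingressgateway while the client retries the curl (a sketch; pod name from the output above, stat name follows Envoy's listener.<address> convention):

# If downstream_cx_total stays at 0 while the client keeps retrying, the traffic is
# being dropped somewhere before it reaches cluster2's ingressgateway:
kubectl -n istio-system --context=cluster121.k8s.local exec -it istio-ingressgateway-56d99c76f9-5p5nq -- \
  curl -s localhost:15000/stats | grep 'listener.0.0.0.0_443.downstream_cx'
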
gyliu513 commented 6 years ago

Output of curl -vv http://server.cluster2.global/helloworld from the client pod:

# curl -vv http://server.cluster2.global/helloworld
* Hostname was NOT found in DNS cache
*   Trying 1.1.1.1...
* Connected to server.cluster2.global (1.1.1.1) port 80 (#0)
> GET /helloworld HTTP/1.1
> User-Agent: curl/7.35.0
> Host: server.cluster2.global
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 57
< content-type: text/plain
< date: Sun, 08 Jul 2018 15:34:32 GMT
* Server envoy is not blacklisted
< server: envoy
< x-envoy-upstream-service-time: 10
<
* Connection #0 to host server.cluster2.global left intact
upstream connect error or disconnect/reset before headers#
rshriram commented 6 years ago

Sorry for the delay. I will update the YAMLs today; there were some API changes.

rshriram commented 6 years ago

Is this with the 0.8 images?

gyliu513 commented 6 years ago

It seems the ingress gateway conflicts with my ingress controller; after deleting the ingress controller in my cluster, it works fine.

Do you have any comments on what might be wrong with my ingress controller?

root@gyliu-dev1:~/go/src/github.com/kubernetes-sigs/federation-v2# kubectl get svc -n kube-system | egrep  "ingress|backend"
default-backend                               ClusterIP   100.0.0.128   <none>        80/TCP               18d
icp-management-ingress                        ClusterIP   100.0.0.87    <none>        8443/TCP             18d
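
One quick way to look for this kind of clash is to list every service that claims a NodePort or LoadBalancer across namespaces and compare the ports with those used by the Istio gateways (illustrative only):

# Anything besides the Istio gateways binding 80/443 NodePorts or the same
# LoadBalancer IP is a candidate for the conflict:
kubectl get svc --all-namespaces | grep -E 'NodePort|LoadBalancer'
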
ghost commented 5 years ago

@gyliu513 @rshriram I am trying the above with the istio-1.1 branch and seeing the 503.

* Hostname was NOT found in DNS cache
*   Trying 1.1.1.2...
* Connected to server.ns2.svc.cluster.global (1.1.1.2) port 80 (#0)
> GET /helloworld HTTP/1.1
> User-Agent: curl/7.35.0
> Host: server.ns2.svc.cluster.global
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 57
< content-type: text/plain
< date: Tue, 15 Jan 2019 00:20:15 GMT
* Server envoy is not blacklisted
< server: envoy
<
* Connection #0 to host server.ns2.svc.cluster.global left intact
upstream connect error or disconnect/reset before headers

I am observing the following:
i) The istio-proxy log on the client pod shows that it is connecting to 1.1.1.2:80 (and not the egress pod IP, as @gyliu513 showed above).
ii) The Istio egress gateway doesn't have any logs (which makes sense given i above).

Any pointers on what could be causing the istio-proxy to use the resolved IP (1.1.1.2 in this case) and not use the egress gateway?
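
One thing that can be checked is whether a ServiceEntry covering the .global hosts exists and which Envoy cluster the client sidecar actually routes them to; if the hosts are not matched by a ServiceEntry, the sidecar either forwards straight to the resolved dummy IP (ALLOW_ANY passthrough) or rejects the request (REGISTRY_ONLY). A sketch, with placeholders in angle brackets:

# Placeholders in <> must be replaced with real names; istioctl proxy-config is
# available with Istio 1.1.
# 1) Is there a ServiceEntry that covers the *.global / cluster.global hosts?
kubectl get serviceentries --all-namespaces
# 2) Which Envoy cluster does the client sidecar pick for those hosts? The expected
#    answer points at the egress gateway, not a passthrough cluster.
istioctl proxy-config cluster <client-pod>.<namespace> | grep global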