knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

cluster-local-gateway pod is not becoming ready after istio-lean installation #6254

Closed ahmetb closed 4 years ago

ahmetb commented 4 years ago

/area networking
/kind bug

I installed istio-lean without sidecar injection, following the instructions documented here (from HEAD), with Istio v1.4.2.

All Istio and Knative serving components are working fine.

Then I proceeded to install cluster-local-gateway (https://knative.dev/docs/install/installing-istio/#updating-your-install-to-use-cluster-local-gateway).

It resulted in this file:

istio-local-gateway.yaml.txt

Now the cluster-local-gateway-777c6c9d54-zrs84 pod isn't becoming ready. According to kubectl describe pod:

Events:
  Type     Reason     Age                      From                                                 Message
  ----     ------     ----                     ----                                                 -------
  Normal   Scheduled  7m46s                    default-scheduler                                    Successfully assigned istio-system/cluster-local-gateway-777c6c9d54-zrs84 to gke-gke-vanilla-default-pool-183266dd-d450
  Normal   Pulled     7m45s                    kubelet, gke-gke-vanilla-default-pool-183266dd-d450  Container image "gcr.io/istio-testing/proxyv2:latest" already present on machine
  Normal   Created    7m45s                    kubelet, gke-gke-vanilla-default-pool-183266dd-d450  Created container istio-proxy
  Normal   Started    7m45s                    kubelet, gke-gke-vanilla-default-pool-183266dd-d450  Started container istio-proxy
  Warning  Unhealthy  2m45s (x150 over 7m43s)  kubelet, gke-gke-vanilla-default-pool-183266dd-d450  Readiness probe failed: Get http://10.36.2.13:15020/healthz/ready: dial tcp 10.36.2.13:15020: connect: connection refused

The logs are flooded with "waiting for file" messages:

2019-12-19T04:11:06.737232Z info    FLAG: --binaryPath="/usr/local/bin/envoy"
2019-12-19T04:11:06.737276Z info    FLAG: --concurrency="0"
2019-12-19T04:11:06.737282Z info    FLAG: --configPath="/etc/istio/proxy"
2019-12-19T04:11:06.737289Z info    FLAG: --connectTimeout="10s"
2019-12-19T04:11:06.737293Z info    FLAG: --controlPlaneAuthPolicy="NONE"
2019-12-19T04:11:06.737299Z info    FLAG: --controlPlaneBootstrap="true"
2019-12-19T04:11:06.737302Z info    FLAG: --customConfigFile=""
2019-12-19T04:11:06.737306Z info    FLAG: --datadogAgentAddress=""
2019-12-19T04:11:06.737310Z info    FLAG: --disableInternalTelemetry="false"
2019-12-19T04:11:06.737322Z info    FLAG: --discoveryAddress="istio-pilot.istio-system:15010"
2019-12-19T04:11:06.737326Z info    FLAG: --dnsRefreshRate="300s"
2019-12-19T04:11:06.737331Z info    FLAG: --domain="istio-system.svc.cluster.local"
2019-12-19T04:11:06.737336Z info    FLAG: --drainDuration="45s"
2019-12-19T04:11:06.737340Z info    FLAG: --envoyAccessLogService=""
2019-12-19T04:11:06.737343Z info    FLAG: --envoyMetricsService=""
2019-12-19T04:11:06.737348Z info    FLAG: --help="false"
2019-12-19T04:11:06.737352Z info    FLAG: --id=""
2019-12-19T04:11:06.737356Z info    FLAG: --ip=""
2019-12-19T04:11:06.737361Z info    FLAG: --lightstepAccessToken=""
2019-12-19T04:11:06.737365Z info    FLAG: --lightstepAddress=""
2019-12-19T04:11:06.737369Z info    FLAG: --lightstepCacertPath=""
2019-12-19T04:11:06.737373Z info    FLAG: --lightstepSecure="false"
2019-12-19T04:11:06.737377Z info    FLAG: --log_as_json="false"
2019-12-19T04:11:06.737387Z info    FLAG: --log_caller=""
2019-12-19T04:11:06.737395Z info    FLAG: --log_output_level="default:info"
2019-12-19T04:11:06.737399Z info    FLAG: --log_rotate=""
2019-12-19T04:11:06.737403Z info    FLAG: --log_rotate_max_age="30"
2019-12-19T04:11:06.737407Z info    FLAG: --log_rotate_max_backups="1000"
2019-12-19T04:11:06.737412Z info    FLAG: --log_rotate_max_size="104857600"
2019-12-19T04:11:06.737416Z info    FLAG: --log_stacktrace_level="default:none"
2019-12-19T04:11:06.737468Z info    FLAG: --log_target="[stdout]"
2019-12-19T04:11:06.737483Z info    FLAG: --mixerIdentity=""
2019-12-19T04:11:06.737487Z info    FLAG: --outlierLogPath=""
2019-12-19T04:11:06.737491Z info    FLAG: --parentShutdownDuration="1m0s"
2019-12-19T04:11:06.737495Z info    FLAG: --pilotIdentity=""
2019-12-19T04:11:06.737501Z info    FLAG: --proxyAdminPort="15000"
2019-12-19T04:11:06.737506Z info    FLAG: --proxyComponentLogLevel="misc:error"
2019-12-19T04:11:06.737510Z info    FLAG: --proxyLogLevel="warning"
2019-12-19T04:11:06.737515Z info    FLAG: --serviceCluster="cluster-local-gateway"
2019-12-19T04:11:06.737519Z info    FLAG: --serviceregistry="Kubernetes"
2019-12-19T04:11:06.737524Z info    FLAG: --statsdUdpAddress=""
2019-12-19T04:11:06.737529Z info    FLAG: --statusPort="15020"
2019-12-19T04:11:06.737533Z info    FLAG: --templateFile=""
2019-12-19T04:11:06.737537Z info    FLAG: --trust-domain=""
2019-12-19T04:11:06.737542Z info    FLAG: --zipkinAddress="zipkin.istio-system:9411"
2019-12-19T04:11:06.737579Z info    Version 1.5-alpha.3125f9c9495de045e8447977e6b3eabaae7f0683-3125f9c9495de045e8447977e6b3eabaae7f0683-Clean
2019-12-19T04:11:06.737719Z info    Obtained private IP [10.36.2.13]
2019-12-19T04:11:06.737769Z info    Proxy role: &model.Proxy{ClusterID:"", Type:"router", IPAddresses:[]string{"10.36.2.13", "10.36.2.13"}, ID:"cluster-local-gateway-777c6c9d54-zrs84.istio-system", Locality:(*envoy_api_v2_core.Locality)(nil), DNSDomain:"istio-system.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), WorkloadLabels:labels.Collection(nil), IstioVersion:(*model.IstioVersion)(nil)}
2019-12-19T04:11:06.737782Z info    PilotSAN []string(nil)
2019-12-19T04:11:06.737787Z info    MixerSAN []string(nil)
2019-12-19T04:11:06.738405Z info    Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system:15010
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: cluster-local-gateway
statNameLength: 189
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2019-12-19T04:11:06.738492Z warn    Missing JWT token, can't use in process SDS ./var/run/secrets/tokens/istio-tokenstat ./var/run/secrets/tokens/istio-token: no such file or directory
2019-12-19T04:11:06.738511Z info    Monitored certs: []string{"/etc/certs/cert-chain.pem", "/etc/certs/key.pem", "/etc/certs/root-cert.pem"}
2019-12-19T04:11:06.738521Z info    waiting 2m0s for /etc/certs/cert-chain.pem
2019-12-19T04:11:07.740525Z info    waiting for file
2019-12-19T04:11:07.840794Z info    waiting for file
2019-12-19T04:11:07.940996Z info    waiting for file
2019-12-19T04:11:08.041214Z info    waiting for file
2019-12-19T04:11:08.141478Z info    waiting for file
2019-12-19T04:11:08.241740Z info    waiting for file
2019-12-19T04:11:08.342028Z info    waiting for file
2019-12-19T04:11:08.442253Z info    waiting for file
2019-12-19T04:11:08.542441Z info    waiting for file
2019-12-19T04:11:08.642606Z info    waiting for file
2019-12-19T04:11:08.742848Z info    waiting for file
[...]
goes on forever
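
The log pattern above (a stated wait on /etc/certs/cert-chain.pem followed by a "waiting for file" line roughly every 100 ms) looks like a simple polling loop. The following is only a rough sketch of that kind of loop, not Istio's actual implementation; the function name wait_for_file and its arguments are hypothetical.

```shell
# Illustrative sketch only: poll until a file exists or a timeout expires.
# wait_for_file is a hypothetical helper, not part of Istio or kubectl.
wait_for_file() {
  local path tries
  path="$1"                # file to wait for, e.g. /etc/certs/cert-chain.pem
  tries=$(( $2 * 10 ))     # $2 is the timeout in seconds; we poll every 0.1s

  while [ ! -e "$path" ]; do
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    echo "waiting for file"
    sleep 0.1              # fractional sleep assumes GNU or BSD sleep
    tries=$(( tries - 1 ))
  done
  echo "found $path"
}
```

If the file never shows up (for example because nothing creates the secret that backs /etc/certs), a loop like this never gets past the wait, which would be consistent with the readiness probe on port 15020 being refused.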

I'm not sure what this means:

Missing JWT token, can't use in process SDS ./var/run/secrets/tokens/istio-tokenstat ./var/run/secrets/tokens/istio-token: no such file or directory

but I suspect that the istio-lean installation documented in the link above isn't compatible with v0.11 or cluster-local-gateway?

duglin commented 4 years ago

The other day I was playing with 0.11 + Istio lean, and while I could get my internet-facing KnServices to work just fine, I couldn't get intra-cluster communication (or cluster-local services) to work. I assumed it was a user error and that I just hadn't configured things correctly. Might be related?

ahmetb commented 4 years ago

If I follow these istio-lean instructions instead, it works and the gateway comes up: https://knative.dev/docs/install/installing-istio/#installing-istio-with-sidecar-injection

However, I think the “without sidecar injection” template has something odd in it that prevents it from working correctly.

vagababov commented 4 years ago

/assign @tcnghia

tcnghia commented 4 years ago

@ahmetb what is the GKE/Istio add-on version?

tcnghia commented 4 years ago

@ahmetb This may be due to a version mismatch between the Istio control plane and the helm chart you are using.

ahmetb commented 4 years ago

@tcnghia no add-on, everything is self-installed. Knative v0.11.0 and Istio 1.4.2, installed via the helm template rendered from the link in the first comment.

ahmetb commented 4 years ago

@tcnghia the same version worked when I used the "sidecar injection enabled" mode of the chart template rendering command (though that cmd is also broken as it doesn't actually enable injection https://github.com/knative/docs/issues/2073).

nak3 commented 4 years ago

I have reproduced the issue and found the root cause (assuming the issue I reproduced is the same as yours).

To be (the desired setting):

   - name: ISTIO_AUTO_MTLS_ENABLED
     value: "false"

Or you can set it to false via helm:

helm template --namespace=istio-system \
  --set global.mtls.auto=false \

I still need to investigate whether this is a helm bug or whether we need to update the docs to set global.mtls.auto=false explicitly.
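
To check what a rendered manifest actually sets for this variable, something like the sketch below could help. It assumes the helm output was saved to a file (the name istio-local-gateway.yaml is an assumption) and that the variable appears as a container env entry in the name/value form shown above. The helper auto_mtls_values is hypothetical and does plain text matching, not real YAML parsing.

```shell
# Print the value that follows each "- name: ISTIO_AUTO_MTLS_ENABLED" entry
# in a rendered manifest. Hypothetical helper; text matching only.
auto_mtls_values() {
  awk '
    /- name: ISTIO_AUTO_MTLS_ENABLED/ { want = 1; next }
    want && /value:/ {
      gsub(/"/, "", $2)     # strip the surrounding double quotes
      print $2
      want = 0
      next
    }
    { want = 0 }            # any other line resets the state
  ' "$1"
}

# Usage (file name is an assumption):
#   auto_mtls_values istio-local-gateway.yaml
```

One line is printed per matching entry, so a manifest with several deployments may print several values.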

nak3 commented 4 years ago

> I still need to investigate whether this is a helm bug or whether we need to update the docs to set global.mtls.auto=false explicitly.

It is not a helm bug. The default value comes from install/kubernetes/helm/istio/values.yaml, and global.mtls.auto was enabled by default in this change: https://github.com/istio/istio/pull/18312/files#diff-7dad29cba9d2ca3e8570c8f65f4b7e86R371.

tcnghia commented 4 years ago

@ahmetb can you please try what @nak3 suggested in https://github.com/knative/serving/issues/6254#issuecomment-567785377 ?

ahmetb commented 4 years ago

I don’t think mtls is necessarily the culprit here? Applying the yaml from https://knative.dev/docs/install/installing-istio/#installing-istio-with-sidecar-injection doesn’t cause this problem, yet it doesn’t explicitly disable mtls?

nak3 commented 4 years ago

> Applying the yaml from https://knative.dev/docs/install/installing-istio/#installing-istio-with-sidecar-injection doesn’t cause this problem, yet it doesn’t explicitly disable mtls?

Good point. The yaml with sidecar injection does not cause the problem because it deploys citadel and creates certs under /etc/certs/*. You can check this in the cluster-local-gateway pod:

$ kubectl exec -it cluster-local-gateway-xxx -n istio-system -- ls /etc/certs/
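
As a small convenience, the listing can be compared against the three files named in the "Monitored certs" log line. The sketch below reads a directory listing on stdin and prints whichever of those files are absent; missing_certs is a hypothetical helper, not an Istio or kubectl command.

```shell
# Read a file listing (one name per line) on stdin and print which of the
# monitored cert files are missing. Hypothetical helper for illustration.
missing_certs() {
  local listing name
  listing="$(cat)"
  for name in cert-chain.pem key.pem root-cert.pem; do
    # -x: match the whole line, -F: treat the name as a fixed string
    printf '%s\n' "$listing" | grep -qxF -- "$name" || echo "$name"
  done
}

# Usage (drop -it when piping, since no TTY is needed):
#   kubectl exec cluster-local-gateway-xxx -n istio-system -- ls /etc/certs/ | missing_certs
```

No output means all three files are present in the listing.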

Also, if you want to reproduce the same issue with the injection yaml, you can scale down the citadel pod and remove the certs (secrets) with the following steps:

$ kubectl scale --replicas=0 deployment -n istio-system istio-citadel
$ kubectl delete secrets -n istio-system istio.cluster-local-gateway-service-account

Then restart the cluster-local-gateway pod, and you should see the same issue.

$ kubectl delete pod -n istio-system cluster-local-gateway-xxx

ahmetb commented 4 years ago

@nak3 that makes sense. :) But how should we fix the docs?