istio / old_issues_repo

Deprecated issue-tracking repo, please post new issues or feature requests to istio/istio instead.

Mixer Report failed with: CANCELLED #353

Closed markns closed 6 years ago

markns commented 6 years ago

Is this a BUG or FEATURE REQUEST?: Bug

Did you review https://istio.io/help/ and existing issues to identify if this is already solved or being worked on?: Y

What Version of Istio and Kubernetes are you using, where did you get Istio from, Installation details

istio version:
istio-release-0.8-20180523-09-15

kubectl version:

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.6-gke.1", GitCommit:"cb151369f60073317da686a6ce7de36abe2bda8d", GitTreeState:"clean", BuildDate:"2018-04-07T22:06:59Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Is Istio Auth enabled or not ? Did you install the stable istio.yaml, istio-auth.yaml.... or if using the Helm chart please provide full command line input.

helm install install/kubernetes/helm/istio --name istio --namespace istio-system \
        --set global.mtls.enabled=true \
        --set global.proxy.includeIPRanges="10.28.0.0/14\,10.31.240.0/20" \
        --set global.proxy.image=proxyv2 \
        --set ingressgateway.enabled=true \
        --set ingress.enabled=false \
        --set egressgateway.enabled=false \
        --set prometheus.enabled=false

Additionally, the following config was altered:
mtlsExcludedServices: ["kubernetes.default.svc.cluster.local","tiller-deploy.kube-system.svc.cluster.local"]

What happened:

gRPC streams are dropped after 15 seconds, and the message [libprotobuf ERROR src/istio/mixerclient/report_batch.cc:83] Mixer Report failed with: CANCELLED appears in the istio-proxy logs for the service pod.

Regular HTTP requests also trigger the "Mixer Report failed" message, although the requests themselves succeed.
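A minimal sketch for checking how often the message appears, assuming the proxy logs are piped in on stdin. The function name `count_report_failures` and the pod name `my-pod` are hypothetical and not part of the original report.

```shell
# Count lines containing the Mixer report failure in log text read from stdin.
count_report_failures() {
  grep -c 'Mixer Report failed with: CANCELLED'
}

# Usage, assuming kubectl access to the cluster ("my-pod" is a placeholder):
# kubectl logs my-pod -c istio-proxy | count_report_failures
```

Comparing the count before and after a config change gives a quick signal of whether the change had any effect.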

What you expected to happen:

gRPC streams to be maintained until they are cancelled by the gRPC client or server, or until the virtual service timeout has elapsed.

How to reproduce it:

Create a new cluster with the helm command above and run a pod with gateway and virtual service config as follows:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway
  namespace: networking
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*.mydomain.site'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mystio-welcome-app
  namespace: networking
spec:
  gateways:
  - gateway
  hosts:
  - mystio-welcome-app.mydomain.site
  http:
  - route:
    - destination:
        host: welcome-app.mystio.svc.cluster.local
    timeout: 10000s

Feature Request: N

mandarjog commented 6 years ago

Can you attach the output of tools/dump_kubernetes.sh -z? It is in the release archive.

qiwzhang commented 6 years ago

Yes, we need logs to debug this.

markns commented 6 years ago

Thanks for looking into it 👍 Here are the log files: istio-dump.tar.gz

btw, dump_kubernetes.sh is not in the release/tools, but is in source/bin.

qiwzhang commented 6 years ago

We found the problem. It is an mTLS mismatch between the data-plane proxy and the control-plane proxy on port 15004.
Assigned to @mandarjog

mandarjog commented 6 years ago

Hey @markns, can you run kubectl --namespace istio-system edit cm istio and change the setting to controlPlaneAuthPolicy: MUTUAL_TLS? Thanks.
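To see the current value before editing, something like the following sketch may help. It assumes the mesh config is stored under the `mesh` key of the `istio` ConfigMap; the helper name `auth_policy` is hypothetical.

```shell
# Print the controlPlaneAuthPolicy value found in mesh config text on stdin.
# Commented-out lines (starting with '#') are not matched.
auth_policy() {
  sed -n 's/^[[:space:]]*controlPlaneAuthPolicy:[[:space:]]*//p'
}

# Usage, assuming kubectl access and that the config sits under data.mesh:
# kubectl --namespace istio-system get cm istio -o jsonpath='{.data.mesh}' | auth_policy
```

If this prints NONE while the data plane uses mutual TLS, that would be consistent with the mismatch described above.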

markns commented 6 years ago

Hey guys, I ran helm upgrade --set global.controlPlaneSecurityEnabled=true ... and it works! Thanks for the help.

(connections are still being dropped after 15 seconds, but as I half-expected, this must be a separate issue)

kidiyoor commented 6 years ago

Seeing this issue in the proxy (0.8.0) running on a mesh expansion VM. mTLS is not enabled for the control plane or the data plane.

[libprotobuf ERROR src/istio/mixerclient/report_batch.cc:83] Mixer Report failed with: CANCELLED

dump.tar.gz

qiwzhang commented 6 years ago

Which containers?

kidiyoor commented 6 years ago

The error is not seen in container/pod logs, i.e. it is not seen in services running in k8s.

It is seen in the sidecar proxy that runs on the VM (the mesh expansion VM):

root      6465  0.0  0.0 191880  2440 ?        S    Jun14   0:00 su -s /bin/bash -c INSTANCE_IP=10.128.15.195 POD_NAME=thru-2 POD_NAMESPACE=default exec /opt/apigee/apigee-pilot-agent/bin/pilot-agent proxy  --configPath /opt/apigee/apigee-proxy/conf/proxy --binaryPath /opt/apigee/apigee-proxy/bin/envoy --serviceCluster rawvm --discoveryAddress istio-pilot.istio-system:8080 --controlPlaneAuthPolicy NONE root

root      6507  0.0  0.3  34712 13652 ?        Sl   Jun14   0:00 /opt/apigee/apigee-pilot-agent/bin/pilot-agent proxy --configPath /opt/apigee/apigee-proxy/conf/proxy --binaryPath /opt/apigee/apigee-proxy/bin/envoy --serviceCluster rawvm --discoveryAddress istio-pilot.istio-system:8080 --controlPlaneAuthPolicy NONE

root      6512  6.4  1.0 113492 36772 ?        Sl   Jun14   4:48 /opt/apigee/apigee-proxy/bin/envoy -c /opt/apigee/apigee-proxy/conf/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 2 --parent-shutdown-time-s 3 --service-cluster rawvm --service-node sidecar~10.128.15.195~thru-2.default~default.svc.cluster.local --max-obj-name-len 189 -l warn --v2-config-only

qiwzhang commented 6 years ago

Can you get the Envoy /config_dump from the Envoy running on the VM?

Inside the proxy container, use curl http://localhost:15000/config_dump

kidiyoor commented 6 years ago

The proxy doesn't have a /config_dump path:

[gauthamvk@thru-2 ~]$ curl http://127.0.01:15000/config_dump
invalid path. admin commands are:
  /: Admin home page
  /certs: print certs on machine
  /clusters: upstream cluster status
  /cpuprofiler: enable/disable the CPU profiler
  /healthcheck/fail: cause the server to fail health checks
  /healthcheck/ok: cause the server to pass health checks
  /help: print out list of admin commands
  /hot_restart_version: print the hot restart compatability version
  /listeners: print listener addresses
  /logging: query/change logging levels
  /quitquitquit: exit the server
  /reset_counters: reset all counters to zero
  /routes: print out currently loaded dynamic HTTP route tables
  /runtime: print runtime values
  /server_info: print server version/status information
[gauthamvk@thru-2 ~]$ /opt/apigee/apigee-proxy/bin/envoy --version
/opt/apigee/apigee-proxy/bin/envoy  version: 0/1.6.0//DEBUG

qiwzhang commented 6 years ago

Hmm, you may be running a very old Envoy. The latest Envoy should have /config_dump.
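A rough way to check whether a given Envoy build exposes the endpoint is to look for it in the admin command listing, like the one printed earlier in this thread. This is only a sketch: the helper name `has_config_dump` is hypothetical, and it assumes the admin interface listens on localhost:15000 as in the commands above.

```shell
# Succeed if the admin command listing on stdin includes /config_dump.
has_config_dump() {
  grep -q '^[[:space:]]*/config_dump'
}

# Usage, assuming the admin port is 15000 as used in this thread:
# curl -s http://localhost:15000/help | has_config_dump \
#   && echo "config_dump available" \
#   || echo "config_dump missing, Envoy is probably too old"
```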

kidiyoor commented 6 years ago

Updated the proxy to a newer version. Not seeing this error anymore.