operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

CatalogSource pods crashing when under Istio #2343

Open sathieu opened 3 years ago

sathieu commented 3 years ago

Bug Report

What did you do?

I had to disable Istio injection to ensure it works:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: operator-lifecycle-manager
  annotations:
    sidecar.istio.io/inject: 'false' # <===== HERE
spec:
  sourceType: grpc
  image: quay.io/operatorhubio/catalog:latest
  displayName: Community Operators
  publisher: OperatorHub.io
  updateStrategy:
    registryPoll:
      interval: 60m

Otherwise, pods are always crashing:

NAME                               READY   STATUS        RESTARTS   AGE
catalog-operator-699bcff75-chl9c   2/2     Running       0          30h
olm-operator-6fb9698bbc-k4gv7      2/2     Running       0          30h
operatorhubio-catalog-8lv2h        0/2     Terminating   0          4s
operatorhubio-catalog-mt2j7        0/2     Pending       0          <invalid>
operatorhubio-catalog-ps8pt        0/2     Terminating   0          0s
packageserver-76bfd988c7-dvphf     2/2     Running       0          30h
packageserver-76bfd988c7-n9mrj     2/2     Running       0          30h

What did you expect to see?

No crash, even with Istio injection.

What did you see instead? Under which circumstances?

Pods crashing.

Here is a crashing Pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 10.233.81.103/32
    cni.projectcalico.org/podIPs: 10.233.81.103/32
    k8s.v1.cni.cncf.io/networks: istio-cni
    kubectl.kubernetes.io/default-container: registry-server
    kubectl.kubernetes.io/default-logs-container: registry-server
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operators.coreos.com/v1alpha1","kind":"CatalogSource","metadata":{"annotations":{},"labels":{"argocd.argoproj.io/instance":"operator-lifecycle-manager"},"name":"operatorhubio-catalog","namespace":"operator-lifecycle-manager"},"spec":{"displayName":"Community Operators","image":"quay.io/operatorhubio/catalog:latest","publisher":"OperatorHub.io","sourceType":"grpc","updateStrategy":{"registryPoll":{"interval":"60m"}}}}
    kubernetes.io/psp: privileged
    sidecar.istio.io/interceptionMode: REDIRECT
    sidecar.istio.io/status: '{"initContainers":["istio-validation"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
    traffic.sidecar.istio.io/excludeInboundPorts: "15020"
    traffic.sidecar.istio.io/includeInboundPorts: '*'
    traffic.sidecar.istio.io/includeOutboundIPRanges: '*'
  creationTimestamp: "2021-09-08T14:12:35Z"
  deletionGracePeriodSeconds: 1
  deletionTimestamp: "2021-09-08T14:12:37Z"
  generateName: operatorhubio-catalog-
  labels:
    olm.catalogSource: operatorhubio-catalog
    olm.pod-spec-hash: 655cfc9c56
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: ""
    service.istio.io/canonical-revision: latest
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:olm.catalogSource: {}
          f:olm.pod-spec-hash: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"89f01719-008a-4197-af27-ac8b9f2d580e"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:containers:
          k:{"name":"registry-server"}:
            .: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:exec:
                .: {}
                f:command: {}
              f:failureThreshold: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":50051,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:exec:
                .: {}
                f:command: {}
              f:failureThreshold: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources:
              .: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:securityContext:
              .: {}
              f:readOnlyRootFilesystem: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:nodeSelector:
          .: {}
          f:kubernetes.io/os: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:fsGroup: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
    manager: catalog
    operation: Update
    time: "2021-09-08T14:12:35Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cni.projectcalico.org/podIP: {}
          f:cni.projectcalico.org/podIPs: {}
    manager: calico
    operation: Update
    time: "2021-09-08T14:12:36Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:initContainerStatuses: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-09-08T14:12:36Z"
  name: operatorhubio-catalog-jhtb2
  namespace: operator-lifecycle-manager
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: CatalogSource
    name: operatorhubio-catalog
    uid: 89f01719-008a-4197-af27-ac8b9f2d580e
  resourceVersion: "4035852"
  uid: 18a1db35-c497-4778-810c-0ef5bf2a31d9
spec:
  containers:
  - args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --proxyLogLevel=warning
    - --proxyComponentLogLevel=misc:error
    - --log_output_level=default:info
    - --concurrency
    - "2"
    env:
    - name: JWT_POLICY
      value: third-party-jwt
    - name: PILOT_CERT_PROVIDER
      value: istiod
    - name: CA_ADDR
      value: istiod.istio-system.svc:15012
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: PROXY_CONFIG
      value: |
        {"holdApplicationUntilProxyStarts":true}
    - name: ISTIO_META_POD_PORTS
      value: |-
        [
            {"name":"grpc","containerPort":50051,"protocol":"TCP"}
        ]
    - name: ISTIO_META_APP_CONTAINERS
      value: registry-server
    - name: ISTIO_META_CLUSTER_ID
      value: Kubernetes
    - name: ISTIO_META_INTERCEPTION_MODE
      value: REDIRECT
    - name: ISTIO_META_MESH_ID
      value: cluster.local
    - name: TRUST_DOMAIN
      value: cluster.local
    image: docker.io/istio/proxyv2:1.11.2
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - pilot-agent
          - wait
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP
    readinessProbe:
      failureThreshold: 30
      httpGet:
        path: /healthz/ready
        port: 15021
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/istio
      name: istiod-ca-cert
    - mountPath: /var/lib/istio/data
      name: istio-data
    - mountPath: /etc/istio/proxy
      name: istio-envoy
    - mountPath: /var/run/secrets/tokens
      name: istio-token
    - mountPath: /etc/istio/pod
      name: istio-podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: operatorhubio-catalog-token-pvq4s
      readOnly: true
  - image: quay.io/operatorhubio/catalog:latest
    imagePullPolicy: Always
    livenessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry-server
    ports:
    - containerPort: 50051
      name: grpc
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    securityContext:
      readOnlyRootFilesystem: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: operatorhubio-catalog-token-pvq4s
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,15020
    - --run-validation
    - --skip-rule-apply
    image: docker.io/istio/proxyv2:1.11.2
    imagePullPolicy: IfNotPresent
    name: istio-validation
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: operatorhubio-catalog-token-pvq4s
      readOnly: true
  nodeName: REDACTED
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1337
  serviceAccount: operatorhubio-catalog
  serviceAccountName: operatorhubio-catalog
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir:
      medium: Memory
    name: istio-envoy
  - emptyDir: {}
    name: istio-data
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: istio-podinfo
  - name: istio-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: istio-ca
          expirationSeconds: 43200
          path: istio-token
  - configMap:
      defaultMode: 420
      name: istio-ca-root-cert
    name: istiod-ca-cert
  - name: operatorhubio-catalog-token-pvq4s
    secret:
      defaultMode: 420
      secretName: operatorhubio-catalog-token-pvq4s
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:27Z"
    message: 'containers with incomplete status: [istio-validation]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:27Z"
    message: 'containers with unready status: [istio-proxy registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:27Z"
    message: 'containers with unready status: [istio-proxy registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-09-08T14:12:30Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: docker.io/istio/proxyv2:1.11.2
    imageID: ""
    lastState: {}
    name: istio-proxy
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  - image: quay.io/operatorhubio/catalog:latest
    imageID: ""
    lastState: {}
    name: registry-server
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: REDACTED
  initContainerStatuses:
  - image: docker.io/istio/proxyv2:1.11.2
    imageID: ""
    lastState: {}
    name: istio-validation
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  phase: Pending
  qosClass: Burstable
  startTime: "2021-09-08T14:12:27Z"
```

Environment

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:40:09Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:32:49Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

Additional context

I tried to debug this without luck. The pod is probably created from pkg/controller/registry/reconciler/grpc.go, but why is it crashing?
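
A minimal sketch of commands that may help narrow this down, assuming `kubectl` access to the cluster. The helper name `catalog_diag` is hypothetical; the `olm.catalogSource` label and the `catalog-operator` deployment name are taken from the pod spec and pod listing above. It only collects information and does not explain the crash by itself.

```shell
# Hypothetical helper (not part of OLM): collect basic diagnostics
# for the registry pods that belong to one CatalogSource.
# Usage: catalog_diag <namespace> <catalogsource-name>
catalog_diag() {
  ns=$1
  name=$2
  # Registry pods carry the olm.catalogSource label (see the pod spec above).
  kubectl get pods -n "$ns" -l "olm.catalogSource=$name" -o wide
  # Recent events may show why a pod was deleted or never initialized.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
  # The catalog operator creates these pods, so its logs may say why it replaces them.
  kubectl logs -n "$ns" deployment/catalog-operator --all-containers --tail=100
}
```

For the cluster in this report, the call would be `catalog_diag operator-lifecycle-manager operatorhubio-catalog`.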

dinhxuanvu commented 3 years ago

We have never supported sidecars on our registry/CatalogSource pods, so I think Istio sidecar injection won't work out of the box, unfortunately. This is likely to be a feature request.

akihikokuroda commented 2 years ago

I tried Istio (1.11.3) + OLM on IKS (IBM Cloud Kubernetes Service). The CatalogSource pod came up successfully.

catalog-operator-5d785f7b7b-6zsr2   2/2     Running   1          32m
olm-operator-7dccd9645-9gzqh        2/2     Running   1          32m
operatorhubio-catalog-wt794         2/2     Running   0          32m
packageserver-85b7b96bff-6z2d7      2/2     Running   0          32m
packageserver-85b7b96bff-xkszq      2/2     Running   0          32m
root@7cbf863de855:~/go/src/operator-lifecycle-manager# kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.10+IKS", GitCommit:"07c0cf54875fcf1f3620ffee4e21f1031463d6a7", GitTreeState:"clean", BuildDate:"2021-08-12T21:36:36Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

I'll try kind next.

akihikokuroda commented 2 years ago

It works in kind, too. The only difference is the Istio version: I'm using 1.11.3. @sathieu, could you try with 1.11.3?

Akihikos-MBP-2:operator-lifecycle-manager akihikokuroda$ kubectl get pod -n olm
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-6d578c5764-4pnkq   2/2     Running   2          5m42s
olm-operator-5b58594fc8-5z9bw       2/2     Running   2          5m42s
operatorhubio-catalog-fdbck         2/2     Running   0          5m10s
packageserver-86954d6db8-p27t4      2/2     Running   0          5m8s
packageserver-86954d6db8-pmrzv      2/2     Running   0          5m7s

Here is the operatorhubio-catalog-wt794 pod status

  containerStatuses:
  - containerID: containerd://8765bfc2a76f748f9457ca265af1322c8e2db195ab9d12f879ae9dd29d9d3ad8
    image: docker.io/istio/proxyv2:1.11.3
    imageID: docker.io/istio/proxyv2@sha256:28513eb3706315b26610a53e0d66b29b09a334e3164393b9a0591f34fe47a6fd
    lastState: {}
    name: istio-proxy
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-09-28T14:50:28Z"
  - containerID: containerd://79d0a208132e672b45b0d08c9c930264ad163c9bebadc0c73227bcbc24367021
    image: quay.io/operatorhubio/catalog:latest
    imageID: quay.io/operatorhubio/catalog@sha256:ceee6bec437089305947e6929b9792f1bf4b7f93397afe2472b0a3d856c27777
    lastState: {}
    name: registry-server
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-09-28T14:50:28Z"
  hostIP: 172.18.0.2
  initContainerStatuses:
  - containerID: containerd://db999863c5cd8afdbaab1a962a923298b8aee04fbd808cbcb0e4d7bcccac602b
    image: docker.io/istio/proxyv2:1.11.3
    imageID: docker.io/istio/proxyv2@sha256:28513eb3706315b26610a53e0d66b29b09a334e3164393b9a0591f34fe47a6fd
    lastState: {}
    name: istio-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://db999863c5cd8afdbaab1a962a923298b8aee04fbd808cbcb0e4d7bcccac602b
        exitCode: 0
        finishedAt: "2021-09-28T14:50:22Z"
        reason: Completed
        startedAt: "2021-09-28T14:50:22Z"
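
One detail visible in the two dumps: the crashing pod lists an init container named `istio-validation` (with the `istio-cni` network annotation), while this working pod ran `istio-init`. That may mean the two environments differ in more than the Istio version; this is an assumption, not something confirmed in this thread. A small sketch for comparing the two setups, assuming `kubectl` access; the function name `catalog_pod_summary` is hypothetical.

```shell
# Hypothetical helper: print, for each registry pod of a CatalogSource,
# its name, its init container names, and its container images (tab-separated).
# Usage: catalog_pod_summary <namespace> <catalogsource-name>
catalog_pod_summary() {
  ns=$1
  name=$2
  kubectl get pods -n "$ns" -l "olm.catalogSource=$name" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.initContainers[*].name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}
```

Running `catalog_pod_summary olm operatorhubio-catalog` on the working cluster and `catalog_pod_summary operator-lifecycle-manager operatorhubio-catalog` on the failing one would show the init container and proxy image side by side.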