jaegertracing / jaeger-operator

Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes.
https://www.jaegertracing.io/docs/latest/operator/
Apache License 2.0

Failed to list namespaces error when not using cluster-wide mode #1431

Open ediezh opened 3 years ago

ediezh commented 3 years ago

I deployed operator 1.22 following these instructions:

kubectl create namespace observability
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
kubectl create -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml

The operator watches only the observability namespace. But I'm seeing these errors in the log:

E0416 08:20:50.253032       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:observability:jaeger-operator" cannot list resource "namespaces" in API group "" at the cluster scope
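
One way to confirm what the service account is allowed to do, independent of the operator logs, is to submit a SubjectAccessReview. The sketch below is an illustration rather than part of the original report: it assumes the service account name and namespace from the error above, and that whoever submits it is allowed to create SubjectAccessReviews. Submitting it with kubectl create -f <file> -o yaml should return the verdict in status.allowed.

```yaml
# Asks the API server: may this service account list namespaces cluster-wide?
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # Service account taken from the error message above.
  user: system:serviceaccount:observability:jaeger-operator
  # Groups a service account in this namespace belongs to, so that
  # bindings made to groups are evaluated as well.
  groups:
  - system:serviceaccounts
  - system:serviceaccounts:observability
  - system:authenticated
  resourceAttributes:
    group: ""            # core API group
    resource: namespaces
    verb: list
    # No "namespace" field: the check is made at cluster scope.
```
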
jpkrohling commented 3 years ago

@ediezh could you please provide the deployment that is being used by the operator?

kubectl get deployment -n observability jaeger-operator -o yaml

I'm especially interested in the value of the WATCH_NAMESPACE.

ediezh commented 3 years ago

@jpkrohling here is the deployment.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/restartedAt: "2021-04-16T08:10:16Z"
    kubernetes.io/psp: eks.privileged
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/status: '{"version":"8e6e902b765af607513b28d284940ee1421e9a0d07698741693b2663c7161c11","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null}'
  creationTimestamp: "2021-04-16T08:10:16Z"
  generateName: jaeger-operator-799c7fbbb5-
  labels:
    app.kubernetes.io/name: jaeger-operator
    istio.io/rev: 1-7-6
    pod-template-hash: 799c7fbbb5
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: jaeger-operator
    service.istio.io/canonical-revision: latest
  name: jaeger-operator-799c7fbbb5-c7l6g
  namespace: observability
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: jaeger-operator-799c7fbbb5
    uid: 8b7d7de3-7bd7-4349-b4df-fb4c0d9a4c04
  resourceVersion: "22781385"
  selfLink: /api/v1/namespaces/observability/pods/jaeger-operator-799c7fbbb5-c7l6g
  uid: db8dc1f6-a2c8-46be-ac42-d19fcd400b2d
spec:
  containers:
  - args:
    - proxy
    - sidecar
    - --domain
    - $(POD_NAMESPACE).svc.cluster.local
    - --serviceCluster
    - jaeger-operator.observability
    - --proxyLogLevel=info
    - --proxyComponentLogLevel=misc:error
    - --trust-domain=cluster.local
    - --concurrency
    - "2"
    env:
    - name: JWT_POLICY
      value: third-party-jwt
    - name: PILOT_CERT_PROVIDER
      value: istiod
    - name: CA_ADDR
      value: istiod-1-7-6.istio-system.svc:15012
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: INSTANCE_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: SERVICE_ACCOUNT
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
    - name: CANONICAL_SERVICE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['service.istio.io/canonical-name']
    - name: CANONICAL_REVISION
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels['service.istio.io/canonical-revision']
    - name: PROXY_CONFIG
      value: |
        {"discoveryAddress":"istiod-1-7-6.istio-system.svc:15012","tracing":{"zipkin":{"address":"jaeger-operator-jaeger-collector.observability.svc.cluster.local:9411"}},"proxyMetadata":{"DNS_AGENT":""}}
    - name: ISTIO_META_POD_PORTS
      value: |-
        [
            {"name":"metrics","containerPort":8383,"protocol":"TCP"}
        ]
    - name: ISTIO_META_APP_CONTAINERS
      value: jaeger-operator
    - name: ISTIO_META_CLUSTER_ID
      value: Kubernetes
    - name: ISTIO_META_INTERCEPTION_MODE
      value: REDIRECT
    - name: ISTIO_METAJSON_ANNOTATIONS
      value: |
        {"kubectl.kubernetes.io/restartedAt":"2021-04-16T08:10:16Z","kubernetes.io/psp":"eks.privileged"}
    - name: ISTIO_META_WORKLOAD_NAME
      value: jaeger-operator
    - name: ISTIO_META_OWNER
      value: kubernetes://apis/apps/v1/namespaces/observability/deployments/jaeger-operator
    - name: ISTIO_META_MESH_ID
      value: cluster.local
    - name: DNS_AGENT
    - name: ISTIO_KUBE_APP_PROBERS
      value: '{}'
    image: 018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/istio/proxyv2:1.7.6
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - pilot-agent
          - wait
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP
    readinessProbe:
      failureThreshold: 30
      httpGet:
        path: /healthz/ready
        port: 15021
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        drop:
        - ALL
      privileged: true
      readOnlyRootFilesystem: true
      runAsGroup: 1337
      runAsNonRoot: true
      runAsUser: 1337
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/istio
      name: istiod-ca-cert
    - mountPath: /var/lib/istio/data
      name: istio-data
    - mountPath: /etc/istio/proxy
      name: istio-envoy
    - mountPath: /var/run/secrets/tokens
      name: istio-token
    - mountPath: /etc/istio/pod
      name: istio-podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: jaeger-operator-token-4n56t
      readOnly: true
  - args:
    - start
    env:
    - name: WATCH_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: OPERATOR_NAME
      value: jaeger-operator
    image: 018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/jaegertracing/jaeger-operator:1.22.0
    imagePullPolicy: IfNotPresent
    name: jaeger-operator
    ports:
    - containerPort: 8383
      name: metrics
      protocol: TCP
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: jaeger-operator-token-4n56t
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,15020
    env:
    - name: DNS_AGENT
    image: 018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/istio/proxyv2:1.7.6
    imagePullPolicy: IfNotPresent
    name: istio-init
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 10Mi
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        drop:
        - ALL
      privileged: true
      readOnlyRootFilesystem: false
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: jaeger-operator-token-4n56t
      readOnly: true
  nodeName: ip-10-202-118-5.ap-northeast-2.compute.internal
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1337
  serviceAccount: jaeger-operator
  serviceAccountName: jaeger-operator
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: jaeger-operator-token-4n56t
    secret:
      defaultMode: 420
      secretName: jaeger-operator-token-4n56t
  - emptyDir:
      medium: Memory
    name: istio-envoy
  - emptyDir: {}
    name: istio-data
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: istio-podinfo
  - name: istio-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: istio-ca
          expirationSeconds: 43200
          path: istio-token
  - configMap:
      defaultMode: 420
      name: istio-ca-root-cert
    name: istiod-ca-cert
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-04-16T08:10:18Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-04-16T08:10:21Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-04-16T08:10:21Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-04-16T08:10:16Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://4aa62e1e71bd4ba4404a684e806eaa36ab91ed94fa10315fd6b3cb1ffc6d5052
    image: 018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/istio/proxyv2:1.7.6
    imageID: docker-pullable://018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/istio/proxyv2@sha256:e526549f64ebcc3436de9f92ae322856dd02bd59f58c3bc034d31f9fce45d6a0
    lastState: {}
    name: istio-proxy
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-04-16T08:10:18Z"
  - containerID: docker://c18a1846759cd41f03b672db0dcb414f92e2b0d4dbd13d19f23ad0a4de20ca15
    image: 018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/jaegertracing/jaeger-operator:1.22.0
    imageID: docker-pullable://018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/jaegertracing/jaeger-operator@sha256:da7f9035f3a35f86d5f936e2720608d84ceae127ddbfcb8f78694c80fa678927
    lastState: {}
    name: jaeger-operator
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-04-16T08:10:20Z"
  hostIP: 10.202.118.5
  initContainerStatuses:
  - containerID: docker://c67d71f910e720baa9f5654a336de0d7fbdecd66f47271a8015b134218a3ca4f
    image: 018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/istio/proxyv2:1.7.6
    imageID: docker-pullable://018661664346.dkr.ecr.ap-northeast-2.amazonaws.com/istio/proxyv2@sha256:e526549f64ebcc3436de9f92ae322856dd02bd59f58c3bc034d31f9fce45d6a0
    lastState: {}
    name: istio-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://c67d71f910e720baa9f5654a336de0d7fbdecd66f47271a8015b134218a3ca4f
        exitCode: 0
        finishedAt: "2021-04-16T08:10:17Z"
        reason: Completed
        startedAt: "2021-04-16T08:10:17Z"
  phase: Running
  podIP: 10.202.100.133
  podIPs:
  - ip: 10.202.100.133
  qosClass: Burstable
  startTime: "2021-04-16T08:10:16Z"

jpkrohling commented 3 years ago

It does look like a bug to me: the operator shouldn't be looking at all namespaces when it's restricted to one.

@rkukura, are you able to take a look at this one?

jpkrohling commented 3 years ago

ping @rkukura, are you able to work on this soon?

chadlwilson commented 3 years ago

FWIW, in addition to the error about listing namespaces, I also got secrets is forbidden: User "system:serviceaccount:default:my-release-jaeger-operator" cannot list resource "secrets" in API group "" at the cluster scope with 1.22.1 when deployed via the official Helm chart.

Morriz commented 3 years ago

Bumping this, as the current operator with create enabled does not install the CR. Setting rbac.clusterRole: true does not help, as there is no mention of namespaces in the RBAC delivered with the chart.

Morriz commented 3 years ago

Adding this little snippet to the role will fix it:

- apiGroups:
  - ''
  resources:
  - namespaces
  verbs:
  - list
  - watch
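
Since namespaces are cluster-scoped, listing or watching them across the cluster can only be granted through a ClusterRole bound with a ClusterRoleBinding; a namespaced Role and RoleBinding cannot grant it. A minimal sketch of such a grant is shown here, assuming the service account from the original report. The object name jaeger-operator-namespace-reader is hypothetical and not taken from the operator or the chart. Note that this widens the operator's permissions beyond its own namespace, which is what single-namespace mode is meant to avoid.

```yaml
# Sketch only: grants cluster-scope list/watch on namespaces.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jaeger-operator-namespace-reader   # hypothetical name
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jaeger-operator-namespace-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jaeger-operator-namespace-reader
subjects:
- kind: ServiceAccount
  name: jaeger-operator        # service account from the original error
  namespace: observability
```
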
SCLogo commented 3 years ago

@Morriz will you open PR about it?

Morriz commented 3 years ago

sure

anarcher commented 3 years ago

I'm not sure, but I think the get verb is also needed for namespaces:

- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list
  - watch
  - get

Morriz commented 3 years ago

I don't think so. I saw no more errors without it...

jpkrohling commented 3 years ago

We have this already listed in the cluster role:

https://github.com/jaegertracing/jaeger-operator/blob/27a0cd1b821636b33ba6570ab23d4906b3730d06/deploy/cluster_role.yaml#L191-L199

Is that not sufficient?

chadlwilson commented 3 years ago

I have a feeling some of the conversation here might be confusing things a bit, since there was a period of time where both modes had some problems on various versions of the Helm chart and operator. The original ticket was about not running in cluster mode, i.e. when WATCH_NAMESPACE is defined and points to a single namespace.

It seemed the operator itself still tried to list namespaces, which yielded this error. In theory it should not need or attempt to do that when running in single-namespace mode, since doing so requires the ClusterRole, which you wouldn't normally deploy in this mode.

Indeed the official Helm chart will not create ClusterRoles and ClusterRoleBindings when in this mode: https://github.com/jaegertracing/helm-charts/blob/59f51fd7caf924faaea951cf064f86c3f35e8b78/charts/jaeger-operator/templates/role.yaml#L2

... using the same value in the chart which controls "single namespace mode" on the Deployment. https://github.com/jaegertracing/helm-charts/blob/59f51fd7caf924faaea951cf064f86c3f35e8b78/charts/jaeger-operator/templates/deployment.yaml#L43-L50

Nevertheless, that PR doesn't seem like it'd fix this issue to me since it is touching the ClusterRole.
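
For reference, single-namespace mode in the chart appears to hinge on one value. Assuming it is the rbac.clusterRole key mentioned earlier in this thread (an assumption, since the linked templates are not reproduced here), a values fragment for that mode might look like this:

```yaml
# values.yaml fragment (sketch; key name assumed from the thread)
rbac:
  # false is expected to render a Role/RoleBinding and restrict the
  # operator to its own namespace; true is expected to render a
  # ClusterRole/ClusterRoleBinding instead.
  clusterRole: false
```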

esnible commented 3 years ago

I looked into this to see if I could fix it but it is above my skill level with sigs.k8s.io/controller-runtime.

jaeger-operator/pkg/controller/namespace/namespacecontroller.go:52 sets up a Watch() for namespaces so it can wake up when sidecar.jaegertracing.io/inject changes.

It would be straightforward to skip watching namespaces if $WATCH_NAMESPACE is exactly one namespace but then we lose the ability to respond to changes in the annotation, even in the watched namespace.

I couldn't figure out how to use Controller.Watch() for anything less than all namespaces. I didn't chase this problem for long because I suspect there is no solution: I suspect controller-runtime Watch() requires RBAC to list. (I would love to be wrong!) The possibility to watch a single namespace using client-go exists (see https://github.com/kubernetes/kubernetes/issues/43299 ), but using a totally different technique to watch for annotations seems too complex for this problem.

While working on this I noticed that if the Jaeger Operator has permission to list the namespaces it will reconcile ALL the namespaces, not just $WATCH_NAMESPACE. Set the operator to log at trace level, run locally with WATCH_NAMESPACE=observability make run-debug and you'll see it claim to reconcile all of the namespaces. Can someone confirm if this is unintended behavior and file an issue to track? WATCH_NAMESPACE should constrain both looking for Jaeger CRs and looking for Jaeger annotations, no?

esnible commented 3 years ago

I suspect there is an additional problem that needs to be fixed. (This may need its own issue or perhaps it can be part of this issue.)

If the Jaeger Operator service account has no ClusterRoleBinding that lets it list namespaces, I suspect it will fail to pick up sidecar.jaegertracing.io/inject namespace annotations, even for the namespace it is running in.

esnible commented 2 years ago

@ediezh A workaround for the message: the warning should be suppressed if the operator is started with --enable-namespace-controller=false.
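
As a sketch of how that flag could be passed, the fragment below adds it to the operator container's arguments. It assumes the Deployment and container are both named jaeger-operator in the observability namespace, as in the pod spec posted earlier, and that the installed operator version supports the flag. It is a partial manifest meant for patching or merging, not a complete Deployment. Based on the earlier discussion of the namespace watch, disabling the controller presumably also stops the operator from reacting to sidecar.jaegertracing.io/inject annotations on namespaces.

```yaml
# Partial Deployment (sketch): only the fields relevant to the workaround.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-operator
  namespace: observability
spec:
  template:
    spec:
      containers:
      - name: jaeger-operator
        args:
        - start
        # Flag quoted in the comment above; availability depends on the
        # operator version.
        - --enable-namespace-controller=false
```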

s9r-5 commented 1 year ago

I am facing the same issue.

User "system:serviceaccount:jaeger-operator:jaeger-operator" cannot list resource "namespaces" in API group "" at the cluster scope

frzifus commented 1 year ago

Hi @s9r-5, which Jaeger Operator version are you using?

malcolm061990 commented 1 year ago

Guys, same issue here. I deployed jaeger-operator 2.39.0 from the Helm chart using default values (non-cluster-wide) in the observability namespace. When I try to create the simplest Jaeger cluster I get:

1.6744841582063437e+09  ERROR   Reconciler error    {"controller": "jaeger", "controllerGroup": "jaegertracing.io", "controllerKind": "Jaeger", "Jaeger": {"name":"simplest","namespace":"observability"}, "namespace": "observability", "name": "simplest", "reconcileID": "44701060-9d41-42ad-b849-dfe6f916d60b", "error": "deployments.apps is forbidden: User \"system:serviceaccount:observability:jaeger-crd-jaeger-operator\" cannot list resource \"deployments\" in API group \"apps\" at the cluster scope"}

For some reason the operator wants to list deployments at cluster scope, but I don't use cluster scope. I can't deploy even the simplest Jaeger cluster.

ryanobjc commented 9 months ago

I am seeing these errors in my logs, but also maybe the deployments are getting provisioned?

I am using jaeger-operator installed from the helm-chart, version 2.49.0. I am using all the defaults in values.yaml and installed into the namespace 'observability' as expected.

sicil1ano commented 7 months ago

Is this repo still active? This issue has been active since April 2021 and it seems it is not getting enough traction. I think it is important for the community to have this issue solved.

Jaeger should not be using a cluster-wide role to access namespaces if we explicitly don't want to use this role, and this means also that Jaeger is not following the principle of least privilege. Why would Jaeger require more permissions than it needs?

iblancasa commented 7 months ago

Is this repo still active? This issue has been active since April 2021 and it seems it is not getting enough traction. I think it is important for the community to have this issue solved.

It is. We will be pleased to review a PR with a solution.