open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Automatically create rbac permissions flag for Prometheus receiver #3078

Open · paebersold-tyro opened this issue 2 weeks ago

paebersold-tyro commented 2 weeks ago

Component(s)

collector

What happened?

Description

I am running the opentelemetry-operator with the --create-rbac-permissions flag set. When a new OpenTelemetryCollector resource is created (e.g. mode: daemonset), new pods are created and a new serviceaccount is created as well. However, no new clusterroles or clusterrolebindings are created, which results in Prometheus scrape errors due to missing permissions, for example:

E0627 04:07:32.435836       1 reflector.go:147] k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:observability:collector-with-ta-collector" cannot list resource "pods" in API group "" in the namespace "app-platform-monitoring"

No logs are generated on the operator-manager pod.

The clusterrole that the operator manager is using has access to create clusterroles/clusterrolebindings (I am deploying via the Helm chart opentelemetry-operator version 0.62.0, https://open-telemetry.github.io/opentelemetry-helm-charts).

Based on other issues raised previously, it seems this flag used to be optional but may no longer be required, with the permissions being granted automatically based on the operator's existing access. I would like clarification on this aspect too, please.
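
For reference, the kind of RBAC I would have expected the operator to create for the collector service account is sketched below. The names and the exact rule set are my assumption; the service account name is the one from the scrape error above.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # illustrative name; whatever the operator would generate
  name: collector-with-ta-collector
rules:
  # minimum needed for the prometheus receiver's kubernetes_sd_configs with role: pod
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: collector-with-ta-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collector-with-ta-collector
subjects:
  - kind: ServiceAccount
    name: collector-with-ta-collector
    namespace: observability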

Steps to Reproduce

Run the opentelemetry-operator with the --create-rbac-permissions flag and create an OpenTelemetryCollector resource that uses the Prometheus receiver.
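
For completeness, this is roughly how I pass the flag through the Helm chart values (manager.extraArgs is my assumption about the chart's values layout; adjust to your chart version):

# values.yaml for the opentelemetry-operator chart
manager:
  extraArgs:
    - --create-rbac-permissions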

Expected Result

Clusterroles/bindings would be created when the new collector pods are created.

Actual Result

No new roles/bindings are created.

Kubernetes Version

1.29

Operator version

0.102.0

Collector version

0.102.0

Environment information

Serviceaccount used by manager

% kubectl -n observability get pods otel-operator-opentelemetry-operator-dfb985c65-ngh9n -o yaml | grep serviceAccount   
  serviceAccount: opentelemetry-operator

Clusterrolebinding

% kubectl get clusterrolebinding -o wide | grep opentelemetry-operator
otel-operator-opentelemetry-operator-manager               ClusterRole/otel-operator-opentelemetry-operator-manager             6d 

Clusterrole for the operator manager (generated via the Helm chart)

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: otel-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.102.0
    backstage.io/kubernetes-id: eyre-otel-operator
    helm.sh/chart: opentelemetry-operator-0.62.0
    tyro.cloud/source: eyre-otel-operator
    tyro.cloud/system: observability-platform
    tyroTaggingVersion: 3.0.0
    tyroTeam: observability
  name: otel-operator-opentelemetry-operator-manager
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - persistentvolumeclaims
      - persistentvolumes
      - pods
      - serviceaccounts
      - services
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - namespaces
    verbs:
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - statefulsets
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
      - clusterrolebindings
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes
      - namespaces
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - config.openshift.io
    resources:
      - infrastructures
      - infrastructures/status
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - update
  - apiGroups:
      - monitoring.coreos.com
    resources:
      - podmonitors
      - servicemonitors
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - instrumentations
    verbs:
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - opampbridges
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - opampbridges/finalizers
    verbs:
      - update
  - apiGroups:
      - opentelemetry.io
    resources:
      - opampbridges/status
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - opentelemetry.io
    resources:
      - opentelemetrycollectors
    verbs:
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - opentelemetrycollectors/finalizers
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - opentelemetry.io
    resources:
      - opentelemetrycollectors/status
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - route.openshift.io
    resources:
      - routes
      - routes/custom-host
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch

Log output

No response

Additional context

Pods created via the manager:

collector-with-ta-collector-dn9q6                      1/1     Running     0          18m
collector-with-ta-collector-f8fm2                      1/1     Running     0          18m
collector-with-ta-collector-gh5dx                      1/1     Running     0          18m

Associated service account

NAME                              SECRETS   AGE
collector-with-ta-collector       0         18m

No associated clusterroles/clusterrolebindings:

% kubectl get clusterrolebinding -o wide | grep collector-with-ta-collector 
% 
% date                        
Thu 27 Jun 2024 14:28:12 AEST
% kubectl get clusterrole | grep 2024-06-27
%
jaronoff97 commented 1 week ago

@iblancasa anything jumping out as problematic here?

pavolloffay commented 1 week ago

IIRC the --create-rbac-permissions flag does not create RBAC for the target allocator/Prometheus receiver.

pavolloffay commented 1 week ago

However, it would be great to support it.

paebersold-tyro commented 1 week ago

FYI, for clarity: my test setup did not use the target allocator (I'm aware the current Helm charts require you to manually set up the target allocator RBAC resources). Apologies for the confusion in the naming. My sample config is below.

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: collector-with-ta
spec:
  mode: daemonset
  targetAllocator:
    enabled: false
  config:
    processors:
      batch: {}
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: test-pushgateway
            scrape_interval: 30s
            scrape_timeout: 10s
            honor_labels: true
            scheme: http
            kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                - app-platform-monitoring
            relabel_configs:
            # and pod is running
            - source_labels: [__meta_kubernetes_pod_phase]
              regex: Running
              action: keep
            # and pod is ready
            - source_labels: [__meta_kubernetes_pod_ready]
              regex: true
              action: keep
            # and only metrics endpoints
            - source_labels: [__meta_kubernetes_pod_container_port_name]
              action: keep
              regex: metrics
    exporters:
      debug: {}
    service:
      telemetry:
        logs:
          level: debug
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [debug]
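
For what it's worth, since this scrape config only discovers pods in the app-platform-monitoring namespace, a namespaced Role/RoleBinding along the lines of the sketch below (illustrative names, created by hand, not by the operator) should be enough to make the scrape work:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: collector-pod-discovery
  namespace: app-platform-monitoring
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: collector-pod-discovery
  namespace: app-platform-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: collector-pod-discovery
subjects:
  - kind: ServiceAccount
    name: collector-with-ta-collector
    namespace: observability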
fyuan1316 commented 1 week ago

@paebersold-tyro I am not sure whether to describe this as a bug or a feature request. However, I can definitely reproduce the issue. The root cause seems to be that RBAC creation and management is handled per component on a case-by-case basis, so support will need to be added gradually for each one.

pavolloffay commented 1 week ago

Correct, this should be an enhancement proposal to automate RBAC for the Prometheus receiver.

pavolloffay commented 1 week ago

I have updated the title; please edit it if it does not match what is being asked here.

paebersold-tyro commented 1 week ago

Thanks for the clarification on the issue, and I'm fine with the title update. Ideally it would also be great to have a note on exactly what --create-rbac-permissions gives you out of the box.

iblancasa commented 5 days ago

Actually, the title should be changed because the flag does nothing now. https://github.com/open-telemetry/opentelemetry-operator/blob/main/main.go#L149

Now, we check if the operator has permissions to create RBAC resources and, if permissions are there, the operator will create the RBAC resources.
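
If it helps anyone checking their own setup, you can verify that the operator's service account has those permissions with kubectl impersonation (service account name and namespace taken from the environment details above; running this requires impersonate rights):

% kubectl auth can-i create clusterroles --as=system:serviceaccount:observability:opentelemetry-operator
% kubectl auth can-i create clusterrolebindings --as=system:serviceaccount:observability:opentelemetry-operator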