grafana / agent

Vendor-neutral programmable observability pipelines.
https://grafana.com/docs/agent/
Apache License 2.0

operator: resources violate PodSecurity policy #3373

Open uhthomas opened 1 year ago

uhthomas commented 1 year ago

I've been trying to install and properly set up Grafana Agent Operator for a while and have been struggling.

https://grafana.com/docs/grafana-cloud/kubernetes-monitoring/configuration/config-k8s-agent-guide/#configure-grafana-agent-for-metrics

I applied the exact manifests suggested by the Grafana Agent Operator manifest generator, and they do not work. It turns out the DaemonSets violate the cluster's PodSecurity level of "baseline", which isn't that strict.

❯ k -n grafana-agent get ds
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
grafana-agent-integrations-ds   5         0         0       0            0           <none>          16m
grafana-agent-logs              5         0         0       0            0           <none>          2d3h

Looking deeper:

❯ k -n grafana-agent describe ds grafana-agent-logs
...
Events:
  Type     Reason        Age   From                  Message
  ----     ------        ----  ----                  -------
  Warning  FailedCreate  46m   daemonset-controller  Error creating: pods "grafana-agent-logs-tzr4t" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
  Warning  FailedCreate  29m   daemonset-controller  Error creating: pods "grafana-agent-logs-xxlrg" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
  Warning  FailedCreate  12m   daemonset-controller  Error creating: pods "grafana-agent-logs-bmp64" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)
❯ k -n grafana-agent describe ds grafana-agent-integrations-ds
...
Events:
  Type     Reason        Age                From                  Message
  ----     ------        ----               ----                  -------
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-h8qnr" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-bb2np" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-q9nnz" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-mj95d" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-pc726" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-f6zlb" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-5p6b7" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-nrvqh" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  10m                daemonset-controller  Error creating: pods "grafana-agent-integrations-ds-cdk7w" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
  Warning  FailedCreate  1s (x24 over 10m)  daemonset-controller  (combined from similar events): Error creating: pods "grafana-agent-integrations-ds-nlccd" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "grafana-agent-grafana-agent-rootfs", "grafana-agent-grafana-agent-sysfs", "grafana-agent-grafana-agent-procfs", "varlog", "dockerlogs")
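
The events above name two things the "baseline" level rejects: hostPath volumes and `securityContext.privileged=true`. As a rough illustration, a minimal Pod like the following sketch would be refused for the same reasons. This is not the operator's actual output; the Pod name, mount path and host path are assumptions, while the container name, volume name and image tag are taken from this issue.

```yaml
# Illustrative sketch only, not the manifest generated by the operator.
apiVersion: v1
kind: Pod
metadata:
  name: baseline-violation-example   # hypothetical name
  namespace: grafana-agent
spec:
  containers:
  - name: grafana-agent
    image: grafana/agent:v0.26.1
    securityContext:
      privileged: true               # rejected by "baseline"
    volumeMounts:
    - name: varlog
      mountPath: /var/log            # assumed mount path
      readOnly: true
  volumes:
  - name: varlog
    hostPath:                        # hostPath volumes are rejected by "baseline"
      path: /var/log                 # assumed host path
```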

Following from https://github.com/grafana/agent/issues/3363, some feedback would have gone a long way. There were no logs from the operator or the agent, and no events on the custom resources (LogsInstance, Integration). Even something as simple as `created daemonset <namespace>/<name>` would have given me enough information to know it was actually trying to do something.

For now, the workaround will be to grant the namespace elevated privileges.

#NamespaceList: items: [{metadata: labels: "pod-security.kubernetes.io/enforce": "privileged"}]
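
For readers not using CUE, the same workaround might look like this in plain YAML. It is a sketch under assumptions: the namespace name `grafana-agent` is taken from the `kubectl` commands above, and the `warn` and `audit` labels are an optional addition that is not part of the original workaround.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: grafana-agent
  labels:
    # Admit pods that need hostPath volumes and privileged containers.
    pod-security.kubernetes.io/enforce: privileged
    # Optional (assumption, not in the original workaround):
    # keep reporting anything that would violate "baseline".
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/audit: baseline
```

Relaxing enforcement only for the namespace that hosts the agent keeps the rest of the cluster at "baseline".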

Grafana Agent Operator Manifest Generator

The generated manifests:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: ${NAMESPACE}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: logs-secret
  namespace: ${NAMESPACE}
stringData:
  password: "no"
  username: "no"
type: Opaque
---
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: metrics-secret
  namespace: ${NAMESPACE}
stringData:
  password: eyJrIjoiZTUwZTI3YmViNDg2Zjk1MTUwZDM4ZGMyNWE2MGQ4ODI4ZjkzOGY1MSIsIm4iOiJ1aHRob21hcy1lYXN5c3RhcnQtcHJvbS1wdWJsaXNoZXIiLCJpZCI6NDY5NDIyfQ==
  username: "53013"
type: Opaque
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-eventhandler
  namespace: ${NAMESPACE}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent-operator
rules:
- apiGroups:
  - monitoring.grafana.com
  resources:
  - grafanaagents
  - metricsinstances
  - logsinstances
  - podlogs
  - integrations
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.grafana.com
  resources:
  - grafanaagents/finalizers
  - metricsinstances/finalizers
  - logsinstances/finalizers
  - podlogs/finalizers
  - integrations/finalizers
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
  - podmonitors
  - probes
  - servicemonitors
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - monitoring.coreos.com
  resources:
  - podmonitors/finalizers
  - probes/finalizers
  - servicemonitors/finalizers
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - namespaces
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  - services
  - configmaps
  - endpoints
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent-operator
subjects:
- kind: ServiceAccount
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: ${NAMESPACE}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-agent-operator
  namespace: ${NAMESPACE}
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: grafana-agent-operator
  template:
    metadata:
      labels:
        name: grafana-agent-operator
    spec:
      containers:
      - args:
        - --kubelet-service=default/kubelet
        image: grafana/agent-operator:v0.26.1
        imagePullPolicy: IfNotPresent
        name: grafana-agent-operator
      serviceAccount: grafana-agent-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.5.0
  name: kube-state-metrics
  namespace: ${NAMESPACE}
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.5.0
    spec:
      automountServiceAccountToken: true
      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: ${NAMESPACE}
spec:
  image: grafana/agent:v0.26.1
  integrations:
    selector:
      matchLabels:
        agent: grafana-agent
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent
  metrics:
    externalLabels:
      cluster: ${CLUSTER}
    instanceSelector:
      matchLabels:
        agent: grafana-agent
    scrapeInterval: 15s
  serviceAccountName: grafana-agent
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: agent-eventhandler
  namespace: ${NAMESPACE}
spec:
  config:
    cache_path: /etc/eventhandler/eventhandler.cache
    logs_instance: ${NAMESPACE}/grafana-agent-logs
  name: eventhandler
  type:
    unique: true
  volumeMounts:
  - mountPath: /etc/eventhandler
    name: agent-eventhandler
  volumes:
  - name: agent-eventhandler
    persistentVolumeClaim:
      claimName: agent-eventhandler
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: node-exporter
  namespace: ${NAMESPACE}
spec:
  config:
    autoscrape:
      enable: true
      metrics_instance: ${NAMESPACE}/grafana-agent-metrics
    procfs_path: host/proc
    rootfs_path: /host/root
    sysfs_path: /host/sys
  name: node_exporter
  type:
    allNodes: true
    unique: true
  volumeMounts:
  - mountPath: /host/root
    name: rootfs
  - mountPath: /host/sys
    name: sysfs
  - mountPath: /host/proc
    name: procfs
  volumes:
  - hostPath:
      path: /
    name: rootfs
  - hostPath:
      path: /sys
    name: sysfs
  - hostPath:
      path: /proc
    name: procfs
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  labels:
    agent: grafana-agent
  name: grafana-agent-logs
  namespace: ${NAMESPACE}
spec:
  clients:
  - basicAuth:
      password:
        key: password
        name: logs-secret
      username:
        key: username
        name: logs-secret
    externalLabels:
      cluster: ${CLUSTER}
    url: https://logs-prod-us-central1.grafana.net/loki/api/v1/push
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  labels:
    agent: grafana-agent
  name: grafana-agent-metrics
  namespace: ${NAMESPACE}
spec:
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      instance: primary
  remoteWrite:
  - basicAuth:
      password:
        key: password
        name: metrics-secret
      username:
        key: username
        name: metrics-secret
    url: https://prometheus-us-central1.grafana.net/api/prom/push
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      instance: primary
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-logs
  namespace: ${NAMESPACE}
spec:
  namespaceSelector:
    any: true
  pipelineStages:
  - cri: {}
  relabelings:
  - sourceLabels:
    - __meta_kubernetes_pod_node_name
    targetLabel: __host__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    sourceLabels:
    - __meta_kubernetes_namespace
    targetLabel: namespace
  - action: replace
    sourceLabels:
    - __meta_kubernetes_pod_name
    targetLabel: pod
  - action: replace
    sourceLabels:
    - __meta_kubernetes_container_name
    targetLabel: container
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    sourceLabels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    targetLabel: __path__
  selector:
    matchLabels: {}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: cadvisor-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: integrations/kubernetes/cadvisor
      targetLabel: job
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: ksm-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: integrations/kubernetes/kube-state-metrics
      targetLabel: job
  namespaceSelector:
    matchNames:
    - ${NAMESPACE}
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    instance: primary
  name: kubelet-monitor
  namespace: ${NAMESPACE}
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 60s
    metricRelabelings:
    - action: keep
      regex: kubelet_running_container_count|container_cpu_cfs_periods_total|kube_statefulset_status_observed_generation|kubelet_certificate_manager_client_expiration_renew_errors|container_network_transmit_packets_total|kubelet_running_pods|namespace_memory:kube_pod_container_resource_limits:sum|process_resident_memory_bytes|kube_pod_container_resource_requests|machine_memory_bytes|storage_operation_errors_total|kubelet_cgroup_manager_duration_seconds_count|volume_manager_total_volumes|kube_pod_status_reason|namespace_cpu:kube_pod_container_resource_requests:sum|node_namespace_pod_container:container_memory_cache|kubelet_pod_worker_duration_seconds_bucket|kube_statefulset_replicas|kube_namespace_status_phase|kube_deployment_spec_replicas|kube_pod_container_resource_limits|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|container_fs_reads_bytes_total|node_namespace_pod_container:container_memory_working_set_bytes|kubelet_pod_start_duration_seconds_count|kube_node_status_allocatable|kube_deployment_metadata_generation|kube_deployment_status_replicas_available|container_memory_rss|process_cpu_seconds_total|kube_job_failed|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|container_cpu_usage_seconds_total|kubelet_volume_stats_inodes_used|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_running_pod_count|kubelet_running_containers|kubelet_runtime_operations_total|kube_pod_status_phase|node_filesystem_avail_bytes|kubelet_pod_start_duration_seconds_bucket|kube_horizontalpodautoscaler_spec_min_replicas|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|node_filesystem_size_bytes|container_fs_writes_total|container_fs_writes_bytes_total|rest_client_requests_total|kube_horizontalpodautoscaler_status_current_replicas|namespace_workload_pod:kube_pod_owner:relabel|namespace_memory:kube_pod_container_resource_requests:sum|go_goroutines|container_fs_reads_total|kube_node_status_capacity|node_namespace_pod_container:container_memory_rss|kube_deployment_status_replicas_updated|kube_statefulset_metadata_generation|kube_statefulset_status_current_revision|kube_horizontalpodautoscaler_status_desired_replicas|kube_node_status_condition|kubelet_volume_stats_capacity_bytes|kubelet_cgroup_manager_duration_seconds_bucket|storage_operation_duration_seconds_count|kube_statefulset_status_replicas_ready|kube_deployment_status_observed_generation|kube_daemonset_status_desired_number_scheduled|container_network_receive_packets_dropped_total|kube_pod_owner|kubelet_server_expiration_renew_errors|kubelet_volume_stats_inodes|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_cache|kubelet_runtime_operations_errors_total|kube_statefulset_status_replicas|container_network_transmit_packets_dropped_total|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_resourcequota|kube_job_status_start_time|container_network_transmit_bytes_total|kube_node_info|kubelet_node_config_error|kube_job_status_active|kube_daemonset_status_number_available|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kube_daemonset_status_number_misscheduled|kube_daemonset_status_current_number_scheduled|kubelet_pleg_relist_duration_seconds_bucket|kube_statefulset_status_replicas_updated|kubelet_certificate_manager_client_ttl_seconds|container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_node_spec_taint|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_memory_swap|kube_pod_info|container_network_receive_packets_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kube_replicaset_owner|kube_daemonset_status_updated_number_scheduled|container_cpu_cfs_throttled_periods_total|kube_horizontalpodautoscaler_spec_max_replicas|namespace_workload_pod|container_network_receive_bytes_total|kube_statefulset_status_update_revision|kubernetes_build_info|kubelet_certificate_manager_server_ttl_seconds|kubelet_volume_stats_available_bytes|kubelet_node_name|kubelet_pleg_relist_duration_seconds_count|kube_pod_container_status_waiting_reason|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*
      sourceLabels:
      - __name__
    path: /metrics
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    - action: replace
      replacement: integrations/kubernetes/kubelet
      targetLabel: job
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
```

dakr0013 commented 11 months ago

Any updates?

micheljung commented 9 months ago

I ran into the same issue when deploying the Loki Helm Chart:

Warning  FailedCreate  60m   daemonset-controller  Error creating: pods "grafana-loki-logs-ccn4k" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volumes "varlog", "dockerlogs", "data"), privileged (container "grafana-agent" must not set securityContext.privileged=true)

Is this linked to #2781?