grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Grafana Alloy container fails to start: exec /bin/alloy: operation not permitted #630

Closed: PatMis16 closed this issue 4 months ago

PatMis16 commented 5 months ago

What's wrong?

Grafana Alloy needs to be deployable to an EKS cluster with enforced podSecurityContext and container securityContext policies. Our policy mandates that the container operates as a non-root user (runAsNonRoot: true) with all capabilities dropped:

securityContext

securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

podSecurityContext

podSecurityContext:
      fsGroup: 2000
      runAsNonRoot: true
      runAsUser: 473

Deploying Grafana Alloy to EKS with a podSecurityContext that specifies runAsUser results in the container failing to start. The error logged is:

exec /bin/alloy: operation not permitted

In our organization, running containers as root is not permitted; this is enforced by Kyverno policies.
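For context, the enforcement looks roughly like the following Kyverno ClusterPolicy. This is an illustrative sketch, not our exact policy; the name and message are placeholders:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-nonroot   # illustrative name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must set runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true

A similar rule requires that every container drops ALL capabilities.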

Steps to reproduce

To reproduce, install Grafana Alloy using the specified podSecurityContext, while ensuring that no Custom Resource Definitions (CRDs), Custom Resources (CRs), ClusterRoleBindings (CRBs), or Role-Based Access Control (RBAC) configurations are deployed, as these are also not allowed.
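A condensed sketch of the chart value overrides relevant to the reproduction (the key paths follow the chart's values.yaml; the full values file is attached under Configuration below):

crds:
  create: false   # CRDs are not allowed in our clusters
rbac:
  create: false   # no ClusterRoleBindings / RBAC objects
global:
  podSecurityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 473
alloy:
  securityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL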

System information

The deployment runs on AWS EKS with Kyverno policies enforcing the organization's standards.

Software version

v1.0.0

Configuration

# -- Overrides the chart's name. Used to change the infix in the resource names.
nameOverride: null

# -- Overrides the chart's computed fullname. Used to change the full prefix of
# resource names.
fullnameOverride: null

## Global properties for image pulling override the values defined under `image.registry` and `configReloader.image.registry`.
## If you want to override only one image registry, use the specific fields but if you want to override them all, use `global.image.registry`
global:
  image:
    # -- Global image registry to use if it needs to be overridden for some specific use cases (e.g. local registries, custom images, ...)
    registry: ""

    # -- Optional set of global image pull secrets.
    pullSecrets: []

  # -- Security context to apply to the Grafana Alloy pod.
  podSecurityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 473

crds:
  # -- Whether to install CRDs for monitoring.
  create: true

## Various Alloy settings. For backwards compatibility with the grafana-agent
## chart, this field may also be called "agent". Naming this field "agent" is
## deprecated and will be removed in a future release.
alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap.  This is passed into `tpl` allowing for templating from values.
    content: ''

    # -- Name of existing ConfigMap to use. Used when create is false.
    name: null
    # -- Key in ConfigMap to get config from.
    key: null

  clustering:
    # -- Deploy Alloy in a cluster to allow for load distribution.
    enabled: false

  # -- Minimum stability level of components and behavior to enable. Must be
  # one of "experimental", "public-preview", or "generally-available".
  stabilityLevel: "generally-available"

  # -- Path to where Grafana Alloy stores data (for example, the Write-Ahead Log).
  # By default, data is lost between reboots.
  storagePath: /tmp/alloy

  # -- Address to listen for traffic on. 0.0.0.0 exposes the UI to other
  # containers.
  listenAddr: 0.0.0.0

  # -- Port to listen for traffic on.
  listenPort: 12345

  # -- Scheme is needed for readiness probes. If enabling tls in your configs, set to "HTTPS"
  listenScheme: HTTP

  # --  Base path where the UI is exposed.
  uiPathPrefix: /

  # -- Enables sending Grafana Labs anonymous usage stats to help improve Grafana
  # Alloy.
  enableReporting: true

  # -- Extra environment variables to pass to the Alloy container.
  extraEnv: []

  # -- Maps all the keys on a ConfigMap or Secret as environment variables. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#envfromsource-v1-core
  envFrom: []

  # -- Extra args to pass to `alloy run`: https://grafana.com/docs/alloy/latest/reference/cli/run/
  extraArgs: []

  # -- Extra ports to expose on the Alloy container.
  extraPorts: []
  # - name: "faro"
  #   port: 12347
  #   targetPort: 12347
  #   protocol: "TCP"

  mounts:
    # -- Mount /var/log from the host into the container for log collection.
    varlog: false
    # -- Mount /var/lib/docker/containers from the host into the container for log
    # collection.
    dockercontainers: false

    # -- Extra volume mounts to add into the Grafana Alloy container. Does not
    # affect the watch container.
    extra: []

  # -- Security context to apply to the Grafana Alloy container.
  securityContext: {}

  # -- Resource requests and limits to apply to the Grafana Alloy container.
  resources: {}

image:
  # -- Grafana Alloy image registry (defaults to docker.io)
  registry: "docker.io"
  # -- Grafana Alloy image repository.
  repository: grafana/alloy
  # -- (string) Grafana Alloy image tag. When empty, the Chart's appVersion is
  # used.
  tag: null
  # -- Grafana Alloy image's SHA256 digest (either in format "sha256:XYZ" or "XYZ"). When set, will override `image.tag`.
  digest: null
  # -- Grafana Alloy image pull policy.
  pullPolicy: IfNotPresent
  # -- Optional set of image pull secrets.
  pullSecrets: []

rbac:
  # -- Whether to create RBAC resources for Alloy.
  create: false

serviceAccount:
  # -- Whether to create a service account for the Grafana Alloy deployment.
  create: true
  # -- Additional labels to add to the created service account.
  additionalLabels: {}
  # -- Annotations to add to the created service account.
  annotations: {}
  # -- The name of the existing service account to use when
  # serviceAccount.create is false.
  name: null

# Options for the extra controller used for config reloading.
configReloader:
  # -- Enables automatically reloading when the Alloy config changes.
  enabled: true
  image:
    # -- Config reloader image registry (defaults to docker.io)
    registry: "ghcr.io"
    # -- Repository to get config reloader image from.
    repository: jimmidyson/configmap-reload
    # -- Tag of image to use for config reloading.
    tag: v0.12.0
    # -- SHA256 digest of image to use for config reloading (either in format "sha256:XYZ" or "XYZ"). When set, will override `configReloader.image.tag`
    digest: ""
  # -- Override the args passed to the container.
  customArgs: []
  # -- Resource requests and limits to apply to the config reloader container.
  resources:
    requests:
      cpu: "1m"
      memory: "5Mi"
  # -- Security context to apply to the Grafana configReloader container.
  securityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL

controller:
  # -- Type of controller to use for deploying Grafana Alloy in the cluster.
  # Must be one of 'daemonset', 'deployment', or 'statefulset'.
  type: 'daemonset'

  # -- Number of pods to deploy. Ignored when controller.type is 'daemonset'.
  replicas: 1

  # -- Annotations to add to controller.
  extraAnnotations: {}

  # -- Whether to deploy pods in parallel. Only used when controller.type is
  # 'statefulset'.
  parallelRollout: true

  # -- Configures Pods to use the host network. When set to true, the ports that will be used must be specified.
  hostNetwork: false

  # -- Configures Pods to use the host PID namespace.
  hostPID: false

  # -- Configures the DNS policy for the pod. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
  dnsPolicy: ClusterFirst

  # -- Update strategy for updating deployed Pods.
  updateStrategy: {}

  # -- nodeSelector to apply to Grafana Alloy pods.
  nodeSelector: {}

  # -- Tolerations to apply to Grafana Alloy pods.
  tolerations: []

  # -- Topology Spread Constraints to apply to Grafana Alloy pods.
  topologySpreadConstraints: []

  # -- priorityClassName to apply to Grafana Alloy pods.
  priorityClassName: ''

  # -- Extra pod annotations to add.
  podAnnotations: {}

  # -- Extra pod labels to add.
  podLabels: {}

  # -- Whether to enable automatic deletion of stale PVCs due to a scale down operation, when controller.type is 'statefulset'.
  enableStatefulSetAutoDeletePVC: false

  autoscaling:
    # -- Creates a HorizontalPodAutoscaler for controller type deployment.
    enabled: false
    # -- The lower limit for the number of replicas to which the autoscaler can scale down.
    minReplicas: 1
    # -- The upper limit for the number of replicas to which the autoscaler can scale up.
    maxReplicas: 5
    # -- Average CPU utilization across all relevant pods, a percentage of the requested value of the resource for the pods. Setting `targetCPUUtilizationPercentage` to 0 will disable CPU scaling.
    targetCPUUtilizationPercentage: 0
    # -- Average Memory utilization across all relevant pods, a percentage of the requested value of the resource for the pods. Setting `targetMemoryUtilizationPercentage` to 0 will disable Memory scaling.
    targetMemoryUtilizationPercentage: 80

    scaleDown:
      # -- List of policies to determine the scale-down behavior.
      policies: []
        # - type: Pods
        #   value: 4
        #   periodSeconds: 60
      # -- Determines which of the provided scaling-down policies to apply if multiple are specified.
      selectPolicy: Max
      # -- The duration that the autoscaling mechanism should look back on to make decisions about scaling down.
      stabilizationWindowSeconds: 300

    scaleUp:
      # -- List of policies to determine the scale-up behavior.
      policies: []
        # - type: Pods
        #   value: 4
        #   periodSeconds: 60
      # -- Determines which of the provided scaling-up policies to apply if multiple are specified.
      selectPolicy: Max
      # -- The duration that the autoscaling mechanism should look back on to make decisions about scaling up.
      stabilizationWindowSeconds: 0

  # -- Affinity configuration for pods.
  affinity: {}

  volumes:
    # -- Extra volumes to add to the Grafana Alloy pod.
    extra: []

  # -- volumeClaimTemplates to add when controller.type is 'statefulset'.
  volumeClaimTemplates: []

  ## -- Additional init containers to run.
  ## ref: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
  ##
  initContainers: []

  # -- Additional containers to run alongside the Alloy container and initContainers.
  extraContainers: []

service:
  # -- Creates a Service for the controller's pods.
  enabled: true
  # -- Service type
  type: ClusterIP
  # -- NodePort port. Only takes effect when `service.type: NodePort`
  nodePort: 31128
  # -- Cluster IP, can be set to None, empty "" or an IP address
  clusterIP: ''
  # -- Value for internal traffic policy. 'Cluster' or 'Local'
  internalTrafficPolicy: Cluster
  annotations: {}
    # cloud.google.com/load-balancer-type: Internal

serviceMonitor:
  enabled: false
  # -- Additional labels for the service monitor.
  additionalLabels: {}
  # -- Scrape interval. If not set, the Prometheus default scrape interval is used.
  interval: ""
  # -- MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
  # ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
  metricRelabelings: []
  # - action: keep
  #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
  #   sourceLabels: [__name__]

  # -- Customize tls parameters for the service monitor
  tlsConfig: {}

  # -- RelabelConfigs to apply to samples before scraping
  # ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
  relabelings: []
  # - sourceLabels: [__meta_kubernetes_pod_node_name]
  #   separator: ;
  #   regex: ^(.*)$
  #   targetLabel: nodename
  #   replacement: $1
  #   action: replace
ingress:
  # -- Enables ingress for Alloy (Faro port)
  enabled: false
  # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
  # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
  # ingressClassName: nginx
  # Values can be templated
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  labels: {}
  path: /
  faroPort: 12347

  # pathType is only for k8s >= 1.18
  pathType: Prefix

  hosts:
    - chart-example.local
  ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
  extraPaths: []
  # - path: /*
  #   backend:
  #     serviceName: ssl-redirect
  #     servicePort: use-annotation
  ## Or for k8s > 1.19
  # - path: /*
  #   pathType: Prefix
  #   backend:
  #     service:
  #       name: ssl-redirect
  #       port:
  #         name: use-annotation

  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

Logs

exec /bin/alloy: operation not permitted
tpaschalis commented 5 months ago

Hmm, interested to see if it's similar to #177 🤔

PatMis16 commented 5 months ago

Yes; however, our policy requires all capabilities to be dropped.

PatMis16 commented 5 months ago

We were able to deploy the Grafana Agent in flow mode without any issues. The issue only occurs with the new Grafana Alloy.

PatMis16 commented 5 months ago

To put it bluntly: the solution for issue #177 won't work in our case.

PatMis16 commented 5 months ago

Is this the right place for this issue, or would it be better to address it in the Grafana Helm Chart repository (https://github.com/grafana/helm-charts)?

PatMis16 commented 5 months ago

Updated the description.

captncraig commented 5 months ago

Can you share your Grafana Agent flow config that is working? I can reproduce this: dropping all capabilities causes the permission error. Others have shared lists of required capabilities that work on OpenShift. I am not really aware of a way around that, but I can try to figure out a diff between a working flow config and a non-working Alloy one.

PatMis16 commented 5 months ago

Hi @captncraig, sure. This is the values file for the Grafana Agent deployment in "Flow" mode:

grafana-agent:
  # -- Overrides the chart's name. Used to change the infix in the resource names.
  nameOverride: null

  # -- Overrides the chart's computed fullname. Used to change the full prefix of
  # resource names.
  fullnameOverride: null

  ## Global properties for image pulling override the values defined under `image.registry` and `configReloader.image.registry`.
  ## If you want to override only one image registry, use the specific fields but if you want to override them all, use `global.image.registry`
  global:
    image:
      # -- Global image registry to use if it needs to be overridden for some specific use cases (e.g. local registries, custom images, ...)
      registry: "docker.io"

      # -- Optional set of global image pull secrets.
      pullSecrets: []

    # -- Security context to apply to the Grafana Agent pod.
    podSecurityContext:
      fsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000

  crds:
    # -- Whether to install CRDs for monitoring.
    create: false

  # Various agent settings.
  agent:
    # -- Mode to run Grafana Agent in. Can be "flow" or "static".
    mode: 'flow'
    configMap:
      # -- Create a new ConfigMap for the config file.
      create: false
      # -- Content to assign to the new ConfigMap.  This is passed into `tpl` allowing for templating from values.
      content: ''

      # -- Name of existing ConfigMap to use. Used when create is false.
      name: grafana-agent-config
      #name: null
      # -- Key in ConfigMap to get config from.
      key: config.river
      #key: null

    clustering:
      # -- Deploy agents in a cluster to allow for load distribution. Only
      # applies when agent.mode=flow.
      enabled: true

    # -- Path to where Grafana Agent stores data (for example, the Write-Ahead Log).
    # By default, data is lost between reboots.
    storagePath: /tmp/agent

    # -- Address to listen for traffic on. 0.0.0.0 exposes the UI to other
    # containers.
    listenAddr: 0.0.0.0

    # -- Port to listen for traffic on.
    listenPort: 12345

    # -- Scheme is needed for readiness probes. If enabling tls in your configs, set to "HTTPS"
    listenScheme: HTTP

    # --  Base path where the UI is exposed.
    uiPathPrefix: /

    # -- Enables sending Grafana Labs anonymous usage stats to help improve Grafana
    # Agent.
    enableReporting: false

    # -- Extra environment variables to pass to the agent container.
    extraEnv: []

    # -- Maps all the keys on a ConfigMap or Secret as environment variables. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#envfromsource-v1-core
    envFrom: []

    # -- Extra args to pass to `agent run`: https://grafana.com/docs/agent/latest/flow/reference/cli/run/
    extraArgs: []

    # -- Extra ports to expose on the Agent
    extraPorts:
      - name: "otlpgrpc"
        port: 4317
        targetPort: 4317
        protocol: "TCP"
      - name: "otlphttp"
        port: 4318
        targetPort: 4318
        protocol: "TCP"
      # - name: "flow-port"
      #   port: 12345
      #   targetPort: 12345
      #   protocol: "TCP"
    # - name: "faro"
    #   port: 12347
    #   targetPort: 12347
    #   protocol: "TCP"

    mounts:
      # -- Mount /var/log from the host into the container for log collection.
      varlog: false
      # -- Mount /var/lib/docker/containers from the host into the container for log
      # collection.
      dockercontainers: false

      # -- Extra volume mounts to add into the Grafana Agent container. Does not
      # affect the watch container.
      extra:
        - name: gfagent-tmp
          mountPath: /tmp/agent

    # -- Security context to apply to the Grafana Agent container.
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

    # -- Resource requests and limits to apply to the Grafana Agent container.
    resources: {}

  image:
    # -- Grafana Agent image registry (defaults to docker.io)
    registry: "docker.io"
    # -- Grafana Agent image repository.
    repository: grafana/agent
    # -- (string) Grafana Agent image tag. When empty, the Chart's appVersion is
    # used.
    tag: v0.40.3
    # -- Grafana Agent image's SHA256 digest (either in format "sha256:XYZ" or "XYZ"). When set, will override `image.tag`.
    digest: null
    # -- Grafana Agent image pull policy.
    pullPolicy: IfNotPresent
    # -- Optional set of image pull secrets.
    pullSecrets: []

  rbac:
    # -- Whether to create RBAC resources for the agent.
    create: false

  serviceAccount:
    # -- Whether to create a service account for the Grafana Agent deployment.
    create: true
    # -- Additional labels to add to the created service account.
    additionalLabels: {}
    # -- Annotations to add to the created service account.
    annotations: {}
    # -- The name of the existing service account to use when
    # serviceAccount.create is false.
    name: null

  # Options for the extra controller used for config reloading.
  configReloader:
    # -- Enables automatically reloading when the agent config changes.
    enabled: true
    image:
      # -- Config reloader image registry (defaults to docker.io)
      registry: "ghcr.io"
      # -- Repository to get config reloader image from.
      repository: jimmidyson/configmap-reload
      # -- Tag of image to use for config reloading.
      tag: v0.9.0
      # -- SHA256 digest of image to use for config reloading (either in format "sha256:XYZ" or "XYZ"). When set, will override `configReloader.image.tag`
      digest: ""
    # -- Override the args passed to the container.
    customArgs: []
    # -- Resource requests and limits to apply to the config reloader container.
    resources:
      requests:
        cpu: "1m"
        memory: "5Mi"
    # -- Security context to apply to the Grafana configReloader container.
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

  controller:
    # -- Type of controller to use for deploying Grafana Agent in the cluster.
    # Must be one of 'daemonset', 'deployment', or 'statefulset'.
    type: 'deployment'

    # -- Number of pods to deploy. Ignored when controller.type is 'daemonset'.
    replicas: 2

    # -- Annotations to add to controller.
    extraAnnotations: {}

    # -- Whether to deploy pods in parallel. Only used when controller.type is
    # 'statefulset'.
    parallelRollout: true

    # -- Configures Pods to use the host network. When set to true, the ports that will be used must be specified.
    hostNetwork: false

    # -- Configures Pods to use the host PID namespace.
    hostPID: false

    # -- Configures the DNS policy for the pod. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
    dnsPolicy: ClusterFirst

    # -- Update strategy for updating deployed Pods.
    updateStrategy: {}

    # -- nodeSelector to apply to Grafana Agent pods.
    nodeSelector: {}

    # -- Tolerations to apply to Grafana Agent pods.
    tolerations: []

    # -- Topology Spread Constraints to apply to Grafana Agent pods.
    topologySpreadConstraints: []

    # -- priorityClassName to apply to Grafana Agent pods.
    priorityClassName: ''

    # -- Extra pod annotations to add.
    podAnnotations: {}

    # -- Extra pod labels to add.
    podLabels: {}

    # -- Whether to enable automatic deletion of stale PVCs due to a scale down operation, when controller.type is 'statefulset'.
    enableStatefulSetAutoDeletePVC: false

    autoscaling:
      # -- Creates a HorizontalPodAutoscaler for controller type deployment.
      enabled: false
      # -- The lower limit for the number of replicas to which the autoscaler can scale down.
      minReplicas: 2
      # -- The upper limit for the number of replicas to which the autoscaler can scale up.
      maxReplicas: 5
      # -- Average CPU utilization across all relevant pods, a percentage of the requested value of the resource for the pods. Setting `targetCPUUtilizationPercentage` to 0 will disable CPU scaling.
      targetCPUUtilizationPercentage: 0
      # -- Average Memory utilization across all relevant pods, a percentage of the requested value of the resource for the pods. Setting `targetMemoryUtilizationPercentage` to 0 will disable Memory scaling.
      targetMemoryUtilizationPercentage: 80

      scaleDown:
        # -- List of policies to determine the scale-down behavior.
        policies: []
          # - type: Pods
          #   value: 4
          #   periodSeconds: 60
        # -- Determines which of the provided scaling-down policies to apply if multiple are specified.
        selectPolicy: Max
        # -- The duration that the autoscaling mechanism should look back on to make decisions about scaling down.
        stabilizationWindowSeconds: 300

      scaleUp:
        # -- List of policies to determine the scale-up behavior.
        policies: []
          # - type: Pods
          #   value: 4
          #   periodSeconds: 60
        # -- Determines which of the provided scaling-up policies to apply if multiple are specified.
        selectPolicy: Max
        # -- The duration that the autoscaling mechanism should look back on to make decisions about scaling up.
        stabilizationWindowSeconds: 0

    # -- Affinity configuration for pods.
    affinity: {}

    volumes:
      # -- Extra volumes to add to the Grafana Agent pod.
      extra:
        - name: gfagent-tmp
          emptyDir: {}

    # -- volumeClaimTemplates to add when controller.type is 'statefulset'.
    volumeClaimTemplates: []

    ## -- Additional init containers to run.
    ## ref: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
    ##
    initContainers: []

    # -- Additional containers to run alongside the agent container and initContainers.
    extraContainers: []

  service:
    # -- Creates a Service for the controller's pods.
    enabled: true
    # -- Service type
    type: ClusterIP
    # -- NodePort port. Only takes effect when `service.type: NodePort`
    nodePort: 31128
    # -- Cluster IP, can be set to None, empty "" or an IP address
    clusterIP: ''
    # -- Value for internal traffic policy. 'Cluster' or 'Local'
    internalTrafficPolicy: Cluster
    annotations: {}

  serviceMonitor:
    enabled: false
    # -- Additional labels for the service monitor.
    additionalLabels: {}
    # -- Scrape interval. If not set, the Prometheus default scrape interval is used.
    interval: ""
    # -- MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
    # ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
    metricRelabelings: []
    # - action: keep
    #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
    #   sourceLabels: [__name__]

    # -- Customize tls parameters for the service monitor
    tlsConfig: {}

    # -- RelabelConfigs to apply to samples before scraping
    # ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace
  ingress:
    # -- Enables ingress for the agent (faro port)
    enabled: false
    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx
    # Values can be templated
    annotations:
      {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    labels: {}
    path: /
    faroPort: 12347

    # pathType is only for k8s >= 1.18
    pathType: Prefix

    hosts:
      - chart-example.local
    ## Extra paths to prepend to every host configuration. This is useful when working with annotation based services.
    extraPaths: []
    # - path: /*
    #   backend:
    #     serviceName: ssl-redirect
    #     servicePort: use-annotation
    ## Or for k8s > 1.19
    # - path: /*
    #   pathType: Prefix
    #   backend:
    #     service:
    #       name: ssl-redirect
    #       port:
    #         name: use-annotation

    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

I hope this helps.

BR, Patrick

captncraig commented 5 months ago

Is this the policy you are using? https://kyverno.io/policies/best-practices/require-drop-all/require-drop-all/

If I am reading that correctly, it requires a blanket drop of all capabilities, but it does not preclude adding specific capabilities back in. So I suspect a solution like the one in #177 may work for you.
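For reference, the #177-style workaround amounts to adding a specific capability back in the container securityContext after dropping ALL. A rough sketch against this chart's values (NET_BIND_SERVICE is used as the example here; see the later comments for why that particular capability matters):

alloy:
  securityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
      add:
        - NET_BIND_SERVICE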

PatMis16 commented 5 months ago

@captncraig I suppose so, but I have to clarify with our Kubernetes team to be sure.

PatMis16 commented 5 months ago

@captncraig Got feedback from our Kubernetes Team:

All means all: explicitly re-adding privileges (even though ALL is dropped) would violate the policy, so the proposed solution isn't feasible, especially since the list of capabilities granted there is rather long and contains some capabilities that would make container escapes an easy task.

captncraig commented 5 months ago

I have it running with just NET_BIND_SERVICE added. I think that is because we added that capability to the binary back when 80 was the Helm chart's default port and binding to it required the capability. I suspect that if we remove it from the Dockerfile, it may work with all caps dropped.

captncraig commented 5 months ago

@PatMis16 can you try docker.io/grafana/alloy-dev:v1.1.0-devel-0ad55da2c with all capabilities dropped? It runs for me, but not sure about your policies.
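Pointing the chart at that image should just be an image override in the values, roughly like this (a sketch; the same approach appears in the working setup shared later in this thread):

image:
  registry: "docker.io"
  repository: grafana/alloy-dev
  tag: v1.1.0-devel-0ad55da2c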

pzmi-f3 commented 5 months ago

I am in a similar situation with policies and I can confirm that docker.io/grafana/alloy-dev:v1.1.0-devel-0ad55da2c starts successfully without errors.

PatMis16 commented 5 months ago

@captncraig @pzmi-f3 I am going to try it today.

PatMis16 commented 5 months ago

@captncraig @pzmi-f3 There is some delay; I have to add the image repository for alloy-dev to the allowlist first.

PatMis16 commented 5 months ago

I am currently on PTO. Will proceed with this next week.

benoitschipper commented 4 months ago

I have the same problem (as far as I know) on OpenShift, where we are also running with the above-mentioned securityContext. I will give it a try in the coming days and get back to you.

benoitschipper commented 4 months ago

Hey all,

This seems to work now, without any Security Context Constraint shenanigans or special rights for the alloy service account within the namespace. I chose the Deployment controller type, as I am trying to create an 'alloy service' for DevOps teams within their project's context.

I believe the problem in the 'latest' build I used was that the following was set in the Dockerfile (a binary that has file capabilities applied via setcap cannot be exec'd when those capabilities are not available in the container's bounding set, which is what produces the "operation not permitted" error):

"RUN |2 UID=473 USERNAME=alloy /bin/sh -c setcap 'cap_net_bind_service=+ep' /bin/alloy # buildkit"

Two Questions:

Thanks in advance!


Below more information on my setup in order to (try) and provide complete information.

My Setup

My Helm chart/Kustomize setup (via ArgoCD), to give you an idea of the parameters I set up:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../yourdir
  - yournamespace.yaml

helmCharts:
- name: alloy
  repo: oci://yourregistry.com/yourteam-helm/grafana
  version: 0.1.1
  releaseName: alloy
  namespace: yournamespace
  valuesInline:
    global:
      image:
        registry: yourregistry.com
    alloy:
      configMap:
        create: true
        content: '' # place where you put the config for Alloy
      clustering:
        enabled: true
      enableReporting: false
      resources:
        requests:
          cpu: 250m
          memory: 250Mi
        limits:
          cpu: '2'
          memory: 2Gi
    image:
      repository: yourregistry.com/grafana/alloy-dev
      tag: v1.1.0-devel-0ad55da2c
    configReloader:
      enabled: true
      image:
        repository: yourregistry.com/grafana/alloy/configmap-reload
        tag: v0.12.0
      resources:
        requests:
          cpu: "35m"
          memory: "75Mi"
    controller:
      type: 'deployment'
      autoscaling:
        enabled: true
        minReplicas: 1
        maxReplicas: 5
        targetCPUUtilizationPercentage: 0
        targetMemoryUtilizationPercentage: 80
        scaleDown:
          policies:
            - type: Pods
              value: 4
              periodSeconds: 60
          selectPolicy: Max
          stabilizationWindowSeconds: 300
        scaleUp:
          policies:
            - type: Pods
              value: 4
              periodSeconds: 60
      affinity: {}
    serviceMonitor:
      enabled: true

Logs Alloy container within pod:

ts=2024-05-22T06:51:28.970821394Z level=info "boringcrypto enabled"=false
ts=2024-05-22T06:51:28.971078826Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d
ts=2024-05-22T06:51:28.971104683Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=remotecfg duration=46.494µs
ts=2024-05-22T06:51:28.971138189Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2024-05-22T06:51:28.971160012Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=http duration=5.611µs
ts=2024-05-22T06:51:28.971179238Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=cluster duration=745ns
ts=2024-05-22T06:51:28.971193377Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=ui duration=349ns
ts=2024-05-22T06:51:28.971205752Z level=info msg="Using pod service account via in-cluster config" component_path=/ component_id=discovery.kubernetes.ingresses
ts=2024-05-22T06:51:28.971217562Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=discovery.kubernetes.ingresses duration=713.06µs
ts=2024-05-22T06:51:28.971229764Z level=info msg="Using pod service account via in-cluster config" component_path=/ component_id=discovery.kubernetes.endpointslices
ts=2024-05-22T06:51:28.971240885Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=discovery.kubernetes.endpointslices duration=275.633µs
ts=2024-05-22T06:51:28.971255795Z level=info msg="Using pod service account via in-cluster config" component_path=/ component_id=discovery.kubernetes.endpoints
ts=2024-05-22T06:51:28.971266524Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=discovery.kubernetes.endpoints duration=358.411µs
ts=2024-05-22T06:51:28.971281305Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=logging duration=485.654µs
ts=2024-05-22T06:51:28.971306631Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=labelstore duration=8.666µs
ts=2024-05-22T06:51:28.971324683Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=otel duration=2.736µs
ts=2024-05-22T06:51:28.971601391Z level=info msg="Using pod service account via in-cluster config" component_path=/ component_id=discovery.kubernetes.nodes
ts=2024-05-22T06:51:28.97177418Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=discovery.kubernetes.nodes duration=434.571µs
ts=2024-05-22T06:51:28.972077568Z level=info msg="Using pod service account via in-cluster config" component_path=/ component_id=discovery.kubernetes.pods
ts=2024-05-22T06:51:28.97223953Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=discovery.kubernetes.pods duration=429.761µs
ts=2024-05-22T06:51:28.972289852Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=tracing duration=15.736µs
ts=2024-05-22T06:51:28.972514212Z level=info msg="Using pod service account via in-cluster config" component_path=/ component_id=discovery.kubernetes.services
ts=2024-05-22T06:51:28.972893335Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d node_id=discovery.kubernetes.services duration=575.432µs
ts=2024-05-22T06:51:28.972941858Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=41d35fd36a5c1cd993e391ab9aae4e5d duration=3.668416ms
ts=2024-05-22T06:51:28.973091676Z level=info msg="scheduling loaded components and services"
ts=2024-05-22T06:51:28.973912949Z level=info msg="now listening for http traffic" service=http addr=0.0.0.0:12345
ts=2024-05-22T06:51:28.983665952Z level=info msg="starting cluster node" peers="" advertise_addr=10.129.5.159:12345
ts=2024-05-22T06:51:28.983839834Z level=info msg="peers changed" new_peers=alloy-78677458f5-kkgjl
ts=2024-05-22T06:52:28.986882891Z level=info msg="rejoining peers" peers=10-129-5-159.alloy-cluster.yournamespace.svc.cluster.local.:12345

Logs Config-Reloader container within the pod:

2024/05/22 06:51:29 Watching directory: "/etc/alloy"
debovema commented 4 months ago

v1.1.0 has already been released and fixed the issue on OpenShift for me (using the anyuid SCC to set the right UID).

benoitschipper commented 4 months ago

Great! @debovema Thanks for the heads-up!

benoitschipper commented 4 months ago

Confirmed

Works as of v1.1.0

PatMis16 commented 4 months ago

@benoitschipper @debovema @pzmi-f3 @captncraig Hi all, sorry for the late update. Confirmed! It works with v1.1.0.