grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Error installing Loki with persistent volume with a user without 'root' privileges #2018

Closed mkenne11 closed 4 years ago

mkenne11 commented 4 years ago

Describe the bug

When installing Loki using the Helm chart (loki/loki), the Loki pod crashes (status "CrashLoopBackOff"). The pod logs are:

level=info ts=2020-04-27T11:49:54.44769852Z caller=loki.go:156 msg=initialising module=server
level=info ts=2020-04-27T11:49:54.448582974Z caller=server.go:147 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2020-04-27T11:49:54.449088913Z caller=loki.go:156 msg=initialising module=runtime-config
level=info ts=2020-04-27T11:49:54.449385151Z caller=manager.go:109 msg="runtime config disabled: file not specified"
level=info ts=2020-04-27T11:49:54.449489039Z caller=loki.go:156 msg=initialising module=memberlist-kv
level=info ts=2020-04-27T11:49:54.449580612Z caller=loki.go:156 msg=initialising module=table-manager
level=error ts=2020-04-27T11:49:54.449902146Z caller=log.go:141 msg="error initializing bucket client" err="mkdir /data/loki: permission denied"

The storage volume (the /data folder) is mapped to the persistent volume claim (PVC) pvc-loki, using the persistence.existingClaim value in the chart.

The loki-values.yaml file contains:

image:
  repository: grafana/loki
  tag: 1.4.1
persistence:
  enabled: true
  accessModes:
  - ReadWriteOnce
  size: 10Gi
  existingClaim: pvc-loki
  mountPath: "/data"
resources: {}
securityContext:
  fsGroup: 1000
  runAsGroup: 1000
  runAsNonRoot: true
  runAsUser: 1000

The pvc and persistent volume were created using the following resource definition file:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-loki
  labels:
    name: loki-storage
spec:
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/cluster_share/kube/data/loki"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-loki
  namespace: monitoring
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      name: loki-storage

The user the Loki container runs as (uid 1000, gid 1000) has been made the owner of the folder backing the PVC using:

sudo chown -R 1000:1000 /mnt/cluster_share/kube/data/loki
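
To rule out ownership as the cause, the result of the chown can be checked on the node that backs the hostPath volume. The following is a minimal sketch, not something posted in this thread: owned_by is a hypothetical helper, and it assumes a stat that supports -c (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Sketch: report whether a directory is owned by the expected uid:gid.
# Assumes a stat that supports -c (GNU coreutils or BusyBox).
# owned_by is a hypothetical helper, not part of Loki or Helm.
owned_by() {
  dir=$1
  want=$2
  have=$(stat -c '%u:%g' "$dir" 2>/dev/null) || return 2  # 2 = cannot stat
  [ "$have" = "$want" ]
}

# Example with the host path from the PersistentVolume definition above.
if owned_by /mnt/cluster_share/kube/data/loki 1000:1000; then
  echo "ownership matches 1000:1000"
else
  echo "ownership differs or the path is not visible from this node"
fi
```

If the check passes and the pod still reports "permission denied", ownership of the host directory is probably not the whole story.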

Note: this appears to be related to #1834.

To Reproduce

Loki was installed using the chart command line and custom values file (loki-values.yaml):

helm upgrade --install monitoring-loki --namespace monitoring -f loki-values.yaml loki/loki

Expected behavior

The Loki deployment and service should start (with a status of Running).

Environment

I'm using the k3s distribution of Kubernetes v1.17.4 running on a Raspbian cluster.

Helm version 3.2.0 is used for the installation.

Screenshots, Promtail config, or terminal output

See details above for definition files.

mkenne11 commented 4 years ago

I tried installing the Helm chart with a Loki configuration that runs the container as the root user (uid 0 and gid 0), and it installed and ran without any issues. The working loki-values.yaml file is below:

image:
  repository: grafana/loki
  tag: 1.4.1
persistence:
  enabled: true
  accessModes:
  - ReadWriteOnce
  size: 10Gi
  existingClaim: pvc-loki
resources: {}
securityContext:
  runAsGroup: 0
  runAsNonRoot: false
  runAsUser: 0

Ideally I'd rather not run as a root user for security reasons.

adityacs commented 4 years ago

@mkenne11 The 1.4.1 Docker image does not include the loki user, so there is no user with uid 1000. Use the latest image and change your securityContext settings to match uid 10001.

For reference: https://github.com/grafana/loki/blob/master/cmd/loki/Dockerfile#L18

adityacs commented 4 years ago

@slim-bean Not sure why this change was not picked up in either release, 1.4 or 1.4.1:

Non-root user docker image for Loki

slim-bean commented 4 years ago

1.4.0 was cut intentionally before the changes to make Loki run as non-root because we hadn't totally tested that and weren't comfortable putting it in the release.

I'm afraid for 1.4.x you will still need to run as root

mkenne11 commented 4 years ago

Thanks for the assistance @adityacs and @slim-bean. I'll look out for updated releases for Loki running as a non-root user.

@slim-bean - I noticed the pod security policy in the Loki helm chart has readOnlyRootFilesystem set as true: https://github.com/grafana/loki/blob/664537e152ce6e46c00d0941fcd7163ea5f04366/production/helm/loki/templates/podsecuritypolicy.yaml#L37

Would that make the root file system read-only even when I run the Loki pod as the root user?

adityacs commented 4 years ago

@mkenne11 readOnlyRootFilesystem: true makes the root file system read-only for all users, root included. You can set it to false if you are using 1.4.1, since the Loki directory is /loki.

mkenne11 commented 4 years ago

Thanks @adityacs I'll try that setting.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

geekq commented 3 years ago

Still experiencing the same problem ("mkdir /data/loki: permission denied") with the default setup in the Helm chart plus

   persistence:
     enabled: true

Using the newest helm chart and loki version - Image: grafana/loki:2.2.0

Workaround by @mkenne11 (runAsNonRoot: false etc.) helped, but could be a problem security-wise

HaveFun83 commented 3 years ago

/reopen same here

vast0906 commented 2 years ago

Workaround, but it could be a problem security-wise:

    fsGroup: 0
    runAsNonRoot: false
    runAsUser: 0

systemcrash commented 2 years ago

Just got bit - upgraded from Loki 2 -> 2.4.1:

ran chown 10001.10001 lokidata/ (maps to /loki)

msg="error running loki" err="mkdir wal: permission denied"

Edit: the "fix" seems to be(?) adding this to the Loki config YAML:

...
ingester:
  wal:
    enabled: true
    dir: /loki/wal
...

Maybe setting enabled: false works too.

mvadu commented 2 years ago

@systemcrash your suggestion of chown 10001.10001 actually solved my issue on arm64. Thank you. I had an empty folder /loki/data as target, and the container created all the subfolders on next restart.

andrewlow commented 2 years ago

I got burned by the uplift to Loki version 2.4.1

The only magic I needed to get it working again was to add the following. I'm still running as user 1000:1000 (and was previously).

...
ingester:
  wal:
    enabled: true
    dir: /loki/wal
...

MightySlaytanic commented 2 years ago

The ingester/wal configuration that @systemcrash posted above (enabled: true, dir: /loki/wal) solved my problem too, on the Synology NAS where I'm running Loki. Thanks. There were no problems until a few weeks ago (grafana/loki:main-c4562f1 was working fine), but today's latest version (whose tag seems to be two weeks old) has the problem, and I solved it with the above update to the ingester section of the config file.

tablatronix commented 2 years ago

How do you fix this in docker? Is this trying to create in /tmp ? NO idea what wal even is..

MightySlaytanic commented 2 years ago

@tablatronix WAL (write-ahead log) is explained here. BTW, I added the ingester/wal configuration suggested by others in this thread to Loki's configuration file, local_config.yaml, which I keep on local storage because I bind a local folder to the container's /etc/loki folder. With that configuration Loki creates its wal folder at /loki/wal inside the container (I bind another local folder to the container's /loki folder).

clintmod commented 2 years ago

For the next person who runs across this, you can do:

loki:
  initContainers:
  - name: fix-permissions
    image: busybox:latest
    securityContext:
      privileged: true
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    command:
    - sh
    - -c
    - >-
      id;
      ls -la /data/;
      mkdir -p /data/global/loki;
      chown 10001:10001 /data/global/loki
    volumeMounts:
    - mountPath: /data
      name: storage

r4z3c commented 2 years ago

The workaround from @clintmod worked like a charm for me. The only change I made was to the path (removing "global"):

...
mkdir -p /data/loki;
chown 10001:10001 /data/loki
...

tks!

Uddipaan-Hazarika commented 1 year ago

Kudos, @clintmod @r4z3c I solved it in loki v2.8.4 using the following:

singleBinary:
  replicas: 1
  initContainers:
  - name: fix-permissions
    image: busybox:latest
    securityContext:
      privileged: true
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    command:
    - sh
    - -c
    - >-
      id;
      ls -la /data/;
      mkdir -p /data/loki;
      chown 10001:10001 /data/loki
    volumeMounts:
    - mountPath: /data
      name: storage

I nested it under singleBinary. Posting it here in case someone needs it.

alexey-sh commented 11 months ago

My error was

level=info ts=2023-11-13T09:04:24.819999855Z caller=reporter.go:127 msg="failed to CAS cluster seed key" err="open /loki/chunks/loki_cluster_seed.json: permission denied"

for the following docker-compose.yml

version: '3.9'

services:
  loki:
    image: grafana/loki:2.9.2
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki_data_rules:/loki/rules
      - loki_data_chunks:/loki/chunks
      - type: bind
        source: ./local-config.yaml
        target: /etc/loki/local-config.yaml
    networks:
      - mynetworknamegoeshere

I did

chown -R 10001:10001 /var/lib/docker/volumes/monitoring_loki_data_chunks/*
chown -R 10001:10001 /var/lib/docker/volumes/monitoring_loki_data_rules/*

and the error is gone

Perhaps the config and mount directories are not configured well but at least it works.
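
An alternative to running chown against /var/lib/docker/volumes on the host is to do it from a throwaway container that mounts the same named volume. This is only a sketch under assumptions: the volume names are inferred from the host paths above, busybox is an arbitrary choice of image, fix_volume and DRY_RUN are hypothetical names, and 10001:10001 is the owner used elsewhere in this thread. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: chown a named Docker volume from a throwaway root container.
# Volume names are inferred from the host paths above (assumption).
# Set DRY_RUN=0 to execute; by default the commands are only printed.
fix_volume() {
  volume=$1
  # Build the command in the function's positional parameters.
  set -- docker run --rm -v "$volume:/vol" busybox chown -R 10001:10001 /vol
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fix_volume monitoring_loki_data_chunks
fix_volume monitoring_loki_data_rules
```

This avoids depending on where Docker keeps volume data on the host, at the cost of briefly running a container as root.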

vladsf commented 7 months ago

I solved it in loki chart v5.47.2 using the following:

  singleBinary:
    replicas: 1
    initContainers:
    - name: fix-permissions
      image: busybox:latest
      securityContext:
        privileged: true
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0
      command:
      - sh
      - -c
      - >-
        id;
        ls -la /data/;
        chgrp 10001 /data /data/lost+found;
        chmod g+rwx /data /data/lost+found;
        chmod g+s /data /data/lost+found;
        ls -la /data/
      volumeMounts:
      - mountPath: /data
        name: storage
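
The chgrp and chmod g+s lines rely on the setgid bit: on Linux, entries created inside a setgid directory inherit the directory's group, so a process running with that group can keep writing to them. A small sketch on a scratch directory follows; it uses the caller's own group instead of 10001 so that it runs without root, and it assumes a stat that supports -c (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Sketch of the chgrp / chmod g+s pattern on a scratch directory.
# Uses the caller's own group so it runs without root; in the init
# container above the group is 10001 and the directory is /data.
demo=$(mktemp -d)
chgrp "$(id -g)" "$demo"   # hand the directory to the target group
chmod g+rwx "$demo"        # group members may list, create and delete
chmod g+s "$demo"          # new entries inherit the directory's group
mkdir "$demo/chunks"
# Print mode, numeric group and name for the directory and its child.
stat -c '%A %g %n' "$demo" "$demo/chunks"
```

The mode of the scratch directory should show an "s" in the group execute position, and the child directory should report the same numeric group as its parent.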

grapemix commented 4 months ago

We are using Helm chart v6.6.4 and still see this problem when we set the number of write replicas to more than 1. In our case the log shows permission denied on /var/loki, not /data. We ended up resolving the problem with the following Helm values:

    write:
      initContainers:
      - name: fix-permissions
        image: busybox:latest
        securityContext:
          privileged: true
          runAsGroup: 0
          runAsNonRoot: false
          runAsUser: 0
        command:
        - sh
        - -c
        - >-
          id;
          chown 10001:10001 /var/loki
        volumeMounts:
        - mountPath: /var/loki
          name: data