Monolith deployment with local backend without persistence fails on OpenShift

pavolloffay commented 2 years ago

Describe the bug

Deploying Tempo as a monolith with the local backend and without persistence fails on OpenShift when it is deployed with the Helm chart. Tempo exits at startup with:

level=info ts=2022-08-18T12:53:22.827148497Z caller=main.go:191 msg="initialising OpenTracing tracer"
level=info ts=2022-08-18T12:53:22.849156013Z caller=main.go:106 msg="Starting Tempo" version="(version=, branch=main, revision=a8ac8066)"
level=error ts=2022-08-18T12:53:22.850317036Z caller=main.go:109 msg="error running Tempo" err="failed to init module services error initialising module: store: failed to create store mkdir /var/tempo: permission denied"

Helm chart: https://github.com/grafana/helm-charts/tree/main/charts/tempo

tempo:
  storage:
    trace:
      # refer https://github.com/grafana/tempo/tree/master/docs/tempo/website/configuration
      backend: local
      local:
        # this is default value
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal

The above is the chart's default configuration for deploying Tempo with the local backend and persistence disabled (persistence.enabled: false is the default).

I believe the root cause is the one described in https://github.com/grafana/tempo/issues/491#issuecomment-770024047:

By default, OpenShift Container Platform runs containers using an arbitrarily assigned user ID. This provides additional security against processes escaping the container due to a container engine vulnerability and thereby achieving escalated permissions on the host node. For an image to support running as an arbitrary user, directories and files that may be written to by processes in the image should be owned by the root group and be read/writable by that group. Files to be executed should also have group execute permissions.

There are a few workarounds:

define an additional emptyDir volume and mount it at /var/tempo
use a different directory that the container user can write to, e.g. /tmp/tempo
make sure the default directory has the correct permissions in the image:

diff --git a/cmd/tempo/Dockerfile b/cmd/tempo/Dockerfile
index 4e089fd8..100c8c37 100644
--- a/cmd/tempo/Dockerfile
+++ b/cmd/tempo/Dockerfile
@@ -1,5 +1,9 @@
 RUN apk --update add ca-certificates
 ARG TARGETARCH
 COPY bin/linux/tempo-${TARGETARCH} /tempo
+
+RUN mkdir -p /var/tempo
+RUN chgrp -R 0 /var/tempo && chmod -R g+rwX /var/tempo
+
 ENTRYPOINT ["/tempo"]
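
The first workaround could look roughly like the pod template fragment below. It is a minimal sketch in plain Kubernetes YAML, not the Helm chart's own values (how the chart exposes extra volumes is not covered here), and the volume name tempo-data is a placeholder. An emptyDir is normally created world-writable, so it does not depend on the ownership of directories baked into the image, and like the chart's default setup it keeps no data once the pod is deleted.

```yaml
# Pod template fragment (names are placeholders): back /var/tempo with an
# emptyDir so the arbitrary UID assigned by OpenShift can create
# /var/tempo/traces and /var/tempo/wal at startup.
spec:
  containers:
    - name: tempo
      image: grafana/tempo            # pin a specific version in practice
      args: ["-config.file=/etc/tempo.yaml"]   # config mount omitted for brevity
      volumeMounts:
        - name: tempo-data
          mountPath: /var/tempo
  volumes:
    - name: tempo-data
      emptyDir: {}                    # scratch space, deleted together with the pod
```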

To Reproduce

Steps to reproduce the behavior:

Start Tempo (SHA or version)
Perform Operations (Read/Write/Others)

Expected behavior

The deployment with the default configuration should work.

Environment:

Infrastructure: Kubernetes (OpenShift)
Deployment tool: Helm

zalegrala commented 2 years ago

Thanks for the report, and for the PR. Can you explain why the fix works? If I understand the issue correctly, the directory would still be owned by root, and since the process runs as non-root it would be unable to create additional paths if necessary. Also, does the change in ownership continue to work in non-OpenShift environments?

pavolloffay commented 2 years ago

I think this might be the root cause of the issue I had on OpenShift:


https://docs.openshift.com/container-platform/4.2/openshift_images/create-images.html#images-create-guide-openshift_create-images

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

SDAChess commented 6 months ago

This is still an active issue; I encountered it today.

SDAChess commented 6 months ago

For anyone looking for a quick workaround for a local deployment (e.g. the docker compose example): mounting the volume at /tmp instead of /tmp/tempo solves the issue.
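
A minimal docker compose sketch of this workaround, assuming tempo.yml sets storage.trace.local.path and storage.trace.wal.path somewhere under /tmp (e.g. /tmp/tempo/traces and /tmp/tempo/wal; these paths are assumptions). A likely explanation of why it works: when an empty named volume is first mounted, Docker copies the contents of the image's directory at the mount point into it, apparently including that directory's ownership and mode. The image's /tmp is world-writable in a standard Alpine base, so the volume inherits that and a non-root Tempo can create /tmp/tempo itself, whereas a mount point such as /tmp/tempo that does not exist in the image ends up as a root-owned directory.

```yaml
services:
  tempo:
    image: grafana/tempo:2.5.0
    command: ['-config.file=/etc/tempo.yml']
    volumes:
      # Mount the named volume at /tmp, not /tmp/tempo: an empty named volume
      # is seeded from the image's /tmp, including its world-writable mode.
      - tempo:/tmp
      # tempo.yml is assumed to keep its local and WAL paths under /tmp.
      - ./tempo.yml:/etc/tempo.yml:ro
volumes:
  tempo:
    driver: local
```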

aslanpour commented 3 months ago

Regarding the workaround above (mounting /tmp instead of /tmp/tempo for a local docker compose deployment):

It worked for me. Can someone explain why this is the solution?

jonatan-ivanov commented 2 months ago

@joe-elliott Could you please reopen this and add the keepalive label so that we don't have to ping-pong with the GitHub bot?

jonatan-ivanov commented 2 months ago

This worked for me until I upgraded from 2.4.1 to 2.5.0. I mounted /tmp/tempo like this:

services:
    tempo:
        container_name: tempo
        image: grafana/tempo:2.5.0
        extra_hosts: ['host.docker.internal:host-gateway']
        command: ['-config.file=/etc/tempo.yml']
        volumes:
            - tempo:/tmp/tempo
            - ./docker/grafana/tempo.yml:/etc/tempo.yml:ro
        ports: ...
volumes:
    tempo:
        driver: local

And if I upgrade to 2.5.0, I get this:

tempo  | level=error ts=2024-08-21T02:02:16.113849139Z caller=main.go:121 msg="error running Tempo" err="failed to init module services: error initialising module: store: failed to create store: mkdir /tmp/tempo/blocks: permission denied"

The workaround above (mounting /tmp) worked for me, but the official solution seems to be starting a container just to chown Tempo's directory and then starting Tempo for "real", which seems quite "hacky" to me: https://github.com/grafana/tempo/blob/a21001a72a5865bfcfc1b0d2dfa30160c5a26103/example/docker-compose/local/docker-compose.yaml#L3-L29

This is what I ended up doing: https://github.com/jonatan-ivanov/teahouse/commit/2ec33fd88d585daf395ebaa3870b5e02cf3a65ee
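
For comparison, the init-container ("chown first, then start Tempo") approach can be sketched against the compose file from the previous comment. This is an approximation, not a copy of the linked example: the service name tempo-init is hypothetical, and it assumes the Tempo 2.5.0 image runs as UID/GID 10001 and ships a chown binary (its Dockerfile uses apk, so it is presumably Alpine-based with BusyBox). Adjust the UID:GID if the image runs as a different user.

```yaml
services:
  # Hypothetical one-shot service: runs as root, fixes ownership of the
  # named volume, then exits.
  tempo-init:
    image: grafana/tempo:2.5.0
    user: root
    entrypoint: ["chown", "10001:10001", "/tmp/tempo"]   # UID:GID is an assumption
    volumes:
      - tempo:/tmp/tempo
  tempo:
    container_name: tempo
    image: grafana/tempo:2.5.0
    command: ['-config.file=/etc/tempo.yml']
    volumes:
      - tempo:/tmp/tempo
      - ./docker/grafana/tempo.yml:/etc/tempo.yml:ro
    depends_on:
      tempo-init:
        # Start Tempo only after the chown container has exited with status 0.
        condition: service_completed_successfully
volumes:
  tempo:
    driver: local
```

The trade-off is an extra container per startup in exchange for keeping Tempo's data at its configured path; mounting /tmp avoids the extra container but relies on the image's /tmp permissions.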

grafana/tempo issue #1657: Monolith deployment with local backend without persistence fails on OpenShift