jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.49k stars 2.44k forks source link

[Bug]: Default badger path is no longer writtable #4906

Closed freef4ll closed 1 year ago

freef4ll commented 1 year ago

What happened?

With the release of 1.50, documented badger default path is no longer writable as it is owned by root after the changes in https://github.com/jaegertracing/jaeger/pull/4783

Steps to reproduce

docker run \
  -e SPAN_STORAGE_TYPE=badger \
  -e BADGER_EPHEMERAL=false \
  -e BADGER_DIRECTORY_VALUE=/badger/data \
  -e BADGER_DIRECTORY_KEY=/badger/key \
  -v /tmp/badger:/badger   -p 16686:16686   jaegertracing/all-in-one:1.50

{"level":"fatal","ts":1698680105.917989,"caller":"./main.go:110","msg":"Failed to init storage factory","error":"Error Creating Dir: \"/badger/key\" error: mkdir /badger/key: permission denied","stacktrace":"main.main.func1\n\t./main.go:110\ngithub.com/spf13/cobra.(Command).execute\n\tgithub.com/spf13/cobra@v1.7.0/command.go:940\ngithub.com/spf13/cobra.(Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.7.0/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.7.0/command.go:992\nmain.main\n\t./main.go:243\nruntime.main\n\truntime/proc.go:267"}

Expected behavior

Jaeger to start up

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

1.50

SDK

No response

Pipeline

No response

Stogage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

yurishkuro commented 1 year ago

Makes sense, but I don't think we're going to fix that. You have a manual workaround of changing permissions on the existing files to non-root user (see https://github.com/jaegertracing/jaeger/issues/4906#issuecomment-1849559579).

To play devil's advocate, how would you expect Jaeger to handle that? Changing to non-root user was the right choice from security perspective.

The other thing is we should call this out in the release notes as a breaking change and suggest remediation steps, like running chown - would you like to submit a PR for that?

miparnisari commented 1 year ago

The other thing is we should call this out in the release notes as a breaking change and suggest remediation steps, like running chown

Yes please. This is a breaking change, it should be in https://github.com/jaegertracing/jaeger/releases/tag/v1.50.0. I would submit a PR but i don't know what command i should run.

freef4ll commented 1 year ago

I think if the directory is added into the container and have permissions created for it like this:

FROM alpine
ARG USER_UID=10001
RUN mkdir /badger && chown ${USER_UID} /badger
USER ${USER_UID}
CMD ["/bin/ls", "-l", "/badger'"]

Then it works fine:

$ docker run -v badger-storage:/badger -ti test /bin/sh
~ $ cd /badger
/badger $ touch a
/badger $ ls -l
total 0
-rw-r--r--    1 10001    root             0 Oct 31 09:10 a

Minimal PR created https://github.com/jaegertracing/jaeger/pull/4910; if you're happy with this then can update all other Dockerfiles.

yurishkuro commented 1 year ago

updated changelog and release notes

yurishkuro commented 1 year ago

thanks for reporting!

miparnisari commented 1 year ago

I'm not a Docker expert, how would I prevent this problem from occurring with my existing Docker-compose file?

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger-demo
    command: [ "--query.max-clock-skew-adjustment", "500ms" ]
    environment:
      - COLLECTOR_OTLP_ENABLED=true
      - SPAN_STORAGE_TYPE=badger
      - BADGER_EPHEMERAL=false
      - BADGER_DIRECTORY_VALUE=/badger/data
      - BADGER_DIRECTORY_KEY=/badger/key
    volumes:
      - jaeger_data:/badger
    ports:
      - "16686:16686" # UI
    networks:
      - default
volumes:
  jaeger_data:
t-pohl commented 11 months ago

@miparnisari Have you found a way to fix this directly in the docker-compose file? I'm facing the same problem.

miparnisari commented 11 months ago

@t-pohl, no :( I just pinned the version of Jaeger.

t-pohl commented 11 months ago

@miparnisari I managed to upgrade using the following instructions:

Since version 1.50.0 of Jaeger (see https://github.com/jaegertracing/jaeger/releases/tag/v1.50.0) there is a breaking change which requires the external volume to be owned by a different user than root (which is the docker default). The user id is 10001. This requires to change the ownership of the volume folder on the host to the user 10001: Example: Find out the location of the jaeger volume on the host using docker volume inspect

docker volume ls
# Find the name of the volume
docker volume inspect traefik_jaeger_data

Start a root shell:

sudo bash

Change the volume ownwership:

cd /var/lib/docker/volumes/traefik_jaeger_data/ # Change to directory of volume on host
 chown -R 10001:10001 _data/

Afterwards start jaeger and check the logs.

Maybe this can help you too.

de-perotti commented 10 months ago

Turns out you can set user: root in docker compose to get it permissions working without having to chown every time with an external script. I'm pretty sure that's not the best practice but for local development, that's dandy

yurishkuro commented 9 months ago

I just tried an experiment, on MacOS with Docker Desktop (results could be different on different OS)

With version before change

$ docker run \
  -e SPAN_STORAGE_TYPE=badger \
  -e BADGER_EPHEMERAL=false \
  -e BADGER_DIRECTORY_VALUE=/badger/data \
  -e BADGER_DIRECTORY_KEY=/badger/key \
  -v $HOME/dev/badger-storage:/badger \
  -p 16686:16686 \
  jaegertracing/all-in-one:1.49

$ docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS          PORTS                                                                               NAMES
30eb40642743   jaegertracing/all-in-one:1.49   "/go/bin/all-in-one-…"   17 seconds ago   Up 16 seconds   5775/udp, 5778/tcp, 14250/tcp, 6831-6832/udp, 14268/tcp, 0.0.0.0:16686->16686/tcp   vigilant_brown

$ docker exec -it 30eb40642743 /bin/sh
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:01 /go/bin/all-in-one-linux
   16 root      0:00 /bin/sh
   22 root      0:00 ps

/ # ls -lF /badger
total 0
drwx------    5 root     root           160 Feb 18 22:49 data/
drwx------    6 root     root           192 Feb 18 22:49 key/

/ # ls -lF /badger/data
total 1044
-rw-r--r--    1 root     root     2147483646 Feb 18 22:49 000001.vlog
-rw-r--r--    1 root     root       1048576 Feb 18 22:49 DISCARD
-rw-r--r--    1 root     root             2 Feb 18 22:49 LOCK

Here the container is running as root and files are owned by root

With newer version after UID change

$ docker run \
  -e SPAN_STORAGE_TYPE=badger \
  -e BADGER_EPHEMERAL=false \
  -e BADGER_DIRECTORY_VALUE=/badger/data \
  -e BADGER_DIRECTORY_KEY=/badger/key \
  -v $HOME/dev/badger-storage:/badger \
  -p 16686:16686 \
  jaegertracing/all-in-one:1.53

$ docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED         STATUS         PORTS                                                                                                        NAMES
422d701c2cc5   jaegertracing/all-in-one:1.53   "/go/bin/all-in-one-…"   3 seconds ago   Up 2 seconds   4317-4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 14268/tcp, 6831-6832/udp, 0.0.0.0:16686->16686/tcp   crazy_leakey

$ docker exec -it 422d701c2cc5 /bin/sh
/ $ ps
PID   USER     TIME  COMMAND
    1 10001     0:02 /go/bin/all-in-one-linux
   16 10001     0:00 /bin/sh
   22 10001     0:00 ps

/ $ ls -lF /badger
total 0
drwx------    6 10001    root           192 Feb 18 22:51 data/
drwx------    7 10001    root           224 Feb 18 22:51 key/

/ $ ls -lF /badger/data
total 1048
-rw-r--r--    1 10001    root            20 Feb 18 22:51 000001.vlog
-rw-r--r--    1 10001    root     2147483646 Feb 18 22:51 000002.vlog
-rw-r--r--    1 10001    root       1048576 Feb 18 22:49 DISCARD
-rw-r--r--    1 10001    root             2 Feb 18 22:51 LOCK

/ $ whoami
whoami: unknown uid 10001

/ $ touch /badger/data/test-file
/ $ ls -lF /badger/data/test-file
-rw-r--r--    1 10001    root             0 Feb 18 22:53 /badger/data/test-file

Here the Jaeger process is running as 10001 and files are owned by 10001 (I am not sure how the latter happens). The directory is writable by that UID.

ivanallen commented 8 months ago

@yurishkuro

I enter the docker container(on rocky linux):

docker run -it  -e SPAN_STORAGE_TYPE=badger   -e BADGER_EPHEMERAL=false   -e BADGER_DIRECTORY_VALUE=/badger/data   -e BADGER_DIRECTORY_KEY=/badger/key   -v /mnt/badger:/badger   -p 16686:16686 --entrypoint ""  jaegertracing/all-in-one:1.54 /bin/sh

all files' owned by root:root. but whoami show 10001.

image

yurishkuro commented 8 months ago

@ivanallen did you try to chown it to the new user?

ivanallen commented 8 months ago

@yurishkuro

No, I didn't do anything. I executed the command in a completely new environment (rocky Linux). I tried it on my macOS and it was working fine. The problem may depend on the environment.

I temporarily rolled back to jaegertracing/all-in-one:1.49, which works fine in rocky Linux.

MarkusGeigerDev commented 8 months ago

While you can chown existing dirs from earlier installations, how would you do that if you start from scratch? I may be missing something, but isn't that some kind of a hen-and-egg problem? Shouldn't there be a /bagder/data directory which is already owned by 10001 in the container image from the beginning?

yurishkuro commented 8 months ago

@MarkusGeigerDev if you are starting fresh without previous data there should be no issues at all. If you are still getting permission issues it's likely the target directory does not allow writes by the container irrespective of which user id is used inside the container.

Please provide reproducible example if you think otherwise, ie which OS, which containers runtime, which user it runs as, container command with volume mounting, and existing permissions on the parent directory on the host.

MarkusGeigerDev commented 8 months ago

@yurishkuro Thanks for getting back to me! This is the relevant snippet from my docker-compose (running in WSL2/Windows)

  jaeger:
    image: jaegertracing/all-in-one:latest
    command: 
      - "--badger.ephemeral=false"
      - "--badger.directory-key=/badger/keys"
      - "--badger.directory-value=/badger/values"
      - "--badger.span-store-ttl=72h0m0s" # limit storage to 72hrs
    environment:
      - SPAN_STORAGE_TYPE=badger
    volumes:
      - jaeger_badger_data:/badger
    ports:
      - "16686:16686"
      - "14250"
      - "4317"
 #   user: root

volumes:
  jaeger_badger_data:

The problem seems to be that the user 10001 can't even create the internal directories /badger/keys and /badger/values on startup:

Failed to init storage factory","error":"Error Creating Dir: \"/badger/keys\" error: mkdir /badger/keys: permission denied

When I set --badger.ephemeral=true the service starts up (obviously) and I am able to docker exec -it into it.
I looked around if there is any data path owned by 10001 already set up but I neither found one, nor was I able to create one.

When I remove the #-comment from the user: root line, everything works fine - but of course that was only to prove my point, I really don't want to revert the otherwise good decision of running jaeger as non-root.

yurishkuro commented 8 months ago

Did you create jaeger_badger_data before running? What permissions does it have?

I would recommend running a shell inside the container and trying to create files (perhaps using different users), to see what's causing the permissions issue.

NB: this could be Windows specific.

# user: root

What if you try some other user that is palatable to Windows? The default 10001 in Jaeger Dockerfile is just "a suggestion", you don't have to run with it. But it would be nice to get to the bottom of this and add documentation.

MarkusGeigerDev commented 8 months ago

@yurishkuro: I think I found a solution:

I added another service to my docker-compose that sets up everything before jaeger starts up:

version: "3.9"

services:
[...]
  jaeger:
    image: jaegertracing/all-in-one:latest
    command: 
      - "--badger.ephemeral=false"
      - "--badger.directory-key=/badger/data/keys"
      - "--badger.directory-value=/badger/data/values"
      - "--badger.span-store-ttl=72h0m0s" # limit storage to 72hrs
    environment:
      - SPAN_STORAGE_TYPE=badger
    volumes:
      - jaeger_badger_data:/badger
    ports:
      - "16686:16686"
      - "14250"
      - "4317"
    depends_on:
      init-jaeger:
        condition: service_completed_successfully

  init-jaeger:
    user: root
    image: busybox
    command: "/bin/sh -c 'mkdir -p /badger/data && touch /badger/data/.initialized && chown -R 10001:10001 /badger/data'"
    volumes:
      - jaeger_badger_data:/badger

volumes:
  jaeger_badger_data:

The init-jaeger service still runs as root (to be able to chown something), but since it will be already terminated when jaeger starts up, it shouldn't add anything to the attack surface of the system. If you like the solution, feel free to add it to the docs. Perhaps you may want to exchange busybox for whatever operating system base layer jaeger is using to avoid the additional download.(reuse of the already pulled layer), but on the other hand, busybox isn't that big either.

yurishkuro commented 8 months ago

@MarkusGeigerDev thanks! I suspect you can use the jaeger image itself, it has shell in it. Also, did you try to create just the /badger dir? I think it should be sufficient and less error prone (your solution seems to rely on some understanding of internal organization of badger sub-dirs)

MarkusGeigerDev commented 8 months ago

@yurishkuro

I suspect you can use the jaeger image itself, it has shell in it.

Haha, you're right, this works as well :)

  init-jaeger:
    user: root
    image: jaegertracing/all-in-one:latest
    entrypoint: "/bin/sh -c 'mkdir -p /badger/data && touch /badger/data/.initialized && chown -R 10001:10001 /badger/data'"
    volumes:
      - jaeger_badger_data:/badger

Also, did you try to create just the /badger dir? I think it should be sufficient and less error prone (your solution seems to rely on some understanding of internal organization of badger sub-dirs)

/badger is really just the mount point. It corresponds to the root dir in the volume, which I didn't want to change. I have no further knowledge about badger, besides what I took from the jaeger CLI docs which seem to suggest that badger needs two directories, one for the keys and one for the values. Since I was too lazy to chown those two individually, I introduced /data/ as a common root for both, keys and values, and used the --badger.directory-key and --badger.directory-value CLI switches to push badger that one level down.

EDIT:

Oh, BTW, I totally forgot to answer your other suggestions - sorry for that:

What if you try some other user that is palatable to Windows? The default 10001 in Jaeger Dockerfile is just "a suggestion", you don't have to run with it. But it would be nice to get to the bottom of this and add documentation.

You cannot easily map users from a Linux container to Windows users. Windows uses a very different technical scheme for users and groups called SIDs. For example, the "root" group ("Administrators", actually) is S-1-5-32-544. But in the end, all that doesn't matter, since we use the Linux dockerd anyways. All containers physically run in the Windows Subsystem for Linux, on a real Linux kernel and with an EXT4 file system. That's why in my solution the chown -R 10001:10001 works.

yurishkuro commented 8 months ago

Documented in #5282

genfemme commented 6 months ago

@MarkusGeigerDev @yurishkuro I am getting the same error when doing the install using helm.

This is my values.yaml file -

query:
  initContainers:
    name: init-jaeger
    image: jaegertracing/all-in-one:latest
    command: ["/bin/sh", "-c"]
    args: ["mkdir -p /badger/data && touch /badger/data/.initialized && chown -R 10001:10001 /badger/data"]
    volumeMounts:
      - name: jaeger-badger-data
        mountPath: /badger
    securityContext:
      fsGroup: 10001
volumes:
  - name: jaeger-badger-data
    emptyDir: {}`

helm chart - https://raw.githubusercontent.com/hansehe/jaeger-all-in-one/master/helm/charts version - 0.1.12

I have tried using the root user as well and nothing changed.

Error details -

{"level":"fatal","ts":1715098835.9268656,"caller":"all-in-one/main.go:114","msg":"Failed to init storage factory","error":"Error Creating Dir: \"/badger/key\" error: mkdir /badger/key: permission denied","stacktrace":"main.main.func1\n\tgithub.com/jaegertracing/jaeger/cmd/all-in-one/main.go:114\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\tgithub.com/jaegertracing/jaeger/cmd/all-in-one/main.go:249\nruntime.main\n\truntime/proc.go:271"}`

What is the issue here?