openzipkin / zipkin-dependencies

Spark job that aggregates zipkin spans for use in the UI
Apache License 2.0

crond jobs don't seem to be triggering in k8s deployment #233

Open. eli-gc opened this issue 1 month ago

eli-gc commented 1 month ago

Describe the Bug

I have deployed zipkin-dependencies v3.2.0 on my k8s cluster, but I can't get the crond job to work. I can see the crond process running when I run top, and if I manually run the script in /etc/periodic/hourly, it works fine: I can see data in Elasticsearch and the Zipkin UI, so I know it's not a connection issue. There are no container logs, so it seems to me that the job is never run. I looked at #192, to no avail; the issues there seem to be fixed already, so I think mine is different. I run the container as the zipkin-dependencies user, not root, and I am using Elasticsearch as the storage backend.

Steps to Reproduce

Deploy the image to k8s using deployment.yaml. Use the Elasticsearch backend. Run as user 1000.

Expected Behaviour

The crond job should run, and I should see dependency data in the Zipkin UI.

eli-gc commented 1 month ago

I should mention that I am using a read-only root filesystem, but I have a writable volume mounted at /tmp.

eli-gc commented 1 month ago

I noticed that there is no crontab entry for the zipkin-dependencies user in /var/spool/cron/crontabs, only a root file. I am running the container as non-root; could this be why?
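A quick way to confirm this observation from inside the running container is a small Python check like the one below. It is a sketch only: it assumes a BusyBox-style crond spool in which each user's crontab is a plain file named after that user, and the helper names (current_user, crontab_status) are hypothetical.

```python
import getpass
import os
from pathlib import Path
from typing import Optional

# Assumption: BusyBox-style crond spool, one crontab file per user name.
SPOOL_DIR = Path("/var/spool/cron/crontabs")


def current_user() -> str:
    """Best-effort user name; falls back to the numeric uid if there is no passwd entry."""
    try:
        return getpass.getuser()
    except Exception:
        return str(os.getuid())


def crontab_status(spool_dir: Path = SPOOL_DIR, user: Optional[str] = None) -> dict:
    """List the per-user crontab files and report whether `user` has one."""
    user = user or current_user()
    try:
        present = sorted(p.name for p in spool_dir.iterdir())
    except OSError:
        present = []  # spool directory missing or unreadable for this user
    return {"user": user, "crontabs": present, "has_own_crontab": user in present}


if __name__ == "__main__":
    # In the setup described above, this would be expected to list only "root".
    print(crontab_status())
```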

codefromthecrypt commented 1 month ago

Probably the best way to move this forward is to test it: try to reproduce with Docker, as this repo doesn't have a Helm chart.

e.g. https://github.com/openzipkin/zipkin-dependencies/tree/master/docker#cron

codefromthecrypt commented 1 month ago

FYI, https://github.com/openzipkin/zipkin-helm/pull/11 is the currently dormant effort to make this a Helm chart.

eli-gc commented 1 month ago

I tried adding a crontab for the zipkin-dependencies user in a custom image. While this got crond to start triggering the job, I ran into permission issues with the zipkin-dependencies user. It seems crond expects to run jobs as root. I tested as the root user and it worked. Unfortunately, our security policy prevents us from running containers as root, so I ended up implementing it as a k8s CronJob.

codefromthecrypt commented 1 month ago

So, I missed the major detail in #192: the solution there was to switch to running as root. I think this issue should basically be renamed to "don't require root for scheduled runs of the dependency job".

Maybe someone can then try dcron or similar and raise a PR to change it. https://github.com/gliderlabs/docker-alpine/issues/381

codefromthecrypt commented 1 month ago

Sorry, maybe supercronic instead, as that's currently maintained. Basically, something that does scheduling as non-root. Once that is in, someone can also look at k8s-native scheduling in https://github.com/openzipkin/zipkin-helm/pull/11, but I think non-root should come first, to reduce guesswork.
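To make "scheduling as non-root" concrete, here is a minimal Python sketch of the idea. It is not supercronic or dcron, and the function names are hypothetical. It assumes the job is the script the image places in /etc/periodic/hourly (as described at the top of this issue). The loop sleeps until the top of each hour, then runs every executable file in that directory as whatever user the container was started with. No setuid is involved, and nothing is written to a crontab spool, so a read-only root filesystem is not a problem.

```python
import datetime as dt
import os
import subprocess
import time
from pathlib import Path

# Assumption: the image's hourly job lives here, as described in the issue.
HOURLY_DIR = Path("/etc/periodic/hourly")


def seconds_until_next_hour(now: dt.datetime) -> float:
    """Seconds from `now` until the next top of the hour."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + dt.timedelta(hours=1)
    return (next_hour - now).total_seconds()


def run_hourly_scripts(directory: Path = HOURLY_DIR) -> list:
    """Run each executable file in `directory` as the current user; return exit codes."""
    codes = []
    if not directory.is_dir():
        return codes
    for script in sorted(directory.iterdir()):
        if script.is_file() and os.access(script, os.X_OK):
            # stdout/stderr are inherited, so job output reaches the container logs.
            codes.append(subprocess.run([str(script)]).returncode)
    return codes


def main() -> None:
    # No crond and no root: this runs with the container's uid (e.g. 1000).
    while True:
        time.sleep(seconds_until_next_hour(dt.datetime.now()))
        run_hourly_scripts()
```

Calling main() from the container's entrypoint would keep the scheduler in the foreground. The sketch ignores wall-clock adjustments and overlapping runs, which is one reason the thread points at a maintained tool instead.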