joyrex2001 / kubedock

Kubedock is a minimal implementation of the Docker API that orchestrates containers on a Kubernetes cluster, rather than running containers locally.
MIT License

Error logged for services, configmaps and pods deletion with an 'Unauthorized' message #79

Closed. srozange closed this 8 months ago.

srozange commented 8 months ago

Hello,

I'm using kubedock (v0.15.4) in conjunction with Testcontainers on a GitLab Kubernetes runner.

Everything seems to run fine, but I'm getting these errors:

I0304 17:10:54.823244 1 main.go:207] exit signal recieved, removing pods, configmaps and services
E0304 17:10:54.848290 1 delete.go:40] error deleting services: Unauthorized
E0304 17:10:54.857436 1 delete.go:44] error deleting configmaps: Unauthorized
E0304 17:10:54.869173 1 delete.go:48] error deleting pods: Unauthorized
E0304 17:10:54.869194 1 main.go:209] error pruning resources: failed deleting container a0b8bd8c3c3f

It doesn't seem to be an RBAC issue (pod creation works fine, for example, and delete rights are granted).

The complete logs :

I0304 17:07:32.925004 1 main.go:29] kubedock 0.15.4 (20240226-194522) / kubedock.id=a0b8bd8c3c3f
I0304 17:07:32.925918 1 main.go:108] kubernetes config: namespace=*, initimage=joyrex2001/kubedock:0.15.4, dindimage=joyrex2001/kubedock:0.15.4, ready timeout=1m0s
I0304 17:07:32.926627 1 main.go:160] reaper started with max container age 1h0m0s
I0304 17:07:32.926707 1 main.go:101] enabled reverse-proxy services via 0.0.0.0 on the kubedock host
I0304 17:07:32.926722 1 main.go:115] default cpu request: 10m,400m
I0304 17:07:32.926728 1 main.go:119] default memory request: 600Mi,1400Mi
I0304 17:07:32.926754 1 main.go:128] default image pull policy: ifnotpresent
I0304 17:07:32.926766 1 main.go:131] service account used in deployments: ***

I0304 17:07:32.926793 1 main.go:135] using namespace: ***
[GIN] 2024/03/04 - 17:08:46 | 200 | 49.847µs | 127.0.0.1 | GET "/info"
[GIN] 2024/03/04 - 17:08:48 | 200 | 57.795µs | 127.0.0.1 | GET "/version"
[GIN] 2024/03/04 - 17:08:48 | 200 | 40.342µs | 127.0.0.1 | GET "/images/json"
[GIN] 2024/03/04 - 17:08:48 | 200 | 96.855µs | 127.0.0.1 | GET "/images/postgres:16-alpine/json"
[GIN] 2024/03/04 - 17:08:50 | 201 | 511.67µs | 127.0.0.1 | POST "/containers/create"
W0304 17:08:50.655274 1 container.go:225] user not set, will run as user defined in image
I0304 17:08:53.718339 1 deploy.go:211] reverse proxy for 36685 to 5432
I0304 17:08:53.718360 1 tcpproxy.go:37] start reverse-proxy 0.0.0.0:36685->10.129.3.95:5432
[GIN] 2024/03/04 - 17:08:56 | 204 | 6.275827295s | 127.0.0.1 | POST "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0/start"
[GIN] 2024/03/04 - 17:08:56 | 200 | 3.621268ms | 127.0.0.1 | GET "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0/json"
[GIN] 2024/03/04 - 17:10:14 | 200 | 13.004023ms | 127.0.0.1 | GET "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0/json"
I0304 17:10:14.481539 1 tcpproxy.go:52] stopped reverse-proxy 0.0.0.0:36685->**:5432
[GIN] 2024/03/04 - 17:10:14 | 200 | 1m16s | 127.0.0.1 | GET "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0/logs?stdout=true&stderr=true&follow=true&since=0"
[GIN] 2024/03/04 - 17:10:14 | 204 | 26.683886ms | 127.0.0.1 | POST "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0/kill"
[GIN] 2024/03/04 - 17:10:14 | 200 | 4.147342ms | 127.0.0.1 | GET "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0/json"
[GIN] 2024/03/04 - 17:10:14 | 204 | 47.803µs | 127.0.0.1 | DELETE "/containers/ae55e084ae46d7413841afdb017221098a252eb17ce5957018f85b36988c3ba0?v=true&force=true"
[GIN] 2024/03/04 - 17:10:34 | 200 | 56.559µs | 127.0.0.1 | GET "/containers/json?all=true&filters=%7B%22label%22%3A%5B%22org.testcontainers%3Dtrue%22%2C%22org.testcontainers.sessionId%3D470211f2-1c7c-4f3d-8677-1c62b8eec558%22%5D%7D"
[GIN] 2024/03/04 - 17:10:34 | 201 | 40.028µs | 127.0.0.1 | POST "/networks/prune?filters=%7B%22label%22%3A%5B%22org.testcontainers%3Dtrue%22%2C%22org.testcontainers.sessionId%3D470211f2-1c7c-4f3d-8677-1c62b8eec558%22%5D%7D"
[GIN] 2024/03/04 - 17:10:34 | 201 | 41.169µs | 127.0.0.1 | POST "/volumes/prune?filters=%7B%22label%22%3A%5B%22org.testcontainers%3Dtrue%22%2C%22org.testcontainers.sessionId%3D470211f2-1c7c-4f3d-8677-1c62b8eec558%22%5D%7D"
[GIN] 2024/03/04 - 17:10:34 | 201 | 48.76µs | 127.0.0.1 | POST "/images/prune?filters=%7B%22label%22%3A%5B%22org.testcontainers%3Dtrue%22%2C%22org.testcontainers.sessionId%3D470211f2-1c7c-4f3d-8677-1c62b8eec558%22%5D%7D"
I0304 17:10:54.823244 1 main.go:207] exit signal recieved, removing pods, configmaps and services
E0304 17:10:54.848290 1 delete.go:40] error deleting services: Unauthorized
E0304 17:10:54.857436 1 delete.go:44] error deleting configmaps: Unauthorized
E0304 17:10:54.869173 1 delete.go:48] error deleting pods: Unauthorized
E0304 17:10:54.869194 1 main.go:209] error pruning resources: failed deleting container a0b8bd8c3c3f

joyrex2001 commented 8 months ago

Hm, it still must be something RBAC-related. Are you sure it's using the token associated with the RBAC you configured, and that it has delete permissions for pods, configmaps and services?

srozange commented 8 months ago

Here is an extract from my RBAC configuration:

rbac:
  create: true
  rules:

It's strange: I tried without GitLab, i.e. a pod containing kubedock plus a Docker client that starts a container, and I don't see this error when deleting the pod (I'm using the same service account as in GitLab).

joyrex2001 commented 8 months ago

Not sure if that's the actual problem, but the rule for configmaps (and the others) does not include the apiGroups config.
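The question here is whether a rule grants `delete` on `pods`, `configmaps` and `services`, and for which API group; those three resources live in the core group, which RBAC manifests write as the empty string `""` in `apiGroups`. The sketch below is a simplified, hypothetical model of rule matching, not the real Kubernetes authorizer, and the rule contents are illustrative rather than srozange's actual configuration.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical mirror of a Kubernetes RBAC rule.
// The core API group (pods, services, configmaps) is the empty string "".
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list holds want or the wildcard "*".
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want || v == "*" {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in apiGroup.
// In this simplified model a rule with no apiGroups matches no group.
func allows(rules []PolicyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) &&
			contains(r.Resources, resource) &&
			contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical rule set, for illustration only.
	rules := []PolicyRule{{
		APIGroups: []string{""}, // core group
		Resources: []string{"pods", "services", "configmaps"},
		Verbs:     []string{"get", "list", "create", "delete"},
	}}
	for _, res := range []string{"services", "configmaps", "pods"} {
		fmt.Printf("delete %s: %v\n", res, allows(rules, "", res, "delete"))
	}
}
```

As srozange points out below, though, an RBAC denial would normally surface as "forbidden" rather than "Unauthorized".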

srozange commented 8 months ago

I think apiGroups can be omitted (I'm pretty sure I already tried with an empty value). I started a GitLab pipeline with a kubedock service and a service containing a kubectl client, and I was able to get and delete services. I'm not familiar with Go, but are you sure the deleteServices function uses the pod's service account? (Anyway, I think that if it were a service account issue, the message would look more like: "Error: services is forbidden....")
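The distinction srozange draws maps onto HTTP status codes: "Unauthorized" is a 401, meaning the API server did not accept the credentials at all (authentication), while an RBAC denial is a 403 "Forbidden" (the credentials were accepted but the action is not allowed). In Go code using client-go, `k8s.io/apimachinery/pkg/api/errors` offers `IsUnauthorized` and `IsForbidden` for this; the stdlib-only sketch below just classifies a raw status code, and `diagnose` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
)

// diagnose maps the HTTP status of a failed Kubernetes API call to a rough
// cause: 401 is an authentication failure, 403 an authorization (RBAC) one.
func diagnose(status int) string {
	switch status {
	case http.StatusUnauthorized:
		return "authentication failed: token missing, invalid or no longer valid"
	case http.StatusForbidden:
		return "authorization failed: check Role/RoleBinding rules"
	default:
		return fmt.Sprintf("other error (HTTP %d)", status)
	}
}

func main() {
	fmt.Println(diagnose(http.StatusUnauthorized))
	fmt.Println(diagnose(http.StatusForbidden))
}
```

Read this way, the kubedock logs point at the token rather than at the Role rules.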

joyrex2001 commented 8 months ago

There is no extra magic: it just uses the standard Go Kubernetes client, which picks up the available token (for a pod, the token of the service account attached to it).

srozange commented 8 months ago

I noticed that the position of the cancel method call changed in this commit: https://github.com/joyrex2001/kubedock/commit/b62f8e927123b10a1db76a1213b7a3733632fd35

Could the cancel method call somehow cause the token to be lost?

-> I'm going to see if I can reproduce this with 0.14.0. (Edit: same error with 0.14.0.)

srozange commented 8 months ago

I reproduced the issue manually by killing the pod with a grace period of zero.

There's a parameter in GitLab Runner to modify the grace period: https://docs.gitlab.com/runner/executors/kubernetes/index.html#pod-lifecycle

I don't know what the default value is, but adding this entry to my runner TOML configuration fixed my issue: pod_termination_grace_period_seconds = 10

joyrex2001 commented 8 months ago

Looks like kubedock didn't get enough time to exit properly. Kubedock removes its resources when it receives a SIGINT, SIGTERM or SIGQUIT (which is why quitting can take a bit longer).