woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0
4.07k stars 353 forks source link

Kubernetes backend : too many agents registered #2027

Closed ymettier closed 10 months ago

ymettier commented 1 year ago

Component

server, agent

Describe the bug

I noticed that I have a lot of agents declared in http://my-ci/admin/settings#agents.

After few tests, I noticed that every time I restart a pod with an agent, it registers itself, adding a new agent declared in http://my-ci/admin/settings#agents.

Note : in the Kubernetes world, pods can be deleted at any time, for load balancing, draining a node, upgrading, or any other good or bad reason. With the Helm chart provided at woodpecker-ci/helm the pods are under control of a Deployment, which means that they are stateless. Any new pod has no idea of the registration of the older one.

Bug : some cleanup should be done automatically when a pod is removed.

Suggestion : when an pod terminates, the agent should unregister before leaving.

System Info

{"source":"https://github.com/woodpecker-ci/woodpecker","version":"next-2e851ba4"}

Backend: Kubernetes

Additional context

No response

Validations

ymettier commented 1 year ago

Of course : a manual fix is to remove all the agents. Then restart the pods : they will registers automatically.

zc-devs commented 1 year ago

some cleanup should be done automatically when a pod is removed.

+1. Or by scheduled task at least.

when an pod terminates, the agent should unregister before leaving.

+1.

Seems the same problem should appear for autoscaler and I wonder if it has been resolved somehow.

As a workaround we can:

  1. Make it statefulset (1, 2)
  2. Use agent token if we have one agent, because configuring that way more agents is inconvenient (needs deployment per agent in Kubernetes).

P.S. It's feature request, I would say.

ymettier commented 1 year ago
  1. If a pipeline can run on any agent, I don't like the idea to make the agent statefulset. Statefulset means that it has a state (of course) and if that state needs to be persisted across reboots of the pod, it also means that agents may differ in the time... If you really need to keep a state, I think that state should be stored in the server and provided to the agent when it connects.
  2. Yes ! I'm working with Grafana Loki : it also works with tokens. This works well. When one "agent" ("ingester" in Loki terminology) leaves, it releases its token. When it fails to release its token, you can connect and release it manually (human intervention to fix the problem). With tokens, you can better configure how many agents are running, limit them, refuse "unknown" (aka pirats) agents...

Well, I'm writing like if I was a member of the Architecture Board of WoodpeckerCI, which I'm not. I'm fairly new to Woodpecker-ci. Please consider my suggestions as what they are : just suggestions.

P.S. It's feature request, I would say.

I consider it as a security issue when any agent, including malicious agents can connect as soon as they have the door key :)

pat-s commented 1 year ago

I second this behavior and have also asked/reported this somewhere some months ago but can't find it anymore.

Suggestion : when an pod terminates, the agent should unregister before leaving.

I think this would be the best approach. I don't think that switching to a statefulset would make anything better.