Server doesn't work (nicely) with multiple replicas

pat-s commented 1 year ago

When running the server with multiple replicas, runs triggered in the UI stay "pending forever" and apparently are never sent to the agents. It seems only one server pod is able to dispatch them; if a user's session lands on the other pod, the behavior described above occurs.

I don't mean to imply that it should work; I'm just noting it here for visibility.

Killing the "wrong" pod and getting a new session on the "correct" one lets subsequent restarts be triggered successfully.

anbraten commented 1 year ago

@pat-s Thanks for testing. I guess the main issue is that each agent connects to only one of the servers (gRPC should probably sit behind a load balancer as well), and the servers do not share those agents with each other. The queue of pipelines to be executed should, however, be saved to the database, so any server that has an agent connected should be able to hand a pipeline to that agent. I'm not sure why this wasn't already working for you (maybe you can verify it again by reloading the UI and checking the logs). As for the UI: you are currently connected to a single server, which does not receive real-time events from the others, so you have to reload the page each time to get the latest data from the other servers via the database.

We had an issue in the main repo where we "planned" HA a bit: https://github.com/woodpecker-ci/woodpecker/issues/742
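
To make the situation described above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Woodpecker's actual code: the `Server` class, its `trigger` and `dispatch` methods, and the names `server-a`, `server-b`, and `agent-1` are all hypothetical. It assumes a queue shared through the database and agents that are each connected to exactly one replica.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for one server replica (not Woodpecker's real type)."""
    name: str
    queue: deque  # shared object, standing in for the database-backed queue
    agents: list = field(default_factory=list)  # agents connected to this replica only

    def trigger(self, pipeline: str) -> None:
        # A UI request lands on this replica; the pipeline goes into the shared queue.
        self.queue.append(pipeline)

    def dispatch(self) -> list:
        # A replica without connected agents cannot hand out any work.
        if not self.agents:
            return []
        assigned = []
        while self.queue:
            pipeline = self.queue.popleft()
            # Spread pipelines over this replica's agents in round-robin order.
            agent = self.agents[len(assigned) % len(self.agents)]
            assigned.append((pipeline, agent))
        return assigned


shared_queue = deque()
server_a = Server("server-a", shared_queue, agents=["agent-1"])
server_b = Server("server-b", shared_queue)  # no agent connected here

server_b.trigger("build-123")   # the UI session happened to land on server-b
print(server_b.dispatch())      # server-b has no agents, so nothing is assigned
print(list(shared_queue))       # the pipeline is still waiting in the shared queue
print(server_a.dispatch())      # server-a can pick it up because the queue is shared
```

In this model the pipeline only stays pending until a replica with an agent reads the shared queue, which is the behavior expected in the comment above. If runs really stay pending forever, something outside this simplified picture must be involved.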

genofire commented 1 year ago

Getting the same user onto the same pod is easy: set the correct annotation on the Ingress for your ingress controller.

But if the application itself (Woodpecker) does not support this (caching, storage, ...), we could still wait until the Gateway API supports client session stickiness and then replace the Ingress with it, so that Helm chart users no longer have to look up the correct annotations.
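
As a conceptual illustration of the session stickiness mentioned above, the following Python sketch maps a session identifier deterministically to one replica. It is an assumption-laden toy, not how any particular ingress controller or the Gateway API implements affinity: the function `pick_replica`, the session ID, and the pod names are all hypothetical.

```python
import hashlib


def pick_replica(session_id: str, replicas: list[str]) -> str:
    """Map a session ID to one replica so the same session always hits the same pod."""
    if not replicas:
        raise ValueError("at least one replica is required")
    # Hash the session ID and reduce it to an index into the replica list.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


pods = ["woodpecker-server-0", "woodpecker-server-1"]  # hypothetical pod names
first = pick_replica("session-abc", pods)
second = pick_replica("session-abc", pods)
print(first == second)  # the same session ID maps to the same pod on every call
```

Note that a plain modulo mapping like this reshuffles sessions whenever the replica count changes. Stickiness also only keeps a user on one pod; it does not make the replicas share agents or real-time events.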

woodpecker-ci / helm

Server doesn't work (nicely) with multiple replicas #42