Open 6543 opened 1 year ago
related bug/feature: an admin should be able to reset the queue.
there is no check whether a pending pipeline still has tasks in the queue:
`func (q *fifo) depsInQueue(task *Task) bool`
might need a check whether the deps are still valid
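To illustrate the missing check, here is a minimal sketch. The `Task` and `fifo` types below are simplified stand-ins (the real Woodpecker types differ), and `depsValid` is a hypothetical helper showing the kind of validity check suggested above: a dependency that is neither queued nor recorded as finished (e.g. because an admin reset dropped it) leaves the dependent task waiting forever.

```go
package main

import "fmt"

// Simplified stand-ins for the real Woodpecker queue types.
type Task struct {
	ID     string
	DepIDs []string // IDs of tasks this task depends on
}

type fifo struct {
	pending  []*Task
	finished map[string]bool // IDs of tasks that already completed
}

func (q *fifo) inQueue(id string) bool {
	for _, t := range q.pending {
		if t.ID == id {
			return true
		}
	}
	return false
}

// depsInQueue mirrors the function named above: true while any
// dependency of task is still waiting in the queue.
func (q *fifo) depsInQueue(task *Task) bool {
	for _, dep := range task.DepIDs {
		if q.inQueue(dep) {
			return true
		}
	}
	return false
}

// depsValid is the hypothetical extra check: every dependency must be
// either still queued or already finished, otherwise the task can
// never become runnable and should be errored/evicted.
func (q *fifo) depsValid(task *Task) bool {
	for _, dep := range task.DepIDs {
		if !q.inQueue(dep) && !q.finished[dep] {
			return false
		}
	}
	return true
}

func main() {
	build := &Task{ID: "build"}
	test := &Task{ID: "test", DepIDs: []string{"build"}}

	q := &fifo{pending: []*Task{build, test}, finished: map[string]bool{}}
	fmt.Println(q.depsInQueue(test), q.depsValid(test)) // true true

	// Simulate a queue reset that drops "build" without recording it
	// as finished: "test" now has a stale dependency.
	q.pending = []*Task{test}
	fmt.Println(q.depsInQueue(test), q.depsValid(test)) // false false
}
```

With such a check, the scheduler could fail or skip tasks with stale dependencies instead of leaving them pending indefinitely.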
Hi everyone,
we use a complex build config with 30 pipelines per build and a lot inter-dependencies between them. We use two agents with a capacity of 4 and 10. The server is running on the same server as the agent with the capacity of 4. When multiple builds are pending - so there are 50 to 90 pending pipelines in the queue - we sometimes have the issue, that the woodpecker server gets unresponsive and the logs look like shown here. This can on go on for a while and sometimes resolves on its own, but in other cases I had to recover from this state manually (e.g. after 30min). To do that, I stop/kill the agents and then restart the server. Before I start the agents again, I abort all pending/in progress builds and only restart a single one.
How can we help fix the issue? Is there something specific you have in mind that we should look at? Would it help if we provided more logs? We're currently on version 2.4.1, but I think we have had this issue for longer, although to a smaller extent. While tailing the Docker logs I noticed that the server emits more log lines per second than Docker is able to return, noticeable because the timestamps in the logs advance more slowly than wall-clock time. I haven't seen this issue with our simpler builds that have only one or two pipelines.
Thanks
Woodpecker server log: