woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0

Memory Leak using woodpecker with kubernetes #4228

Open lara-clink opened 1 week ago

lara-clink commented 1 week ago

Component

agent

Describe the bug

I’ve been encountering what appears to be a memory leak issue when running Woodpecker CI on a Kubernetes cluster. After running pipelines over time, I noticed that the memory usage of the Woodpecker agents and server steadily increases, eventually leading to performance degradation and, in some cases, the need for manual intervention to prevent the system from becoming unresponsive.

Steps to reproduce

  1. Deploy Woodpecker CI in a Kubernetes environment.
  2. Run multiple pipelines continuously over an extended period.
  3. Monitor memory usage of the Woodpecker agents and server (my Grafana memory usage graph is attached).
  4. Notice that memory consumption increases over time and is not released after pipeline execution.

[image: Grafana memory usage graph]

Expected behavior

Memory usage should stabilize after pipeline executions are completed, and unused memory should be reclaimed properly.

System Info

Woodpecker Version: 2.7.0
Kubernetes Version: v1.29.4
Environment: Running Woodpecker on a Kubernetes cluster
Number of agents: 10

Additional context

I am using Go profiling (pprof) to investigate. This is what I have found so far:

Screenshot 2024-10-14 at 14 03 23

woodpeckergraph

Has anyone ever faced an issue like this?

zc-devs commented 1 week ago

Has anyone ever faced an issue like this?

Not me, but I don't have such a load (10 agents) :)

  1. When did it start, and what is the behavior on previous versions? Have you tested on 2.7.1 or next?
  2. How do you gather these pprof statistics? Is there a guide? I didn't find anything in the WP docs.
  3. Nice pprof info, but these screenshots are from an agent that allocated 44.36 MB of memory, if I understand correctly. However, Grafana shows memory usage around 1 GB, and that is the issue (I suppose). It would be nice to have pprof stats from one of the affected agents.
  4. What is the load? I mean, what is WOODPECKER_MAX_WORKFLOWS set to, and how many workflows do you run simultaneously? Could you explain the right half of the Grafana chart? Something like:
    • at this point we ran 1 pipeline with 10 workflows
    • at this point they all finished
    • at this point we ran another 10 pipelines with 1 workflow each
    • at this point they finished and there was no load at all for the next hour
  5. What is the config of the Server? How many instances? What about the database? What is the load on the Server and the database?
  6. Where do you store the pipeline (step) logs?
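
On point 3, one possible reason for a gap between a heap profile and a container memory graph (not confirmed for this issue) is that a heap profile covers only live heap objects, while container metrics reflect all memory the process has obtained from the OS. The sketch below reads a few fields from the standard `runtime.MemStats` so the two views can be compared inside a Go process. The type `memSummary` and its helpers are hypothetical names for this example, not Woodpecker code.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSummary holds a few runtime.MemStats fields, all in bytes.
// Hypothetical type for illustration only.
type memSummary struct {
	HeapInuse    uint64 // heap spans currently holding objects
	HeapIdle     uint64 // heap spans with no objects in them
	HeapReleased uint64 // idle heap memory already returned to the OS
	Sys          uint64 // total memory obtained from the OS by the Go runtime
}

// readMemSummary takes a snapshot of the Go runtime's memory statistics.
func readMemSummary() memSummary {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memSummary{
		HeapInuse:    m.HeapInuse,
		HeapIdle:     m.HeapIdle,
		HeapReleased: m.HeapReleased,
		Sys:          m.Sys,
	}
}

// retainedIdle estimates idle heap memory that the runtime still holds
// and could return to the OS (HeapIdle minus HeapReleased).
func (s memSummary) retainedIdle() uint64 {
	if s.HeapReleased > s.HeapIdle {
		return 0
	}
	return s.HeapIdle - s.HeapReleased
}

func main() {
	const mib = 1 << 20
	s := readMemSummary()
	fmt.Printf("heap in use:        %d MiB\n", s.HeapInuse/mib)
	fmt.Printf("idle, not released: %d MiB\n", s.retainedIdle()/mib)
	fmt.Printf("obtained from OS:   %d MiB\n", s.Sys/mib)
}
```

If "obtained from OS" is far above "heap in use", the difference lies outside live heap objects. That would still not explain memory used by other processes or containers in the same pod.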

lara-clink commented 5 days ago

Hey @zc-devs, we are currently working on our migration project (an automated migration from Drone CI to Woodpecker), and I have not been able to collect all the answers for you yet. I should be able to get back to you by the end of this week.