woodpecker-ci / woodpecker

Woodpecker is a simple yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0

Agent failed to retrieve new jobs due to RPC error "keepalive ping failed to receive ACK within timeout" #3712

Open stevapple opened 1 month ago

stevapple commented 1 month ago

Component

agent

Describe the bug

The agent fails to pick up new jobs and reports an RPC error.

2:48AM INF src/shared/logger/logger.go:101 > log level: debug
2:48AM WRN src/pipeline/backend/kubernetes/kubernetes.go:101 > WOODPECKER_BACKEND_K8S_PULL_SECRET_NAMES is set to the default ('regcred'). It will default to empty in Woodpecker 3.0. Set it explicitly before then.
2:48AM DBG src/cmd/agent/core/agent.go:173 > loaded kubernetes backend engine
2:48AM DBG src/cmd/agent/core/agent.go:201 > agent registered with ID 30003
2:48AM INF src/cmd/agent/core/agent.go:243 > starting Woodpecker agent with version 'next-5527d9bf86' and backend 'kubernetes' using platform 'linux/amd64' running up to 1 pipelines in parallel
2:48AM DBG src/cmd/agent/core/agent.go:226 > created new runner 0
2:48AM DBG src/cmd/agent/core/agent.go:234 > polling new steps
2:48AM DBG src/agent/runner.go:54 > request next execution
2:49AM ERR src/agent/rpc/client_grpc.go:93 > grpc error: done(): code: Unavailable error="rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout"

System Info

{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.4.1"}

Additional context

The server and the agent are running in two Kubernetes clusters in different locations, connected by WireGuard + iptables.

The server still assigns pipelines to the agent, and may incorrectly assign more pipelines than the agent's capacity allows.

zc-devs commented 1 month ago

connected by WireGuard + iptables

Dig in this direction.