woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0
4.07k stars 353 forks source link

Some strange agent errors while the pipeline succeeds #717

Closed wxiaoguang closed 2 years ago

wxiaoguang commented 2 years ago

Agent: woodpeckerci/woodpecker-agent:next (just pulled)

Error logs:

{"level":"error","error":"io: read/write on closed pipe","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:165","message":"copy limited logStream part"}

{"level":"error","error":"io: read/write on closed pipe","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:165","message":"copy limited logStream part"}

{"level":"error","error":"Error response from daemon: Cannot kill container: 0_7591846831240336695_clone: Container 46d622bd6f9064f48b687c3358c1170acecbd1ff4cb60a6d1be652f24b494ca0 is not running","time":"2022-01-25T08:56:47Z","caller":"/src/pipeline/backend/docker/docker.go:193","message":"could not kill container '0_7591846831240336695_clone'"}
{"level":"error","error":"Error response from daemon: Cannot kill container: 0_7591846831240336695_step_0: Container 6099552c8c4a82d39381e7959bdf3374beaff7b057a3223f05f2ea4162de5d33 is not running","time":"2022-01-25T08:56:47Z","caller":"/src/pipeline/backend/docker/docker.go:193","message":"could not kill container '0_7591846831240336695_stage_0'"}

{"level":"error","error":"rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \"too_many_pings\"","time":"2022-01-25T08:57:17Z","caller":"/src/pipeline/rpc/client_grpc.go:49","message":"grpc error: done(): code: Unavailable: rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \"too_many_pings\""}

Full logs (context):

{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:45Z","caller":"/src/agent/runner.go:96","message":"received execution"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:45Z","caller":"/src/agent/runner.go:103","message":"listen for cancel signal"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","exit_code":0,"exited":false,"time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:256","message":"update step status"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","exit_code":0,"exited":false,"time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:264","message":"update step status complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:160","message":"log stream opened"}

{"level":"error","error":"io: read/write on closed pipe","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:165","message":"copy limited logStream part"}

{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:168","message":"log stream copied"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:184","message":"log stream uploading"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:188","message":"log stream upload complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:192","message":"log stream closed"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","exit_code":0,"exited":true,"time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:256","message":"update step status"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/woodpeckerci/plugin-git:latest","stage":"clone","exit_code":0,"exited":true,"time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:264","message":"update step status complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","exit_code":0,"exited":false,"time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:256","message":"update step status"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","exit_code":0,"exited":false,"time":"2022-01-25T08:56:46Z","caller":"/src/agent/runner.go:264","message":"update step status complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:160","message":"log stream opened"}

{"level":"error","error":"io: read/write on closed pipe","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:165","message":"copy limited logStream part"}

{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:168","message":"log stream copied"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:184","message":"log stream uploading"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:188","message":"log stream upload complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:192","message":"log stream closed"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","exit_code":0,"exited":true,"time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:256","message":"update step status"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","image":"docker.io/library/golang:latest","stage":"backend","exit_code":0,"exited":true,"time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:264","message":"update step status complete"}

{"level":"error","error":"Error response from daemon: Cannot kill container: 0_7591846831240336695_clone: Container 46d622bd6f9064f48b687c3358c1170acecbd1ff4cb60a6d1be652f24b494ca0 is not running","time":"2022-01-25T08:56:47Z","caller":"/src/pipeline/backend/docker/docker.go:193","message":"could not kill container '0_7591846831240336695_clone'"}
{"level":"error","error":"Error response from daemon: Cannot kill container: 0_7591846831240336695_step_0: Container 6099552c8c4a82d39381e7959bdf3374beaff7b057a3223f05f2ea4162de5d33 is not running","time":"2022-01-25T08:56:47Z","caller":"/src/pipeline/backend/docker/docker.go:193","message":"could not kill container '0_7591846831240336695_stage_0'"}

{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","error":"","exit_code":0,"time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:315","message":"pipeline complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:317","message":"uploading logs"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:321","message":"uploading logs complete"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","error":"","exit_code":0,"time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:326","message":"updating pipeline status"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:111","message":"stop listening for cancel signal"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:332","message":"updating pipeline status complete"}
{"level":"debug","time":"2022-01-25T08:56:47Z","caller":"/src/cmd/agent/agent.go:158","message":"loaded docker backend engine"}
{"level":"debug","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:63","message":"request next execution"}
{"level":"debug","repo":"test-org-dev/ci-test","build":"3","id":"7","time":"2022-01-25T08:56:47Z","caller":"/src/agent/runner.go:119","message":"pipeline done"}

{"level":"error","error":"rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \"too_many_pings\"","time":"2022-01-25T08:57:17Z","caller":"/src/pipeline/rpc/client_grpc.go:49","message":"grpc error: done(): code: Unavailable: rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: ENHANCE_YOUR_CALM, debug data: \"too_many_pings\""}
6543 commented 2 years ago

should be solved on our side .... can be ignored atm

anbraten commented 2 years ago

https://github.com/grpc/grpc/blob/master/doc/keepalive.md

This comes from the agent being inactive / not sending and requesting any data if there a no pipelines waiting for him to execute. GRPC seems to only allow 2 pings without calling an actual endpoint before closing the connection. I think this will be indirectly solved in the moment we implement something like #536

danog commented 2 years ago

Encountering the same issue right now on an arm64 machine with both the latest and next docker image tags, logs have the same queue: trying to assign task and grpc: to many keepalive pings without sending data messages. I can offer an ARM machine to ssh into if needed for debugging, or otherwise one could use the oracle cloud free tier ARM machines.

danog commented 2 years ago

Oh nevermind, the 4th requeue of the task seems to have fixed the issue at least for now, still can help if further debugging is needed.