aaronriedel opened this issue 5 hours ago
Does it work if you deploy an agent in Kubernetes (direct Agent-Server connection, not via Traefik)?
JFYI, this is my IngressRoute, which worked a couple of months ago:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: woodpecker-server
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`wp.domain.tld`)
      services:
        - name: woodpecker-server
          port: http
    - kind: Rule
      match: Host(`wp.domain.tld`) && Headers(`Content-Type`, `application/grpc`)
      services:
        - name: woodpecker-server
          port: grpc
          scheme: h2c
However, if I remember correctly, I didn't restart the server.
The Kubernetes agents work fine and are not affected by the problem. The 5XX errors most likely come from Traefik. However, I would still expect the agent to recover rather than fall over when there are errors for a few seconds.
Matching on the application/grpc content type is a good hint; I might implement this. I currently don't use IngressRoute objects and instead configure regular Ingresses with annotations.
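With regular Ingresses and Traefik's Kubernetes Ingress provider, the backend scheme is not set on the route the way scheme: h2c is set in the IngressRoute; Traefik reads it from an annotation on the Service instead, and that annotation applies to every port of the Service. A minimal sketch, assuming a dedicated Service for the gRPC port so the HTTP UI port is not switched to h2c as well. The name woodpecker-server-grpc and the selector label are hypothetical and must match your deployment; 9000 is Woodpecker's default gRPC port.

```yaml
apiVersion: v1
kind: Service
metadata:
  # Hypothetical name: a separate Service that exposes only the gRPC port
  name: woodpecker-server-grpc
  annotations:
    # Traefik Ingress provider: reach this backend over cleartext HTTP/2,
    # which gRPC needs. Applies to all ports of this Service.
    traefik.ingress.kubernetes.io/service.serversscheme: h2c
spec:
  selector:
    # Hypothetical label: must select the Woodpecker server pods
    app.kubernetes.io/name: woodpecker-server
  ports:
    - name: grpc
      port: 9000
      targetPort: 9000
```

Keeping the HTTP port on its own Service avoids Traefik trying h2c against the web UI, which is served over regular HTTP/1.1.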
> received unexpected content-type "text/plain; charset=utf-8" errors come from Traefik
I think so; I have had this too.
> The agent should properly reconnect
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:24Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:34Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:39Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:27:53Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:00Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:15Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:29Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:40Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:28:54Z","message":"grpc error: report_health(): code: Unavailable"}
{"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable); transport: received unexpected content-type \"text/plain; charset=utf-8\"","time":"2024-11-23T14:29:02Z","message":"grpc error: report_health(): code: Unavailable"}
It seems to be trying.
Do you have two Ingresses, one for HTTP and another for gRPC? Could you show the HTTP one?
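For comparison, a two-Ingress setup with Traefik's Kubernetes Ingress provider might look like the sketch below. A regular Ingress can only match on host and path, not on the Content-Type header, so gRPC gets its own hostname here. The host grpc.wp.domain.tld and the Service woodpecker-server-grpc are hypothetical; that gRPC backend must be reached over h2c (for example via Traefik's traefik.ingress.kubernetes.io/service.serversscheme: h2c Service annotation).

```yaml
# Web UI / API on the main host
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: woodpecker-server
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  rules:
    - host: wp.domain.tld
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: woodpecker-server
                port:
                  name: http
---
# gRPC for agents on a separate (hypothetical) host
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: woodpecker-server-grpc
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  rules:
    - host: grpc.wp.domain.tld
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # Hypothetical Service that Traefik must reach over h2c
                name: woodpecker-server-grpc
                port:
                  name: grpc
```

TLS certificate configuration is omitted. With this layout, agents would point at grpc.wp.domain.tld:443 instead of the UI host.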
Component
agent
Describe the bug
When the server (running in Kubernetes) restarts, my Docker agent stops taking new jobs until the agent itself is restarted. The agent logs show several 5XX errors while the server restarts. Afterwards, the agent shows as online in the UI but does not pick up any jobs.
Agent logs: See below
Steps to reproduce
Expected behavior
The agent should properly reconnect to the Server via gRPC after the server restarts.
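For context, a Docker agent that reaches the server's gRPC endpoint through Traefik, as in this report, is typically configured roughly as follows. This is a minimal sketch, not the reporter's actual config (which is not included here): the host is assumed to be the one from the IngressRoute, where gRPC shares the UI hostname, and TLS is assumed to terminate at Traefik on port 443.

```yaml
services:
  woodpecker-agent:
    # Pin to the same version as the server (2.7.3 in this report)
    image: woodpeckerci/woodpecker-agent:latest
    command: agent
    restart: always
    volumes:
      # Persists the agent config (including its ID) across container restarts
      - woodpecker-agent-config:/etc/woodpecker
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # Assumption: gRPC is routed by Traefik on the UI host via Content-Type
      - WOODPECKER_SERVER=wp.domain.tld:443
      # TLS between agent and Traefik
      - WOODPECKER_GRPC_SECURE=true
      - WOODPECKER_AGENT_SECRET=${WOODPECKER_AGENT_SECRET}

volumes:
  woodpecker-agent-config:
```

Note that restart: always only helps when the agent process exits; it does not recover an agent that keeps running but stops taking jobs, which is the behavior described here.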
System Info
Server:
{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.7.3"}
Helm values:
gRPC Ingress:
docker-compose config for agent:
Additional context
Agent logs:
Validations
Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]