Closed hjacobs closed 6 years ago
The msg=EOF is weird. At first sight I'm not sure what its source is; it has been spotted in other deployments, too. Can I get a +1 on this PR if it is OK? Deploying it could help: https://github.com/zalando/skipper/pull/266/files
code to reproduce:
https://gist.github.com/aryszka/6da6e379750994ae348646a88ecd84db
Considering the behavior of the code linked above, the issue may be that the connection close on the server doesn't take effect immediately, even though the connection is already reported as closed. In this case the 'write: broken pipe' error makes sense, and the only weird thing is when it reports EOF.
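To illustrate the EOF case in isolation, here is a minimal sketch. It is not the code from the gist: it assumes a backend that reads the whole request and then closes the connection without writing a single byte, and it uses an in-memory `net.Pipe` instead of a real TCP connection, so it does not reproduce the 'write: broken pipe' variant. The names `silentBackend`, `probeSilentBackend` and the host `backend.invalid` are made up for the example.

```go
package main

import (
	"bufio"
	"context"
	"errors"
	"io"
	"net"
	"net/http"
	"time"
)

// silentBackend (hypothetical) reads one full HTTP request from conn and then
// closes the connection without writing any response bytes.
func silentBackend(conn net.Conn) {
	defer conn.Close()
	// The parsed request and any parse error are ignored: this is only a demo.
	_, _ = http.ReadRequest(bufio.NewReader(conn))
}

// probeSilentBackend sends one GET to silentBackend over an in-memory pipe.
// It reports whether the client-side error wraps io.EOF, and the error itself.
func probeSilentBackend() (bool, error) {
	transport := &http.Transport{
		// Every "dial" creates a fresh pipe whose far end is served by
		// silentBackend; the requested address is ignored.
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			clientSide, serverSide := net.Pipe()
			go silentBackend(serverSide)
			return clientSide, nil
		},
	}
	client := &http.Client{Transport: transport, Timeout: 2 * time.Second}

	resp, err := client.Get("http://backend.invalid/")
	if err == nil {
		resp.Body.Close()
		return false, nil
	}
	return errors.Is(err, io.EOF), err
}
```

Calling `probeSilentBackend()` from a `main` function and printing the result should show an error ending in `EOF`: the client wrote its request successfully and then saw the connection close before any response arrived.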
So reporting '503 Service Unavailable' seems legit. Or maybe '502 Bad Gateway'?
I would suggest 502 or 504 https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error
503 would mean that skipper itself is unavailable.
For the reported EOF, the related issue in Go net/http is this: https://github.com/golang/go/issues/13667 . It was reintroduced by this: https://github.com/golang/go/issues/16465
This may also be interesting: https://golang.org/src/net/http/response.go#L150
This is indeed interesting, because it shows that it doesn't expect to get a close() without any write() before.
The two links below may be interesting, too. The changeset is "only" one year old.
https://github.com/golang/go/issues/4677 https://github.com/golang/go/commit/5dd372bd1e70949a432d9b7b8b021d13abf584d1
What if Skipper somehow prevents retries for idempotent requests? Nginx seems to retry by default: https://github.com/kubernetes/ingress/blob/master/controllers/nginx/configuration.md#custom-nginx-upstream-checks
Can we close this issue?
We changed a lot in the error handling, so I suspect that this is not valid anymore. We also set more explicit status codes on different failures.
We observe sporadic issues with Skipper connecting to a Kubernetes ClusterIP service. Relevant Skipper log lines:
500:
Expected behavior: Skipper should never produce "Internal Server Error", but instead handle the problem and report an appropriate status code (maybe 503?).