zalando / skipper

An HTTP router and reverse proxy for service composition, including use cases like Kubernetes Ingress
https://opensource.zalando.com/skipper/
Other
3.12k stars 351 forks source link

Skipper produces 500 "Internal Server Error" HTTP status code on EOF #269

Closed hjacobs closed 6 years ago

hjacobs commented 7 years ago

We observe sporadic issues with Skipper connecting to Kubernetes ClusterIP service. Relevant Skipper log lines:

500:

02:26:40.814 log='[APP]time="2017-02-05T01:26:39Z" level=error msg=EOF'
02:26:40.814 log='52.59.216.130 - - [05/Feb/2017:01:26:39 +0000] "GET / HTTP/1.1" 500 22 "" "zmon-worker/v130" 2 hjacobs-test.example.org'

500:

20:10:14.338 log='[APP]time="2017-02-05T19:10:10Z" level=error msg=EOF'
20:10:14.338 log='52.59.216.130 - - [05/Feb/2017:19:10:10 +0000] "GET / HTTP/1.1" 500 22 "" "zmon-worker/v130" 14 hjacobs-test.example.org'

Expected behavior: Skipper should never produce "Internal Server Error", but instead handle the problem and report an appropriate status code (maybe 503?).

aryszka commented 7 years ago

the msg=EOF is weird. At first sight not sure what's the source of it, spotted in other deployments, too. Deploying this could help: https://github.com/zalando/skipper/pull/266/files , can I get +1 there if the PR is ok?

aryszka commented 7 years ago

https://github.com/zalando/skipper/releases/tag/v0.9.5

aryszka commented 7 years ago

code to reproduce:

https://gist.github.com/aryszka/6da6e379750994ae348646a88ecd84db

aryszka commented 7 years ago

considering the behavior of the code linked above, the issue can be that the connection close on the server doesn't happen immediately, even if it is reported to be closed. In this case the 'write: broken pipe' error makes sense, and the only weird thing is when it reports EOF.

So reporting '503 Service Unavailable' seems legit. Or maybe '502 Bad Gateway'?

szuecs commented 7 years ago

I would suggest 502 or 504 https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_Error

503 would mean that skipper itself is unavailable.

aryszka commented 7 years ago

For the reported EOF, the related issue in Go net/http is this: https://github.com/golang/go/issues/13667 . It was reintroduced by this: https://github.com/golang/go/issues/16465

Interesting can be also: https://golang.org/src/net/http/response.go#L150

szuecs commented 7 years ago

Interesting can be also: https://golang.org/src/net/http/response.go#L150

This is indeed interesting, because it shows that it doesn't expect to get a close() without any write() before.

aryszka commented 7 years ago

the below two links can be interesting, too. The changeset is "only" one year old.

https://github.com/golang/go/issues/4677 https://github.com/golang/go/commit/5dd372bd1e70949a432d9b7b8b021d13abf584d1

what if skipper somehow prevents retries for idempotent requests. Nginx seems to be doing retries by default: https://github.com/kubernetes/ingress/blob/master/controllers/nginx/configuration.md#custom-nginx-upstream-checks

Raffo commented 7 years ago

Can we close this issue?

szuecs commented 6 years ago

We changed a lot in the error handling, I suspect that this is not valid any more. We also set more explicit status code on different failures.