Issue with connection to /tmp/prometheus-nginx.socket

petrokashlikov commented 6 years ago

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):

prometheus-nginx.socket,lua,lua entry thread aborted

I was able to find a similar issue, https://github.com/kubernetes/ingress-nginx/issues/2688, but our case is a bit different: nginx does not fail to start, and most of the time it works properly, but then it starts throwing errors.

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version: 0.17.0

Kubernetes version (use kubectl version): v1.10.6-gke.2

Environment: GKE

Cloud provider or hardware configuration: GKE
OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
Kernel (e.g. uname -a): Linux nginx-ingress-controller-65447875dd-shpgj 4.15.0-1017-gcp #18~16.04.1-Ubuntu SMP Fri Aug 10 13:26:07 UTC 2018 x86_64 GNU/Linux
Install tools:
Others:

What happened: After we updated the controller to 0.17.0, we began to get errors like this in the nginx logs:

2018/09/12 02:43:18 [error] 1600#1600: *686359 connect() to unix:/tmp/prometheus-nginx.socket failed (11: Resource temporarily unavailable), context: ngx.timer, client: 10.12.1.10, server: 0.0.0.0:443

What you expected to happen: I expect it not to throw these errors.

How to reproduce it (as minimally and precisely as possible): As far as I can tell, this occurs when the controller is under load, but I have not yet found an exact way to trigger it.
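For context on the error itself: on Linux, errno 11 (EAGAIN, "Resource temporarily unavailable") from a non-blocking connect() to a Unix stream socket usually means the listener's queue of pending connections is full, which fits the errors appearing only under load. The Python sketch below reproduces that errno locally with a throwaway socket. The socket path and the function name are hypothetical, not anything the controller uses; a listener that never calls accept() stands in for a metrics receiver that cannot keep up.

```python
import errno
import os
import socket
import tempfile
from typing import Optional


def connect_until_refused(max_clients: int = 64) -> Optional[int]:
    """Open non-blocking connections to a Unix socket that never accepts.

    Returns the errno of the first failed connect(), or None if every
    connection attempt succeeded.
    """
    # Hypothetical throwaway path; short enough for the sun_path limit.
    path = os.path.join(tempfile.mkdtemp(dir="/tmp"), "demo.socket")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)  # tiny backlog so the pending queue fills quickly
    clients = []
    try:
        for _ in range(max_clients):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.setblocking(False)  # non-blocking, like nginx's connect()
            clients.append(client)
            try:
                client.connect(path)
            except OSError as exc:  # BlockingIOError is a subclass of OSError
                return exc.errno
        return None
    finally:
        for client in clients:
            client.close()
        server.close()
        os.unlink(path)
        os.rmdir(os.path.dirname(path))


print(connect_until_refused() == errno.EAGAIN)  # expected True on Linux
```

On other operating systems the errno for a full queue can differ, so this only illustrates the Linux behavior seen in the log above.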

Anything else we need to know:

aledbf commented 6 years ago

@petrokashlikov please update to 0.19.0. In that version, we send the metrics in batches to avoid this issue. https://github.com/kubernetes/ingress-nginx/pull/2957
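The idea behind batching can be illustrated with a minimal Python sketch: instead of one connection and write per request, metrics are buffered and sent as a single JSON payload per flush, so far fewer connect() calls reach the socket. `MetricBatcher`, `observe`, `flush`, and `max_batch` are hypothetical names, and the `send` callback stands in for a write to /tmp/prometheus-nginx.socket. The actual change in the linked PR lives in the controller's Lua code and may flush on a timer rather than on batch size.

```python
import json
from typing import Callable, List


class MetricBatcher:
    """Buffer per-request metrics and send them in one write per flush.

    Hypothetical illustration: `send` stands in for a single write to the
    metrics socket; it is not ingress-nginx's actual API.
    """

    def __init__(self, send: Callable[[bytes], None], max_batch: int = 100):
        self._send = send
        self._max_batch = max_batch
        self._buffer: List[dict] = []

    def observe(self, metric: dict) -> None:
        # Record one request's metrics; flush automatically when the batch is full.
        self._buffer.append(metric)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        # Serialize the whole buffer as one JSON array and send it in one call.
        if not self._buffer:
            return
        payload = json.dumps(self._buffer).encode()
        self._buffer = []
        self._send(payload)


# Usage: three requests produce a single send instead of three.
sent: List[bytes] = []
batcher = MetricBatcher(sent.append, max_batch=100)
for status in (200, 404, 200):
    batcher.observe({"status": status})
batcher.flush()
print(len(sent))  # 1
```

The trade-off is latency: metrics reach the receiver only when a batch is flushed, in exchange for far fewer connections under load.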

petrokashlikov commented 6 years ago

@aledbf Thank you, we will try this today.

aledbf commented 6 years ago

Closing. Please reopen if this is still an issue after the update to 0.19.0

petrokashlikov commented 6 years ago

@aledbf it seems I can't reopen the issue under GitHub rules once you have closed it. Actually, we had to roll back to 0.17.0, because the upgrade to 0.19.0 introduced another issue for us. We started getting errors like the one below for different APIs; this is just one example, and I've replaced the IP values. For some reason, the slash after the port number is missing. Did something change between 0.17.0 and 0.19.0 in how proxy rules are processed?

2018-09-12 17:45:10.000 EDT 2018/09/12 21:45:10 [error] 2623#2623: *473 upstream prematurely closed connection while reading response header from upstream, client: xx.xx.xxx.xxx, server: platform-dev.xxxxxxxxx.xx, request: "GET /api/v1/health/firebase HTTP/2.0", upstream: "http://xx.xx.x.x:8080api/v1/health/firebase/", host: "platform-dev.xxxxxxxxx.xx"
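To check how often this happens, the malformed upstream URLs can be picked out of the error log by looking for a port that is immediately followed by a path character instead of /. A small Python sketch, assuming log lines in the format shown above; the regex and the function name are hypothetical helpers, not part of ingress-nginx:

```python
import re
from typing import List

# Matches `upstream: "http://host:port` when the port digits are followed
# directly by something other than "/", another digit, or the closing quote.
MISSING_SLASH = re.compile(r'upstream: "https?://[^/":]+:\d+(?=[^/"\d])')


def find_missing_slash(log_lines: List[str]) -> List[str]:
    """Return the log lines whose upstream URL lacks a slash after the port."""
    return [line for line in log_lines if MISSING_SLASH.search(line)]


sample = [
    'request: "GET /api/v1/health/firebase HTTP/2.0", '
    'upstream: "http://xx.xx.x.x:8080api/v1/health/firebase/"',
    'request: "GET /api/v1/health/firebase HTTP/2.0", '
    'upstream: "http://10.0.0.1:8080/api/v1/health/firebase/"',
]
print(find_missing_slash(sample) == [sample[0]])  # True
```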

aledbf commented 6 years ago

> Actually, we had to roll back to 0.17.0, because the upgrade to 0.19.0 introduced another issue for us.

There is a change in behavior for paths ending with / when rewrite is used: https://github.com/kubernetes/ingress-nginx/pull/2899

aledbf commented 5 years ago

Closing. Please open a new issue describing the rewrite problem.

Please first try 0.20.0.

kubernetes / ingress-nginx

Issue with connection to /tmp/prometheus-nginx.socket #3084