Closed brava-vinh closed 6 years ago
I created https://github.com/kubernetes/ingress-nginx/pull/2167 to add this feature.
@aledbf @ElvinEfendi
Hi, this annotation nginx.ingress.kubernetes.io/load-balancer: "ip-hash"
did not take effect; I cannot find this setting in nginx.conf. Is there anything I need to pay attention to?
@shenshouer I believe it is ip_hash, not ip-hash: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#ip_hash
If you are running ingress-nginx with --enable-dynamic-configuration, then that load balancing algorithm is not available. As an alternative, you can use https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-nginx-upstream-hashing with $remote_addr.
@ElvinEfendi Thank you very much. nginx.ingress.kubernetes.io/upstream-hash-by: "$host" worked for me!
load-balance: ip_hash is available only in non-dynamic mode, i.e. with --enable-dynamic-configuration=false.
If you want the equivalent of ip_hash in dynamic mode, you should use nginx.ingress.kubernetes.io/upstream-hash-by: "$binary_remote_addr"
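A minimal Ingress sketch applying that annotation; the resource names and host here are hypothetical, and the API version reflects the extensions/v1beta1 era this thread dates from:

```yaml
# Sketch: per-client consistent hashing, approximating ip_hash
# when the controller runs with --enable-dynamic-configuration.
# Names (my-app, example.com) are hypothetical.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app
  annotations:
    # Hash upstream selection by the binary client address.
    nginx.ingress.kubernetes.io/upstream-hash-by: "$binary_remote_addr"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: my-app
          servicePort: 80
```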
If you are running ingress-nginx with --enable-dynamic-configuration then that load balancing algorithm is not available. In that case alternatively you can use https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-nginx-upstream-hashing with $remote_addr.
This doesn't seem to work at all with v0.17.1. I am not seeing any effect of the annotation, load balancing is still poor :(
My annotation is: nginx.ingress.kubernetes.io/upstream-hash-by: $binary_remote_addr (I tried $remote_addr as well, no difference).
Are you sure these annotations are actually being honoured in dynamic-config mode?
Unfortunately I need to have dynamic-config mode turned on, or else websockets get disconnected periodically. But with dynamic-config mode turned on, load balancing is very poor: some websocket backends get 150 connections while others get only 10.
I am not seeing any effect of the annotation, load balancing is still poor :(
@gjcarneiro can you post the steps you followed so that we can reproduce this? We have a test for this feature in dynamic mode: https://github.com/kubernetes/ingress-nginx/blob/master/test/e2e/lua/dynamic_configuration.go#L256. But I'm open to investigating more if you provide the steps showing that upstream-hash-by is not honored.
load balancing is still poor :(
Using upstream-hash-by with $binary_remote_addr means Nginx will proxy requests from the same client to the same upstream. So depending on your app's traffic you can see poor load balancing (e.g. when only a few clients send most of the requests).
--
If your app does not require consistent hashing (upstream-hash-by), I'd suggest you use ewma load balancing (load-balance: "ewma" in the controller ConfigMap).
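A sketch of that ConfigMap change, assuming the ConfigMap name from the --configmap flag quoted later in this thread; the namespace is hypothetical:

```yaml
# Sketch: switch the default load-balancing algorithm to peak EWMA
# globally via the controller's ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx   # hypothetical namespace
data:
  load-balance: "ewma"
```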
I have been trying several things. The thing that convinces me it is not working is that I add the annotation nginx.ingress.kubernetes.io/upstream-hash-by: $request_uri. Then, while running two pods, I create websocket connections to the service, and Grafana metrics report the number of connections in each pod.
Connecting twice to wss://xxx.com/v1/stream I get two connections to the same pod. By tweaking one connection's URI, e.g. wss://xxx.com/v1/stream/1, I still get connected to the same pod. If this were hashing the URI, I should be able to come up with a URI that makes it connect to the other pod, but no matter what URI change I try, all connections still go to the first pod (trying with up to 5 test connections simultaneously).
In fact, if I disable dynamic config, it works fine. But I can't disable dynamic mode; I tried and observed websocket disconnections approximately every 5 minutes (i.e., the value of worker-shutdown-timeout).
All I want is for connections from clients to be distributed to pods as evenly as possible. Each client connection consumes memory and CPU on the pod, so if one pod has 10 times more connections than another it may hit its CPU and memory limits while other pods sit mostly idle. Ewma seems to be all about latency, and I don't know how well latency correlates with the number of connections, especially with websockets.
Not sure if this helps, but the nginx ConfigMap contains:
compute-full-forwarded-for: "true"
disable-ipv6: "true"
disable-ipv6-dns: "true"
proxy-read-timeout: "3600"
proxy-send-timeout: "3600"
use-proxy-protocol: "true"
worker-shutdown-timeout: "300"
And an extract of the Deployment:
containers:
- args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-http-backend
- --configmap=$(POD_NAMESPACE)/nginx-configuration
- --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
- --udp-services-configmap=$(POD_NAMESPACE)/udp-services
- --publish-service=$(POD_NAMESPACE)/ingress-nginx
- --annotations-prefix=nginx.ingress.kubernetes.io
- --enable-dynamic-configuration
image: quay.io/aledbf/nginx-ingress-controller:0.406
The underlying host is bare-metal Ubuntu 16.04. The only unusual thing we have is that we disabled the ipv6 kernel module.
In fact, if I disable dynamic config, it works fine. But I can't disable dynamic mode; I tried and observed websocket disconnections approximately every 5 minutes
Does this signal that maybe ingress-nginx was somehow constantly reloading? I used to have this issue where websockets would suddenly disconnect, until I found out it was because we deployed something else and ingress reloaded to pick up the new pod IPs, causing an nginx reload and a websocket reconnect.
I eventually ran websockets on their own ingress. I used the annotations below for that ingress:
"nginx.ingress.kubernetes.io/force-ssl-redirect": "true",
"nginx.ingress.kubernetes.io/upstream-hash-by": "$remote_addr"
No, I don't think it was. There were no deployments being done at the time, and the websocket disconnections came every 5 minutes with too neat a regularity, too similar to worker-shutdown-timeout, to be a coincidence. Here's a screenshot:
In the middle, I tried disabling dynamic config reloads and enabling ip_hash load balancing in the global config. I temporarily got good, evenly distributed load balancing. But after 5 minutes, websocket disconnections. I had to go back to dynamic config and endure an uneven distribution of connections to pods :(
Perhaps a good compromise would be: disable dynamic config again, but raise worker-shutdown-timeout to 2 hours. Websockets disconnecting after 2 hours isn't so bad; I just hope the extra memory consumed by keeping old workers around in the shutting-down state is acceptable.
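The compromise described above can be sketched as a ConfigMap fragment (2 hours = 7200 seconds), assuming the ConfigMap name from the Deployment's --configmap flag quoted earlier:

```yaml
# Sketch: keep old nginx workers alive for 2 hours so established
# websocket connections survive config reloads.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
data:
  worker-shutdown-timeout: "7200"
```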
Connecting twice to wss://xxx.com/v1/stream I get two connections to the same pod. By tweaking one of the connections URI, e.g. wss://xxx.com/v1/stream/1, I still get connected to the same pod. If this was hashing the URI, I should be able to come up with a URI that makes it connect to the other pod, but no matter what URI change I try, all connections are still going to the first pod (trying with up to 5 test connections simultaneously).
@gjcarneiro In your comment at https://github.com/kubernetes/ingress-nginx/issues/1834#issuecomment-410705206 you mention you have nginx.ingress.kubernetes.io/upstream-hash-by: $binary_remote_addr
which is hashing by client IP. Was this still the case when you did that test? If so then what you've seen is expected since you are opening the connections from the same IP address.
When I have a bit more time I'll look deeper into implementation and try it locally as well.
I tried several things, but in my latest experiment I changed it to nginx.ingress.kubernetes.io/upstream-hash-by: $request_uri and then experimented by changing the URI to see how it affected which pod the connection landed on.
Yeah, of course it's possible I did something stupid, but the fact is that things magically start working as soon as I add --enable-dynamic-configuration=false, which is what I ended up doing in the prod deployment, together with worker-shutdown-timeout: "43200", yikes! Memory spiked and is slowly decaying. Luckily the servers have plenty of RAM.
load-balance: ip_hash is available only in non-dynamic mode, with --enable-dynamic-configuration=false.
If you want dynamic ip_hash, you should use
nginx.ingress.kubernetes.io/upstream-hash-by: "$binary_remote_addr"
That config did not work for me, @aaashun.
Currently load-balancer is a global value affecting all upstreams. I would like to see a way to set load-balancer per ingress. We can inject this value into annotations per ingress and set it there. Without this annotation we just keep using the current global value, as we do right now. So I think this change will not cause any obvious issue, because using it is opt-in.
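The opt-in usage proposed above can be sketched with the annotation name from the top of this thread; the resource names here are hypothetical:

```yaml
# Sketch: per-ingress override of the load-balancing algorithm, as
# proposed in this thread's PR; only this ingress opts in to ip-hash.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: socketio-app          # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/load-balancer: "ip-hash"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: socketio-app
          servicePort: 80
```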
Why is this useful
Some users of socket.io may need this, especially when using socket.io in a Node.js environment (not in a browser), because the sticky cookie isn't stored and re-sent properly, so sticky sessions don't work out of the box without some hacking around the socket.io client code to inject a cookie jar.