hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

"Your IP is issuing too many concurrent connections" with server UI behind proxy #15471

Closed: shoeffner closed this issue 1 year ago

shoeffner commented 1 year ago

For a few months (since at least October, probably earlier) we have routinely been getting "429 Too Many Requests: Your IP is issuing too many concurrent connections, please rate limit your calls", especially when navigating the UI. The response seems to come from https://github.com/hashicorp/nomad/blob/ee2f3e4e7ceb15cd6d7dc667cc301d41fc48b73e/command/agent/http.go#L269-L296. Judging by the related commit message ("Return 429 response on HTTP max connection limit. Instead of silently closing the connection [...]"), I take that as a good sign: at least we can now see that something is wrong.

However, this behavior (not the error itself, but the rate limiting) causes trouble for our setup. As the logs show, all connections to our Nomad server come from 127.0.0.1, because we proxy them through Fabio. Since the limit applies per "Your IP" and every call in our case has the same IP, I assume Nomad could actually handle far more connections than it currently accepts:

Dec 05 11:20:17 cluster-server nomad[1491011]:     2022-12-05T11:20:17.108+0100 [WARN]  http: Too many concurrent connections: address=127.0.0.1:58620 limit=100
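To illustrate why every proxied connection counts against the same limit, here is a minimal Go sketch of a per-IP concurrent-connection counter. It is a hypothetical, simplified stand-in, not Nomad's code (perIPLimiter, acquire and release are invented names): connections are keyed by the peer IP only, so the ephemeral source port in an address such as 127.0.0.1:58620 is ignored and every connection from the proxy lands in the same bucket.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// perIPLimiter is a hypothetical stand-in for a per-client connection
// limiter: it counts open connections keyed by the peer IP only.
type perIPLimiter struct {
	mu    sync.Mutex
	max   int
	conns map[string]int
}

func newPerIPLimiter(max int) *perIPLimiter {
	return &perIPLimiter{max: max, conns: make(map[string]int)}
}

// hostOf drops the port, so 127.0.0.1:58620 and 127.0.0.1:58621 share a key.
func hostOf(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// acquire registers a new connection and reports whether it fits the limit.
func (l *perIPLimiter) acquire(remoteAddr string) bool {
	host := hostOf(remoteAddr)
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.conns[host] >= l.max {
		return false
	}
	l.conns[host]++
	return true
}

// release frees the slot taken by a connection accepted by acquire.
func (l *perIPLimiter) release(remoteAddr string) {
	host := hostOf(remoteAddr)
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.conns[host] > 0 {
		l.conns[host]--
	}
}

func main() {
	lim := newPerIPLimiter(100)
	// Behind a proxy, every upstream connection has the proxy's IP;
	// only the ephemeral source port differs.
	rejected := 0
	for port := 40000; port < 40150; port++ {
		if !lim.acquire(fmt.Sprintf("127.0.0.1:%d", port)) {
			rejected++
		}
	}
	fmt.Println("rejected:", rejected) // rejected: 50
}
```

With 150 concurrent connections from 127.0.0.1 and a limit of 100, the last 50 are rejected even if they belong to 50 different users of the proxy.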

I found https://github.com/hashicorp/nomad/blob/171ca522e05a44eb10ca9c55256ee7bd8f0d06e6/nomad/structs/config/limits.go#L27 in the source, but not in the docs. Are those "public" settings I should use? I am not even sure the code in question uses them, although they seem to be set at https://github.com/hashicorp/nomad/blob/ee2f3e4e7ceb15cd6d7dc667cc301d41fc48b73e/command/agent/http.go#L281-L283. But the rate limiter has a hard-coded 100 a few lines above that.

How do you handle deployments behind a proxy? Or should we simply not deploy Nomad behind proxies? Or can Nomad use headers such as X-Forwarded-For, Forwarded, etc. to check the connections?

Nomad version

Output from nomad version

Nomad v1.4.2 (f0c64605666324e886377ab897085a015a10a58c+CHANGES)

(We have a custom patch for some mount options, so the commit hash might not be accurate -- but the patch is very likely unrelated to this issue.)

Operating system and Environment details

Ubuntu 20.04.5 LTS (GNU/Linux 5.4.0-131-generic x86_64) We proxy the Nomad UI through fabio.

Issue

We get rate limited because our proxy opens too many concurrent connections to the Nomad server.

Reproduction steps

Deploy Nomad behind a proxy and open many concurrent connections through it, ideally from different IP addresses, to see the impact.

Expected Result

Normal use of the UI should not come to a halt because many users are seen as the same user.

Actual Result

Rate limiting is shared among all users.

Job file (if appropriate)

n/a

Nomad Server logs (if appropriate)

Dec 05 11:20:17 cluster-server nomad[1491011]:     2022-12-05T11:20:17.108+0100 [WARN]  http: Too many concurrent connections: address=127.0.0.1:58620 limit=100

Nomad Client logs (if appropriate)

n/a

ngcmac commented 1 year ago

Hi,

I'm also facing this issue using Apache as a reverse proxy for the Nomad UI and API. Is there any chance that MaxConnsPerClient can be set through the Nomad config? Nomad v1.4.3

2023-02-02T17:03:35.072Z [WARN]  http: Too many concurrent connections: address=172.29.201.23:36932 limit=100

Thanks

tgross commented 1 year ago

Hi @shoeffner! The configuration documentation for those limits can be found under limits (the limits block of the agent configuration), which for some reason doesn't have its own sidebar section. We should probably split that out. But for your environment, that's exactly what you'll want to set. The hard-coded value of 10 you see in the code is the requests-per-second-per-IP rate (with a burst limit of 100), which is used to slow requests down but shouldn't cause errors.

So for your setup you'll want to set http_max_conns_per_client, but I would also recommend setting rpc_max_conns_per_client. The Task API socket we're shipping in Nomad 1.5.0 (https://github.com/hashicorp/nomad/pull/15864) is also designed to help with this, by letting the proxy task reuse a single connection.

That being said, you might also want to take a look at your Fabio configuration. I'm not super familiar with Fabio but I wouldn't expect a load balancer to open new connections to the upstream for every single incoming request.
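The kind of proxy behavior described above (reusing upstream connections instead of opening one per request) can be sketched with Go's standard library. This is not Fabio's configuration; it is a hypothetical example built on httputil.ReverseProxy, where the http.Transport keeps idle connections pooled (MaxIdleConnsPerHost) and never opens more than MaxConnsPerHost connections to the upstream, so excess requests queue instead of dialing. Choosing a cap below Nomad's default http_max_conns_per_client of 100 keeps the proxy under the limit. newCappedProxy and runDemo are invented names; the demo uses local test servers and counts how many TCP connections the upstream actually sees.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
	"sync/atomic"
)

// newCappedProxy builds a reverse proxy that reuses keep-alive connections
// and never holds more than maxConns connections to the upstream at once.
func newCappedProxy(upstream string, maxConns int) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		MaxConnsPerHost:     maxConns, // excess requests wait for a free connection
		MaxIdleConnsPerHost: maxConns, // keep connections pooled for reuse
	}
	return proxy, nil
}

// runDemo sends concurrent requests through the proxy and returns how many
// TCP connections the upstream accepted and how many requests failed.
func runDemo(maxConns, requests int) (int64, int64) {
	var newConns, failures int64
	upstream := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	upstream.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&newConns, 1)
		}
	}
	upstream.Start()
	defer upstream.Close()

	proxy, err := newCappedProxy(upstream.URL, maxConns)
	if err != nil {
		panic(err)
	}
	front := httptest.NewServer(proxy)
	defer front.Close()

	var wg sync.WaitGroup
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get(front.URL)
			if err != nil {
				atomic.AddInt64(&failures, 1)
				return
			}
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				atomic.AddInt64(&failures, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&newConns), atomic.LoadInt64(&failures)
}

func main() {
	conns, failed := runDemo(10, 200)
	fmt.Printf("upstream connections: %d, failed requests: %d\n", conns, failed)
}
```

All 200 requests are served, but the upstream never sees more than 10 connections, so its per-client limit is never reached.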

shoeffner commented 1 year ago

Hi @tgross, thanks for pointing me to the docs; I couldn't find them back then. That should close this issue.

We discussed all of this at length and decided that a better long-term solution is to remove Fabio from the loop and use Consul DNS to point directly at the Nomad servers, which will then be able to rate limit individual clients properly. It will also be a much more resilient setup, as Nomad itself will no longer rely on Fabio as a single point of failure.

But in the meantime, I will configure the http and rpc max connections, thank you very much!

We will update to 1.5.0 "soonish"; we still need to evaluate the new SSO support against our own custom login solution (the problems we faced using Vault as a token issuer are detailed in hashicorp/vault#16183). But we will certainly keep an eye on the Task API.

thefallentree commented 11 months ago

This is related to #19212