Add optional feature to set the SO_REUSEADDR option before binding th…

tsenart / vegeta

HTTP load testing tool and library. It's over 9000!

http://godoc.org/github.com/tsenart/vegeta/lib

MIT License

23.06k stars 1.34k forks source link

Add optional feature to set the SO_REUSEADDR option before binding th… #638

Open fabiorush opened 12 months ago

fabiorush commented 12 months ago

Background

Add an optional parameter to set the SO_REUSEADDR option on the TCP socket before binding it (default false). Can reduce the amount of "bind: address already in use" errors when doing many connections with the same address and port combination, like the ones that happened on https://github.com/tsenart/vegeta/issues/583.

Checklist

[x] Git commit messages conform to community standards.
[x] Each Git commit represents meaningful milestones or atomic units of work.
[ ] Changed or added code is covered by appropriate tests.

tsenart commented 12 months ago

Hey, thanks for the patch. I have couple question.

Does this work on every OS/architecture pair we ship releases to? If not, that's fine. We just need to document it properly and make sure this only runs on the supported ones. See the latest release for the supported targets: https://github.com/tsenart/vegeta/releases/tag/v12.11.0
Are there any risks / trade-offs involved with setting this on outgoing connections? I've used this when binding to a port to receive incoming requests but never to outgoing connections.

tsenart commented 12 months ago

Here's what GPT4 had to say about this: https://chat.openai.com/share/b515e3dc-6193-4550-8d5d-3ecaa8088755

fabiorush commented 12 months ago

Yeah, I'm aware of the risks and the follow article explain it really well:

https://hea-www.harvard.edu/~fine/Tech/addrinuse.html

I think the main risk is to have a socket A receiving packets that should be received by socket B, but I think it doesn't matter much for the load test itself (it would matter for the report in the end, but I think we could add a note about that).

fabiorush commented 12 months ago

And this post on stackoverflow talks about the different implementation of the REUSEADDR and REUSEPORT flags on many architectures. In resume the REUSEADDR flag stays the same, while we have some minor variations on the REUSEPORT flag.

https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ

fabiorush commented 12 months ago

About the suggestions made by GPT4, something we cannot or are not allowed to modify sys net ipv4 parameters. For example, decreasing the tcp_fin_timeout will also affect other services, which may not be desirable.

tsenart commented 11 months ago

I need to convince myself that this is safe, and so far I haven't been able to. Maybe you can help me get there. If someone runs into "bind: address already in use" even given the fact that we're using HTTP keep alive underneath and re-using TCP connections, it like means the server is very slow to respond, or the network really faulty, or a mix. In either case, the TCP connection would still be open, and the client reading a response, so re-using the local address seems unsafe?

Or is it only applicable when the underlying TCP connection is in TIME_WAIT state? In any case, I'd like to write some sort of integration test that exercises this and gives us some more confidence.

fabiorush commented 11 months ago

Exactly. On our use case we were executing a load test to a load balancer composed of 8 front end servers. Those servers queries a route table located on another server and do a proxy_pass to one internal server out of 20. We were trying to find out what happens when the communication to the route table service becomes unresponsible. Because how the retries were implemented a request that took milliseconds to be responded went to up 5 seconds. We wanted to know if the front end servers would crash or also become unresponsible but we started to runs into "bind: address already in use". The setting of the reuse option fixed this on our case. It only allows the reuse of TIME_WAIT sockets when you are binding to different combination of remote IP address and port. So if you have a socket in TIME_WAIT with the combination LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_A:PORT_X and try the same combination again it will not allow, but if use a combination with another remote IP address or port like LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_B:PORT_X it will reuse the LOCAL_ADDR_A:PORT_A when binding.

peterbourgon commented 11 months ago

If you're hitting "bind: address in use" issues in your load tests, it typically means you're exhausting the capabilities of the network stack on the host. SO_REUSEADDR and SO_REUSEPORT won't meaningfully impact those constraints. And, in any case, it's a situation that you would — should! — never encounter in a normal deployment, so it doesn't make sense to try to accommodate in a vegeta load test.

edit: vegeta acts as a client, not a server, and SO_REUSEADDR/PORT on the client side is far more restrictive than on the server side. Specifically, it doesn't allow a single physical connection from a given client addr:port to a given server addr:port to mux arbitrary logical connections.

Instead, you want to make sure that vegeta creates no more than a reasonable number of connections to a given target host. The best way to do that is by setting -max-connections to a value like 8 or 16 or 32. And if that means you can't do as many RPS as you want, then you need to run vegeta on multiple hosts.