reinh / statsd

A Ruby Statsd client that isn't a direct port of the Python example code. Because Ruby isn't Python.
MIT License

UDPSocket usage floods connections #30

Closed by cheald 11 years ago

cheald commented 11 years ago

It seems that the current usage of UDPSocket is massively flooding my network's connection pools, saturating them and bringing things to a messy halt. To wit:

def send_to_socket(message)
  socket.send(message, 0, @host, @port)
end

def socket
  Thread.current[:statsd_socket] ||= UDPSocket.new
end

opens a new connection for every call to send_to_socket (Ruby does reuse the socket object, however!). I can run code that fires off a bunch of metric messages and watch my active connection count skyrocket in my DD-WRT control panel. These connections are cleaned up by my router 30 seconds later, but I can easily saturate my connection pool.

I've modified my local copy to open the socket once, then hang onto it:

def send_to_socket(message)
  socket.send(message, 0)
end

def socket
  Thread.current[:statsd_socket] ||= UDPSocket.new.tap do |socket|
    socket.connect @host, @port
    at_exit { socket.close }
  end
end

This solves the connection storming issue, and my data is still being sent. Is there a reason that it was done the way it is? I don't want to regress anything, but it seems pretty obvious that opening a new socket per message is massively suboptimal.

cheald commented 11 years ago

Additionally, passing the host on every send is significantly slower (likely due to the connection overhead!):

socket = UDPSocket.new.tap {|socket| socket.connect host, port }
puts Benchmark.measure { 2000.times { socket.send "test", 0 } }
socket.close

socket = UDPSocket.new
puts Benchmark.measure { 2000.times { socket.send "test", 0, host, port } }
socket.close

# 0.000000   0.010000   0.010000 (  0.013814)
# 0.060000   0.130000   0.190000 (  2.873749)

And here's a demo of the connection flooding issue:

require 'socket'
require 'benchmark'

def instrument_connections(label)
  puts "[%s] Open connections: %s" % [label, `cat /proc/net/nf_conntrack | wc -l`]
end

host = "example.org"
port = 1234

instrument_connections "Before (one socket)"
socket = UDPSocket.new.tap {|socket| socket.connect host, port }
1000.times { socket.send "test", 0 }
instrument_connections "After (one socket)"

instrument_connections "Before (many sockets)"
socket = UDPSocket.new
1000.times { socket.send "test", 0, host, port }
instrument_connections "After (many sockets)"

# [root@luna repos]# ruby sockets.rb
# [Before (one socket)] Open connections: 95
# [After (one socket)] Open connections: 97
# [Before (many sockets)] Open connections: 97
# [After (many sockets)] Open connections: 1075

cheald commented 11 years ago

I've patched it here -- https://github.com/cheald/statsd/commit/b70942b2abda5c3f3f9840d20fa122c21216a29d

That said, I don't want to open a PR without understanding the rationale for the current usage mechanism.

reinh commented 11 years ago

The rationale for the current mechanism is that it's the simplest thing that could possibly work.

Please open up a PR. What's the worst that could happen? :)

raggi commented 11 years ago

Summary

Interesting, and yes, this would "solve the issue". There are some other details we'll want to consider though:

Some more background:

UDP is stateless and connectionless at the protocol level. The BSD sockets API makes this very confusing, however, as many typical use cases still encourage the use of connect(2). If you read through udp(4), you will note that it explains (depending on your platform) that connect(2) causes a bind(2). While this binding is still in the ephemeral range (as if you'd specified a port of 0), it is nonetheless bound. The BSD manual goes on to explain that the result is that you can now recv(2) or read(2) on that socket. By contrast, the Linux manuals suggest that a bind(2) is required before a recv(2) will work.

Regardless of platform differences, there is something about this use case that needs to be understood. Several old UDP systems were designed for a kind of "short lived" and "unreliable" connection. This is somewhat of a contradiction, when you remember that UDP itself is connectionless, but this is a "convenience" in the BSD API. By calling connect(2), now the remote can use the source port of a message for "client identification", because that source port is reused. If you do not call connect(2), then every UDP datagram will come from a new source port, and this is what is causing the entries in your netfilter tables.
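The source-port behavior is easy to see from Ruby; here's a quick sketch (the port and addresses are arbitrary):

```ruby
require 'socket'

# A connected socket is bound once: connect(2) triggers an implicit
# bind(2) to an ephemeral local port, which every later send reuses.
connected = UDPSocket.new
connected.connect('127.0.0.1', 9999)   # port 9999 is arbitrary
puts connected.addr[1]                 # stable, non-zero local port

# An unconnected socket has no local port until the first sendto(2).
unconnected = UDPSocket.new
puts unconnected.addr[1]               # 0 -- not bound yet
```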

Client identification works like this: say you have two users on a system, both sending UDP streams (defined in this case as simply "more than one datagram") to a service. If the service wants to piece these together in some way (it would be better off using TCP most of the time), then it will need to be able to identify each of the two or more programs sending datagrams. One way, without protocol-defined state, is to use the source address for incoming datagrams. You can imagine a system that aggregates incoming data every 10s, and so creates buckets that only live 10s for any source addresses that arrive, and then clears them out. Using UDP for such a use case reduces the state management overhead, and can be useful where the source hosts are unknown and only best-effort reliability is required.
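An illustrative sketch of that source-address bucketing (loopback addresses and the payloads are made up for the example):

```ruby
require 'socket'

# The server keys datagrams by sender ip:port, giving per-client
# buckets with no protocol-level session state.
server = UDPSocket.new
server.bind('127.0.0.1', 0)            # let the OS pick a port
buckets = Hash.new { |h, k| h[k] = [] }

client = UDPSocket.new
client.connect('127.0.0.1', server.addr[1]) # connect => one stable source port
2.times { |i| client.send("metric #{i}", 0) }

2.times do
  data, addr = server.recvfrom(512)
  buckets["#{addr[3]}:#{addr[1]}"] << data  # addr[3] is the sender's IP
end

puts buckets.size                      # => 1, both datagrams share one bucket
```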

Why is all of this important?

Your router is performing NAT operations. Depending on the style of the NAT, your firewall configuration will want to maintain (for a time), a set of tables that map internal UDP source addresses (ip:port combinations on your LAN) to external UDP source addresses (ip:port combinations on your WAN interface). If it does not do this, it will cause applications such as those described above, to fail. STUN makes heavy use of such NAT features, and those RFCs will provide a lot more detail on this topic, as well as providing some useful definitions for different styles of NAT. The wikipedia page on NAT is actually a good starting point.

There is an open question here as to whether the implementation in statsd-ruby is correct. Statsd does not require source identification, nor does it require a binding. There are potentially some performance advantages for certain use cases (where the user is not using a name server cache or a hosts entry for DNS resolution), but there are also some disadvantages (losing signal-less reconfiguration). There is also the matter of system administration, where applications will be seen to be binding to ports in netstat(1), etc. This can lead to additional investigations in ops/dev/sre teams that may be distracting. I know I would be angry if I was debugging some problem in a stateful UDP application and discovered (taking time) that a statsd client was bound with no advantage to my application.

What are the options?

N.B. You will want to pay attention to your memory usage on some of the smaller embedded devices, if you choose to increase the port map table size. Similarly, reducing the lifetime can affect some UDP applications, although that's likely becoming increasingly rare.

cheald commented 11 years ago

Awesome, that's the kind of background I was looking for. I'm a bit shaky on the history, so the breakdown is very useful!

I think the disadvantages are pretty clear, and worth considering before making any kind of decision one way or another. Something explicit like a Statsd.bind(host, port) might be the right means here to let people make that decision, though it'd have to be done on a per-thread basis, which could get kind of messy. Something lazier like Statsd.bind(true), which would signal #socket to perform the binding might work better.
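A sketch of what that lazy opt-in might look like; to be clear, Statsd.bind is hypothetical and nothing like it exists in the gem, this is just the shape of the idea:

```ruby
require 'socket'

class Statsd
  # Hypothetical opt-in switch -- not part of the gem.
  def self.bind(enabled = true)
    @bind = enabled
  end

  def self.bind?
    !!@bind
  end

  def initialize(host, port)
    @host, @port = host, port
  end

  def send_to_socket(message)
    if self.class.bind?
      socket.send(message, 0)               # connected: one stable source port
    else
      socket.send(message, 0, @host, @port) # current behavior
    end
  end

  private

  # Per-thread socket; connect(2) (and the implicit bind(2) it causes)
  # only happens when the user opted in.
  def socket
    Thread.current[:statsd_socket] ||= UDPSocket.new.tap do |s|
      s.connect(@host, @port) if self.class.bind?
    end
  end
end
```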

It's worth noting that when I use an IP for the host parameter, I don't get the conntrack flooding. I'm not quite clear on why the behavior would change in that circumstance.

raggi commented 11 years ago

Oh, it may not even be the statsd sockets that are causing your issue, but the DNS resolutions, then.

cheald commented 11 years ago

Looks like that's indeed the case, so this is a red herring of an issue. The following reproduces the issue:

require 'socket'

def instrument_connections(label)
  puts "[%s] Open connections: %s" % [label, `cat /proc/net/nf_conntrack | wc -l`]
end

instrument_connections "Before (many sockets)"
1000.times { Socket.gethostbyname('github.com') }
instrument_connections "After (many sockets)"

So the problem is a lack of local DNS caching, which'd explain why I wasn't seeing issues on our production nodes, where we run caching DNS on the servers that run statsd.

I'm going to go ahead and close this ticket and the PR, but it's probably worth noting in the README that performing name resolution per publish is a quick path to pain.

reinh commented 11 years ago

That was an awesome read, @raggi. We should probably do a better job in the documentation of dissuading people from using hostnames. IP addresses should be preferred whenever possible.
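One way to follow that advice while still keeping a hostname in configuration is to resolve it once at boot ('localhost' stands in for a real statsd host here, and the client wiring is illustrative):

```ruby
require 'resolv'

# Resolve once at startup and hand the client an IP, so no DNS lookup
# happens per publish. The tradeoff: a DNS change now needs a restart.
STATSD_IP = Resolv.getaddress('localhost')  # substitute your statsd host
# $statsd = Statsd.new(STATSD_IP, 8125)
puts STATSD_IP
```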

mperham commented 11 years ago

Hostname resolution caused severe performance issues in production for us too. Of course, now I see the readme already calls it out as a caveat.