I discovered that the way captcp hashes/identifies flows in order to associate them with connections is flawed.
Connections and subflows are identified by a kind of unique identifier/hash, uid computed in TcpConn.__init__ beginning at line 2273:
l = [ord(a) ^ ord(b) for a,b in zip(self.sipnum, self.dipnum)]
self.uid = "%s:%s:%s" % (
str(self.ipversion),
str(l),
str(long(self.sport) + long(self.dport)))
This identifier is used in various places as an index/key into hashtables (well, dicts :) ) containing connections, and is notably used for associating subflows (uni-directional) with the connection (bi-directional) they are part of. For this purpose, the hash needs to be the same for any given port-pair between the same two addresses, regardless of source/destination ordering. I suppose this is why they are added together, since this is a commutative operation.
The problem is that uid is used as a direct key into tables, without checking for hash collisions. The above function is obviously not a perfect hash function, and will produce collisions if you are unlucky enough to have two connections with port pairs that add up to the same thing, which corrupts the data for those connections.
I am proposing a patch to rectify this by concatenating the (hex string representations of the) ordered port pair together instead of summation. This preserves the commutativity, but ensures we will not get collisions for the port pair part of the identifier. Note that the hash function overall still isn't perfect, you could get collisions between the IP address pairs, but that is perhaps slightly less likely.
Best wishes,
Kristian
PS. If you wonder how I came across this, I managed to trigger this bug by starting a bunch of connections to sequentially increasing port numbers almost simultaneously under FreeBSD. When FreeBSD has to generate a lot of port numbers in a very short time, it temporarily switches ephemeral port selection strategy to simple sequential incrementation, see: https://www.cymru.com/jtk/misc/ephemeralports.html
Sometimes a pair of connect() calls would get inversed and I ended up with port pairs such as
Hi,
I discovered that the way captcp hashes/identifies flows in order to associate them with connections is flawed.
Connections and subflows are identified by a kind of unique identifier/hash,
uid
computed inTcpConn.__init__
beginning at line 2273:This identifier is used in various places as an index/key into hashtables (well, dicts :) ) containing connections, and is notably used for associating subflows (uni-directional) with the connection (bi-directional) they are part of. For this purpose, the hash needs to be the same for any given port-pair between the same two addresses, regardless of source/destination ordering. I suppose this is why they are added together, since this is a commutative operation.
The problem is that
uid
is used as a direct key into tables, without checking for hash collisions. The above function is obviously not a perfect hash function, and will produce collisions if you are unlucky enough to have two connections with port pairs that add up to the same thing, which corrupts the data for those connections.I am proposing a patch to rectify this by concatenating the (hex string representations of the) ordered port pair together instead of summation. This preserves the commutativity, but ensures we will not get collisions for the port pair part of the identifier. Note that the hash function overall still isn't perfect, you could get collisions between the IP address pairs, but that is perhaps slightly less likely.
Best wishes, Kristian
PS. If you wonder how I came across this, I managed to trigger this bug by starting a bunch of connections to sequentially increasing port numbers almost simultaneously under FreeBSD. When FreeBSD has to generate a lot of port numbers in a very short time, it temporarily switches ephemeral port selection strategy to simple sequential incrementation, see: https://www.cymru.com/jtk/misc/ephemeralports.html
Sometimes a pair of
connect()
calls would get inversed and I ended up with port pairs such as