inercia opened this issue 9 years ago
We would need to see evidence that this provides some tangible benefits in realistic use cases.
Implementation wise this shouldn't be too hard. The main potential stumbling blocks I can see are interaction with PMTU discovery and packet coalescing.
Would we compress per captured packet or per UDP packet sent over the network (the latter potentially containing multiple packets, due to the aforementioned coalescing)? The former is more efficient for multi-hop since intermediaries don't need to decompress/recompress. The latter is likely to give better compression ratios.
Regarding compression algorithms... Ideally I would avoid making that yet another option, i.e. hopefully we can pick one that is good enough across a broad spectrum of use cases.
As you said, it is not clear to me what to compress: the captured packet or the UDP packet. I guess it would depend on the average number of intermediate hops: if packets usually traverse a couple of hops, maybe UDP packets should be compressed...
I agree that compression should not be a user option. In fact, I think I would enable it by default and apply an adaptive algorithm that can switch it off. I would apply the same technique proposed in this RFC (even though it focuses on IP payloads, it gives some interesting hints on packet compression), where they propose:
1) for a (source, destination) pair, try to compress the payload and compare the output length with the original length;
2) if compression has made things worse for N consecutive packets, disable compression for a while;
3) if we have disabled compression too many times for (source, destination), disable it for good.
I think I would add a 2-bit field to the UDP packet for indicating the compression scheme used for the packet (i.e. no compression, algorithm-1, etc). Ideally, Weave should have at least two possible compression schemes: low-CPU and high-CPU compression. High-CPU would be used when the number of active, compressed connections is below a given threshold-1. Above that value, new connections would use the low-CPU algorithm, and compression could even be disabled completely for new connections when a threshold-2 is reached...
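The proposed 2-bit field could look like this, assuming it occupies the low two bits of a flags byte in the UDP packet header. The scheme values and byte layout are purely illustrative, not Weave's actual wire format:

```go
package main

import "fmt"

// Illustrative compression-scheme values for a hypothetical 2-bit
// field in the low bits of a per-packet flags byte.
const (
	schemeNone    byte = 0 // no compression
	schemeLowCPU  byte = 1 // e.g. an lz4-style algorithm
	schemeHighCPU byte = 2 // e.g. a deflate-style algorithm
	schemeMask    byte = 0x03
)

// setScheme writes the scheme into the low two bits, preserving the
// other flag bits.
func setScheme(flags, scheme byte) byte {
	return (flags &^ schemeMask) | (scheme & schemeMask)
}

// getScheme reads the scheme back out of the flags byte.
func getScheme(flags byte) byte {
	return flags & schemeMask
}

func main() {
	flags := setScheme(0, schemeHighCPU)
	fmt.Println(getScheme(flags) == schemeHighCPU)
}
```

A per-packet field like this also keeps the receiver stateless about which algorithm was used, which matters once senders start switching schemes per the thresholds above.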
enable it by default and apply an adaptive algorithm that can switch it off
Nice idea, but it makes this issue an order of magnitude more complex. So best left to a follow-on.
low-CPU and high-CPU compressions
In most deployments, weave is going to be CPU-bound, so I reckon low-CPU compression is all we need.
As for how we determine whether to compress/decompress...
weave launch --compressed
, a router will compress all outbound traffic. The flag gets exchanged with peers on connection establishment, so they know which inbound UDP packets require decompression. Alternatively, we can add a flag to the UDP packet itself, which is less fragile and more amenable to packet inspection. I think that, ideally, compression should be per link, as it depends on where packets go to/come from. For example, it does not make sense to compress packets going to peers in the same LAN, but compressing packets that go to the WAN could be a big performance gain.
So I would leave the option to the user for enabling compression (with weave launch --enable-compress
or something like that), but I would leave to Weave the decision of where/when to use it...
it does not make sense to compress packets going to peers in the same LAN
We don't know that. With compression, less data is crossing between kernel and user space.
should be per link [...] I would leave the option to the user for enabling compression (with
weave launch --enable-compress
or something like that)
If compression is per-link then I don't see the point of enabling it per host. See my option 3, in particular the zones idea.
Not sure about the compression benefits for userspace-kernel copies. If we could measure the cost of these operations, I would bet most of the cost would come from the syscall, and only a small fraction would depend on the data length (unless you are moving big chunks of data)... not sure, though... I guess it would depend on the nature of the traffic.
Regarding issue #82, it would be a good addition, but I think it would involve some difficulties. I will write some questions in that issue...
Not sure about the compression benefits for userspace-kernel copies
Me neither. Experiments/measurements of where compression yields benefits are very much part of this issue.
Maybe Weave could support data compression for traffic. Encapsulation packets could include a compression field where we could specify an optional compression algorithm for the payload. The standard compress/lzw algorithm could be used, or maybe something like lz4... A feature like this could be especially helpful for containers running text-based services like Memcache, Redis, etc...