zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

Feature Request: Stream Compression for zmq over TCP #4320

Open pizzard opened 2 years ago

pizzard commented 2 years ago

Use Case

We use zmq heavily to transmit high-volume application data in a wide range of applications. While zmq has generally performed well in most of these, there is one particular case where we found its performance relatively poor.

This use case is classic pub/sub through a proxy over TCP. We have some clients and the zmq proxy on one device (let's say at a remote location in Australia) and some other clients on another, very distant device (let's say in Germany). The connection between them runs over many hops, LTE networks and more. It is a stable connection: the bandwidth is limited but reliable, and latency is over 100 ms.

Since CPU cycles are nearly free compared to the cost of getting more bandwidth, this is a strong case for compression. So we compress our messages wherever feasible and use concise topic names to save bandwidth. We also have packet compression at the frame level on the link.

Performance Observations

After doing all that, zmq works to some extent, and we accepted its performance as given. Because this caused some pain for our developers, they experimented with a simple raw TCP socket, pushed their data through it, and ended up outperforming zmq massively. When using such an established library, I would expect its performance to be at least in the same order of magnitude. So we looked into the setup and found a clear difference: the developer had routed this TCP connection through an ssh port forward, and that ssh connection had simple stream compression enabled on the TCP traffic.

To make the comparison fair, we routed the zmq connection through an ssh tunnel with stream compression enabled. In this fair comparison, zmq performed reasonably well again. Moreover, with proper stream compression most of the optimizations above are presumably pointless, as the stream compressor will do the same job as well or better. The main downside of this compression is that latency becomes less stable, because of the blocking it introduces. In most scenarios that generate enough traffic overall, this is unnoticeable (and compared to my latency baseline, it is negligible anyway).

For now, we know that if we want decent performance, we have to open an ssh tunnel with compression enabled and route zmq's TCP connection through it. This setup is a lot of hassle and is definitely suboptimal: I don't need encryption (the traffic already runs over an encrypted channel), and it requires port setup outside of zmq.

Possible fixes within zmq

One fix I can think of is to add another transport to zmq, called e.g. zstd-tcp, so that one writes zstd-tcp://192.168.2.1:4785/ instead of just tcp://.... This transport would take the outgoing bytes and feed them into a zstd stream compressor. Whenever the compressor emits a block, the transport sends it over TCP. For good results, one might want to disable Nagle's algorithm on the TCP socket and chunk the data either before compression or within the compressor.
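The compress-then-send idea can be sketched without zmq at all. This is a minimal sketch, assuming Python's standard-library zlib (DEFLATE) as a stand-in for zstd, whose bindings are not in the standard library. Each message goes into one long-lived compressor and is flushed with `Z_SYNC_FLUSH`. That makes the peer able to decode it as soon as the bytes arrive, while the compressor keeps its history window across messages. The class names `StreamCompressor`/`StreamDecompressor` and the 4-byte length prefix used to recover message boundaries are hypothetical illustrations. They are not part of zmq or ZMTP.

```python
import struct
import zlib


class StreamCompressor:
    """Hypothetical sender side: one compression context for the whole connection."""

    def __init__(self, level: int = 6) -> None:
        self._c = zlib.compressobj(level)

    def pack(self, message: bytes) -> bytes:
        # Length-prefix the message (hypothetical framing) so the receiver can
        # recover message boundaries from the decompressed byte stream.
        framed = struct.pack("!I", len(message)) + message
        # Z_SYNC_FLUSH emits all pending output on a byte boundary without
        # resetting the history window, so the peer can decode it immediately.
        return self._c.compress(framed) + self._c.flush(zlib.Z_SYNC_FLUSH)


class StreamDecompressor:
    """Hypothetical receiver side: accepts arbitrarily split TCP chunks."""

    def __init__(self) -> None:
        self._d = zlib.decompressobj()
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        self._buf += self._d.decompress(chunk)
        messages = []
        # Extract every complete length-prefixed message decoded so far.
        while len(self._buf) >= 4:
            (n,) = struct.unpack("!I", self._buf[:4])
            if len(self._buf) < 4 + n:
                break
            messages.append(self._buf[4 : 4 + n])
            self._buf = self._buf[4 + n :]
        return messages


# Simulate a TCP connection that delivers the byte stream in 5-byte segments.
sender, receiver = StreamCompressor(), StreamDecompressor()
wire = b"".join(sender.pack(b"telemetry/gps lat=%d" % i) for i in range(3))
received = []
for pos in range(0, len(wire), 5):
    received += receiver.feed(wire[pos : pos + 5])
print(received)
```

Flushing after every message keeps latency low at the cost of a few bytes of flush overhead per message. Batching several messages per flush trades some latency for a better ratio, which is the same trade-off as the blocking mentioned for the ssh setup.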

Because a stream compressor's state is not limited to a single message but keeps history over the whole stream, it would far outperform any per-message compression. It would compress topic names away very effectively and would likely beat even our current ssh-compression workaround by a wide margin.
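The difference between per-message and stream compression can be checked on synthetic traffic. This is a minimal sketch, assuming zlib as a stand-in for zstd/lz4 and a made-up topic and payload format. Real ratios depend entirely on the actual data. Per-message compression restarts with an empty history for every message. The stream variant shares one context and flushes after each message, so repeated topic names and unchanged value prefixes become short back-references.

```python
import zlib

# Hypothetical pub/sub traffic: a repeated topic name and slowly changing values.
messages = [
    b"site-au/vehicle/position lat=-33.86%02d lon=151.20%02d" % (i % 100, (i * 3) % 100)
    for i in range(1000)
]

raw_total = sum(len(m) for m in messages)

# Per-message compression: every message starts from an empty history.
per_message_total = sum(len(zlib.compress(m)) for m in messages)

# Stream compression: one context for the whole connection, flushed per message
# so that each message could still be sent (and decoded) immediately.
stream = zlib.compressobj()
stream_total = sum(
    len(stream.compress(m) + stream.flush(zlib.Z_SYNC_FLUSH)) for m in messages
)

print(f"raw={raw_total} per-message={per_message_total} stream={stream_total}")
```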

As zmq abstracts away TCP connection handling, I do not know how to add this from the outside, so it might need to be added to zmq itself. Another option might be a TCP socket option for compression that passes the whole message stream through a stream compressor before handing it over to TCP.

bluca commented 2 years ago

There's already compression support in czmq's APIs, e.g.: https://github.com/zeromq/czmq/blob/master/api/zstr.api#L87

pizzard commented 2 years ago

Good to know. As far as I can see, this uses lz4 to compress the given messages. It would be nice to replace our manual message compression with that API, or at least experiment in that direction. So this might simplify things, but it does not materially change the situation.

When talking about compression, the relevant metric is entropy. Message compression makes sense when it can increase the entropy per byte within a message, i.e. remove the redundancy inside that message. If a message already has high entropy, lz4 cannot shrink it.
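The entropy argument can be made concrete with an order-0 byte-entropy estimate. This is a minimal sketch, assuming zlib as a stand-in for lz4 and two illustrative payloads: a redundant text message and random bytes standing in for an already-compressed or encrypted message. It shows that the redundant payload has low bits per byte and shrinks a lot, while the random payload sits near 8 bits per byte and does not shrink at all. The order-0 estimate only looks at byte frequencies, so it is a rough indicator rather than a true bound.

```python
import math
import os
import zlib
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Empirical order-0 Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Redundant payload (illustrative) and a high-entropy payload of the same size.
text = b"topic=plant/line-3/temperature value=21.5 unit=C\n" * 80
noise = os.urandom(len(text))

for name, payload in (("text", text), ("random", noise)):
    packed = zlib.compress(payload)
    print(f"{name}: {bits_per_byte(payload):.2f} bits/byte, "
          f"{len(payload)} -> {len(packed)} bytes")
```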

In a stateless protocol, however, the entropy across a run of messages is often very low, because topic names repeat. Also, the values themselves often change only slightly, especially when messages are sent at a higher frequency. And since it is core to zmq's design to abstract away when communication partners join and leave, the only way to make sure state is propagated correctly is to repeat it.

I am trying to find a way to perform better with limited bandwidth by increasing the entropy of the TCP byte stream, which can be very low even when each individual message already has maximal entropy. With any library that lets me handle connections and sockets myself, that is not hard to do, but zmq abstracts these things away.