ntop / n2n

Peer-to-peer VPN
GNU General Public License v3.0

Increasing compression ratio (idea development) #949

Open bluesky2319 opened 2 years ago

bluesky2319 commented 2 years ago

First of all, thank you for supporting compression in n2n. I did some tests with compression enabled on a VPN tunnel. My test file was enwik9 (uncompressed text) and the Internet bandwidth was around 5 Mbit/s.

These are my results (MTU is 1400 bytes):

- with L19_zstd compression: 16 min 8 sec (968 sec)
- without compression: 24 min 28 sec (1468 sec)

This means the compression ratio for this file, measured as the ratio of transfer times, is 968 sec / 1468 sec ≈ 66%.

I noticed a significant difference between data packet compression and whole-file compression: the compression ratio for enwik9 at level 7 is 29%. That was predictable, because the maximum data packet (or MTU) size is only 1500 bytes.

For comparison, I split enwik9 into small files of 1400 bytes each, compressed those files, and got a compression ratio of 52%.

So there is still a difference between these two results: 66% for data packet compression with an MTU of 1500 bytes and 52% for the split files. One reason is that the MTU only represents the maximum transmission size; I guess many data packets are smaller than the MTU, so compression does not work as well on them. Maybe there are other reasons too!

I have an idea for compressing the traffic further. Instead of compressing data packets one by one, we could collect them for a short interval (e.g. one second), assemble them into one big data packet (or temporary buffer), compress that, and then forward it through the tunnel. The aggregated packet would be in the MB range (depending on the Internet bandwidth), and compression algorithms handle it much better than packets limited to an MTU of 1500. In my test with 5 Mbit/s Internet speed, I noticed the maximum packet forwarding in the network was 1.6 kbps. For example, by collecting 1400 packets per second we would get an aggregate of more than 1 MB, and compressing it should give a much better result (maybe a ratio below 35% for enwik9). The trade-off in this model is added delay at the start.
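Purely to illustrate the expected effect, here is a small standalone comparison using libzstd (not n2n code; the packet size, packet count and sample data are arbitrary assumptions). It compresses packet-sized chunks one by one and then compresses the same data as a single aggregated buffer:

```c
/* Illustrative sketch only (not n2n code): compare compressing small
 * packet-sized chunks one by one against compressing one aggregated
 * buffer, using libzstd. Build with: cc batch_vs_packet.c -lzstd */
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

#define PKT_SIZE   1400   /* assumed per-packet payload size */
#define PKT_COUNT  750    /* ~1 MB of collected payload      */

int main(void) {
    size_t total = (size_t)PKT_SIZE * PKT_COUNT;
    char *data = malloc(total);
    if (!data) return 1;
    /* stand-in for real traffic: repetitive text compresses well */
    for (size_t i = 0; i < total; i++)
        data[i] = "the quick brown fox jumps over the lazy dog "[i % 44];

    size_t bound = ZSTD_compressBound(total);
    char *out = malloc(bound);
    if (!out) return 1;

    /* per-packet compression, one packet per frame as done today */
    size_t per_packet = 0;
    for (int p = 0; p < PKT_COUNT; p++) {
        size_t c = ZSTD_compress(out, bound, data + (size_t)p * PKT_SIZE,
                                 PKT_SIZE, 7);
        if (ZSTD_isError(c)) return 1;
        per_packet += c;
    }

    /* aggregated compression, as proposed: one big buffer, one call */
    size_t aggregated = ZSTD_compress(out, bound, data, total, 7);
    if (ZSTD_isError(aggregated)) return 1;

    printf("per-packet: %zu bytes, aggregated: %zu bytes\n",
           per_packet, aggregated);
    free(data);
    free(out);
    return 0;
}
```

With such repetitive sample data the gap is exaggerated, but real text traffic should show the same direction of effect.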

Adding this option to n2n would be a great feature for anyone who wants more compression on the network, and it would save bandwidth when transferring uncompressed files! As @Logan007 said, issue #539 would have to be solved first. The idea still needs to be developed and implemented in n2n.

I would appreciate your support to add this feature to n2n.

huddhudd commented 2 years ago

I think this is a good idea. Both server and client could collect data packets, then partition and index them, and maintain the index IDs so that an identical chunk only needs to be sent as an index.

This is almost the same as caching.

Logan007 commented 2 years ago

I was actually thinking of buffering a fixed number of packets and then assembling and sending them as a new packet type, i.e. a packet burst that undergoes a common compression step. The maximum size of the collected packet data would be the maximum data size transmissible in a single underlying UDP datagram, around 64K.

However, caching increases delay and requires additional code. But why do we need to collect packets at all? As we actually have a virtual LAN, we can determine the packet size (MTU) ourselves, making our own "jumbo frames".

So, I tried to change a few parameters in include/n2n_define.h, namely

#define N2N_SN_PKTBUF_SIZE   65535 
#define N2N_PKT_BUF_SIZE     65535
#define DEFAULT_MTU          61000

Compiling with zstd support, transferring the 100 MB enwik8 file through the n2n tunnel using netcat, and watching the compression output of -z2 -vvvvv at the sending edge (piped through | grep compr) gives a good indication of how effective compression is.

If you experiment with smaller DEFAULT_MTU values such as 4096, 8192 or 16384, you will find that the ZSTD_COMPRESSION_LEVEL define (defaulting to 7) does not seem to have much impact: higher levels do not compress much better here because the achievable compression is limited mainly by the buffer size. At the higher MTU of 61000, however, the level makes a more visible difference (between 7 and the maximum of 22).
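For anyone who wants to see the buffer-size effect without recompiling n2n, here is a rough standalone measurement sketch using libzstd (not n2n code; the file handling and chunk sizes are just assumptions mirroring the values above):

```c
/* Rough measurement sketch (not n2n code): compress a file in fixed-size
 * chunks to see how chunk size vs. zstd level affects the overall ratio.
 * Usage: ./chunkratio enwik8    Build with: cc chunkratio.c -lzstd */
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

static void measure(const char *buf, size_t len, size_t chunk, int level) {
    size_t bound = ZSTD_compressBound(chunk);
    char *out = malloc(bound);
    size_t compressed = 0;
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? (len - off) : chunk;
        size_t c = ZSTD_compress(out, bound, buf + off, n, level);
        if (!ZSTD_isError(c)) compressed += c;
    }
    printf("chunk %6zu level %2d -> %.1f %%\n",
           chunk, level, 100.0 * compressed / len);
    free(out);
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *buf = malloc(len);
    if (!buf || fread(buf, 1, len, f) != (size_t)len) return 1;
    fclose(f);

    /* chunk sizes roughly matching the tested DEFAULT_MTU values */
    size_t chunks[] = { 1290, 4096, 8192, 16384, 61000 };
    for (size_t i = 0; i < sizeof(chunks) / sizeof(chunks[0]); i++) {
        measure(buf, (size_t)len, chunks[i], 7);
        measure(buf, (size_t)len, chunks[i], 22);
    }
    free(buf);
    return 0;
}
```

Running it on enwik8 should reproduce the trend (larger chunks compress noticeably better, while the level matters little at small chunk sizes), though not the exact numbers the tunnel sees, since real traffic also carries headers.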

If I am not misled here, collecting packets will not allow for better compression than simply raising the MTU, while the collector approach brings more complexity and delay. So, I am not convinced that the collector approach would necessarily be of help.

Please let me know what you think. And please share your results from experimenting with higher MTU.

Logan007 commented 2 years ago

I get the following compression ratios when transmitting the enwik8 file of 100,000,000 bytes:

| DEFAULT_MTU | ZSTD Level 7 | ZSTD Level 22 | LZO1x |
|---|---|---|---|
| 1,290 (default) | 65.0 % | 64.9 % | 86.1 % |
| 2,048 | 57.6 % | 56.9 % | 78.3 % |
| 4,096 | 50.1 % | 49.4 % | 69.9 % |
| 8,192 | 45.4 % | 44.6 % | 64.4 % |
| 16,384 | 42.1 % | 41.1 % | 60.5 % |
| 32,768 | 39.7 % | 38.5 % | 57.4 % |
| 61,000 | 38.4 % | 36.9 % | 57.4 % |

Note that compression speed might be very low at the larger MTU sizes, depending on your hardware of course. Also keep in mind that DEFAULT_MTU should not exceed 61,000 bytes, as LZO (-z1, in case it gets used) is no longer able to work properly beyond that due to buffer reserve requirements.
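A back-of-the-envelope check of that limit, assuming the commonly documented lzo1x worst-case output bound of in + in/16 + 64 + 3 bytes is the relevant constraint (the exact reserve n2n keeps for its own headers is not shown here):

```c
/* Assumption: the documented lzo1x worst-case output bound is
 * in + in/16 + 64 + 3 bytes. With a 65,535-byte packet buffer,
 * a 61,000-byte payload still fits even if LZO expands it. */
#include <stdio.h>

static unsigned long lzo1x_worst_case(unsigned long in) {
    return in + in / 16 + 64 + 3;
}

int main(void) {
    printf("worst case for 61000 bytes: %lu (packet buffer: 65535)\n",
           lzo1x_worst_case(61000));   /* prints 64879 */
    return 0;
}
```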

Sender

pv enwik8  | netcat <receiver_n2n_ip_address> 2222

Receiver

netcat -l -p 2222 > enwik8.out

The compression ratio was calculated by counting the number of actually transmitted bytes (additional experimental counter in edge).

bluesky2319 commented 2 years ago

@Logan007 Thank you so much for your suggestion. I tested enwik9 with your recommendation and achieved a better ratio than before: 38.5%. The transfer takes 7 min 49 sec at level 7 over a 5 Mbit/s connection, which is a great result for transfers from local storage between edges. However, when I download enwik9 from external sources (outside the edges, e.g. from other websites/servers) using n2n as a classic VPN tunnel, the result is different: the ratio is 66%. I guess that when we download from external sources we receive data packets with an MTU of 1500 bytes, and that is why the compression ratio in the tunnel is not good. Is there a way to solve this, like what you suggested here for edge to edge (changing the MTU and buffer size in this model), or do we need to collect data packets and then compress them?

Logan007 commented 2 years ago

> by changing MTU and buffer size in this model

I doubt that you will be able to switch to a higher MTU on your internet-bound device; ip link set <device_name> mtu 61000 will most likely stop you from doing it and show the mtu greater than device maximum error.

> we need to collect data packets and then compress them?

If you want more compression of compressible data such as text, the answer is yes.

Basically, you would need to add buffer space to each peer structure plus a pointer/counter indicating the number of packets collected so far, append new packets to the buffer, check whether the buffer is full and eventually flush it, and regularly check all peers in the main loop so that even half-empty buffers get flushed. Transmitting a collection of packets would also require the implementation of a new message type plus its handling. The logic gets more complicated and memory requirements rise.
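As a very rough sketch of that shape (hypothetical names only, not actual n2n structures; compression and the new message type are left as stubs, and a real version would also need per-packet length fields so the receiver can split a burst back into frames):

```c
/* Rough shape only, with hypothetical names -- not actual n2n structures.
 * Per-peer state and flush path: append packets to a buffer, flush when it
 * is full or when the main loop decides too much time has passed. */
#include <stdint.h>
#include <string.h>
#include <time.h>

#define AGG_BUF_SIZE 61000            /* assumed collection target        */

typedef struct peer_agg {
    uint8_t  buf[AGG_BUF_SIZE];       /* collected packet payloads        */
    size_t   used;                    /* bytes collected so far           */
    unsigned pkt_count;               /* number of packets in the buffer  */
    time_t   first_pkt_time;          /* for the periodic main-loop flush */
} peer_agg_t;

/* stub: would compress buf[0..used) and send it to this peer
 * as a new "packet burst" message type */
static void send_aggregate(peer_agg_t *agg) {
    agg->used = 0;
    agg->pkt_count = 0;
}

/* append one outgoing packet; flush first if it would not fit */
static void agg_add_packet(peer_agg_t *agg, const uint8_t *pkt, size_t len) {
    if (len > AGG_BUF_SIZE)
        return;                       /* oversized frame: send uncollected */
    if (agg->used + len > AGG_BUF_SIZE)
        send_aggregate(agg);
    if (agg->pkt_count == 0)
        agg->first_pkt_time = time(NULL);
    memcpy(agg->buf + agg->used, pkt, len);
    agg->used += len;
    agg->pkt_count++;
}

/* called regularly from the main loop: flush even half-empty buffers */
static void agg_maybe_flush(peer_agg_t *agg, time_t now, time_t max_delay) {
    if (agg->pkt_count > 0 && now - agg->first_pkt_time >= max_delay)
        send_aggregate(agg);
}

int main(void) {
    peer_agg_t agg = { .used = 0, .pkt_count = 0 };
    uint8_t frame[1400] = { 0 };
    for (int i = 0; i < 50; i++)                /* overflows once, flushes */
        agg_add_packet(&agg, frame, sizeof(frame));
    agg_maybe_flush(&agg, time(NULL) + 2, 1);   /* forces the timed flush  */
    return 0;
}
```

The receiving side would need matching logic to decompress a burst and re-inject the individual frames, which is where much of the added complexity would live.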

As buffering would cause some transmission delay, you would also want to add a new command line option enabling this behavior only on special request.

> Is there any solution to solve this issue

I still do not think of it as an issue; I would think of it as an enhancement.

> the compression ratio is not good in the VPN tunnel

Please keep in mind that n2n is a layer 2 VPN. I am not sure that frames transmitted on layer 2 should take care of compressing longer data streams from higher layers, probably layer 7 in your case. Shouldn't those data streams be compressed on the application layer instead, by something I would call a compressing http(s) proxy or the like (I do not know if such a thing exists)?

bluesky2319 commented 2 years ago

@Logan007 thank you. Yes, compressing data at the application layer is much better than at layer 2, but unfortunately I don't have direct access to the files, so I have to compress them in the middle of the transmission, in the tunnel. You are right that this works for uncompressed files, and the edge-to-edge ratio is a very good result in this scenario. Based on that result, I think that gathering data packets into roughly 60 KB buffers would already give a better compression ratio in the tunnel in any case; buffers in the MB range are not needed.

bluesky2319 commented 2 years ago

Packet aggregation is used in some switches to collect small data packets into bigger ones (up to an MTU of 1500). The difference from this idea is the MTU size: for a better compression ratio, and considering the latency added by aggregation, it would be better to aggregate up to 8,192 to 16,384 bytes. There are some projects about packet aggregation that could be helpful for the development:

https://github.com/mikaelhedegren/packet-aggregation-in-linux
https://www.diva-portal.org/smash/get/diva2:5705/FULLTEXT01.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.555.2968&rep=rep1&type=pdf
https://tools.ietf.org/id/draft-saldana-tsvwg-simplemux-07.html
https://github.com/Simplemux/lispmob-with-simplemux
https://www.slideshare.net/josemariasaldana/simplemux-a-generic-multiplexing-protocol