ntop / n2n

Peer-to-peer VPN
GNU General Public License v3.0
6.33k stars 949 forks

add chacha20-ietf-poly1305 support #230

Closed xiamr closed 4 years ago

xiamr commented 4 years ago

n2n offers two encryption algorithms, Twofish and AES-CBC, but they are not well suited to router CPUs without AES-NI support. ChaCha20-IETF-Poly1305 and its extended-nonce variant XChaCha20-IETF-Poly1305 are both secure and fast in that situation and are recommended by the libsodium library, so they are worth implementing in n2n.

xiamr commented 4 years ago

Could you support it? Otherwise I will consider implementing my own version.

Logan007 commented 4 years ago

I plan to implement a lightweight cipher for header encryption (#198). I could try to expand it to its own packet encryption transform – if supported.

For now, I have chosen SPECK, which is extremely simple to implement.

As for Poly1305, I am not sure if n2n can already bear a Message Authentication Code as this seems planned to be taken in a more general approach.

xiamr commented 4 years ago

You do not need to implement a cipher on your own; other libraries such as libsodium provide a friendly API for application development. Choosing a well-known cipher benefits security. For header data, you can use authentication only, without encryption. Authentication is essential when packets cross an untrusted network where a firewall can change the content.

Logan007 commented 4 years ago

I just realized that openSSL supports ChaCha20 cipher as const EVP_CIPHER *EVP_chacha20(void).

Now that transform_aes.c supports openSSL's evp_* interface, testing should be just as easy as changing the cipher in lines 348, 355, and 364 accordingly. ChaCha20's fixed key size of 256 bits might require some additional, slight changes to the code. I would be very interested in your results, please share!

A performance comparison running `openssl speed -evp chacha20` and `openssl speed -evp aes-128-cbc` makes ChaCha20 look promising even on AES-NI accelerated CPUs.

Concerning header encryption, I am led by the idea of implementing it as a feature which always is available (not necessarily enabled) and thus independent from external libraries – just like Twofish is the always-available option for payload encryption. So, I want to go with a lightweight cipher that is nearly effortless to implement and an implementation of which is easy to review. With a view to the official implementation guide of SPECK128 (chapter VI, p. 14), it seems extremely acceptable in this regard. Also, NSA recommends SPECK even for official use in case AES (also NSA-recommended) is either not available or just not feasible.
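To give a feeling for how little code such a lightweight cipher needs, here is a minimal sketch of SPECK128/128 following the published round structure (32 rounds, rotations by 8 and 3). It is not taken from n2n; the function names and the key word order (`key[0]` as the first round key) are assumptions made for illustration.

```c
#include <stdint.h>

#define SPECK_ROUNDS 32

/* Rotate a 64-bit word right/left by r bits (0 < r < 64). */
static uint64_t ror64(uint64_t x, unsigned r) { return (x >> r) | (x << (64 - r)); }
static uint64_t rol64(uint64_t x, unsigned r) { return (x << r) | (x >> (64 - r)); }

/* Expand a 128-bit key into 32 round keys.
   Assumed layout: key[0] is the first round key word, key[1] the other key word. */
void speck128_expand_key(const uint64_t key[2], uint64_t rk[SPECK_ROUNDS]) {
    uint64_t k = key[0], l = key[1];
    for (uint64_t i = 0; i < SPECK_ROUNDS; i++) {
        rk[i] = k;
        l = (ror64(l, 8) + k) ^ i;   /* the key schedule reuses the round function */
        k = rol64(k, 3) ^ l;
    }
}

/* Encrypt one 128-bit block given as two 64-bit words (x, y), in place. */
void speck128_encrypt(const uint64_t rk[SPECK_ROUNDS], uint64_t *x, uint64_t *y) {
    for (int i = 0; i < SPECK_ROUNDS; i++) {
        *x = (ror64(*x, 8) + *y) ^ rk[i];
        *y = rol64(*y, 3) ^ *x;
    }
}

/* Decrypt one block by undoing the rounds in reverse order. */
void speck128_decrypt(const uint64_t rk[SPECK_ROUNDS], uint64_t *x, uint64_t *y) {
    for (int i = SPECK_ROUNDS - 1; i >= 0; i--) {
        *y = ror64(*y ^ *x, 3);
        *x = rol64((*x ^ rk[i]) - *y, 8);
    }
}
```

The whole cipher is add, rotate and XOR on 64-bit words, which is what makes it easy to review.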

xiamr commented 4 years ago

I tested on my OpenWrt router by running openssl speed; the CPU is a single-core MIPS 24Kc without AES-NI support. The results are:

type                  16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20             11294.12k    18231.96k    20661.49k    21504.00k    21650.72k    21722.61k
chacha20-poly1305     6626.00k    11122.48k    13421.92k    14270.42k    14258.34k    14382.20k
aes-128-cbc           5689.85k     7922.32k     8709.90k     8934.97k     8983.89k     9047.61k
aes-128-gcm           3708.05k     4329.54k     4575.62k     4618.66k     4693.46k     4644.19k

So chacha20 is considerably faster than aes-128 on this CPU.

Logan007 commented 4 years ago

Thank you for sharing your results. It is the chaining in CBC encryption (not so much decryption) that offers fewer opportunities for vectorizing or parallel processing.

I am surprised that GCM performs worse than CBC. What about pure CTR mode openssl speed -evp aes-128-ctr ?

Logan007 commented 4 years ago

My curiosity got triggered... I have just had my Raspberry 3B+ (w/o any AES hardware acceleration) run the openSSL speed tests¹:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20         58390.24k   123716.17k   252347.76k   282826.35k   294059.25k   294196.25k
aes-128-cbc      42390.82k    57049.96k    62856.53k    64483.33k    64984.41k    65011.71k
aes-128-ctr      32710.49k    41929.27k    45543.94k    46500.18k    46787.24k    46637.25k

Now I feel your pain! Indeed, it is interesting to me that CTR seems to perform² worse than CBC. That is completely different from the ranking I saw on an i7 7500U CPU with AES-NI support:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-ctr     585483.37k  1875635.80k  3695942.72k  4944596.31k  5488765.61k  5535012.18k
chacha20        376218.05k   678838.39k  1400891.65k  2872871.59k  2985631.74k  3003631.05k
aes-128-cbc     906948.80k  1278785.63k  1313959.85k  1319719.59k  1321743.70k  1321768.28k

Given these numbers, the integration of some lightweight cipher into n2n as an additional option definitely is worth considering. Now, the question is whether to use openSSL or to make it an integral part of n2n (just like Twofish). On the one hand, the use of openSSL does not sound lightweight in terms of a system-wide view. On the other hand, if openSSL is around anyway... Codewise, the integrated approach surely would turn out more complicated for ChaCha20 than for SPECK...³

¹ I just compared the pure ciphers without the different MAC mechanisms, although I am aware of the possibility of bitwise attacks on stream ciphers; due to pad checking, CBC already provides a (very) light protection. However, n2n seems to go more in the direction of putting a MAC in a more general place, maybe the header, at a later point in time.

² The 1024 bytes column should matter most to n2n.

³ Neither pun nor political discussion intended.
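The bitwise attack mentioned in footnote ¹ can be illustrated with a few lines. This is a generic sketch, not n2n code: the keystream stands in for the output of any stream cipher such as ChaCha20, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR data with a keystream in place; the same call encrypts and decrypts. */
void xor_keystream(uint8_t *data, const uint8_t *keystream, size_t len) {
    for (size_t i = 0; i < len; i++)
        data[i] ^= keystream[i];
}

/* An attacker who guesses the plaintext byte at position pos can turn it into
   any wanted byte without knowing the key or the keystream. */
void flip_known_byte(uint8_t *ciphertext, size_t pos, uint8_t guessed, uint8_t wanted) {
    ciphertext[pos] ^= (uint8_t)(guessed ^ wanted);
}
```

Encrypting "PAY 100", calling `flip_known_byte(ct, 4, '1', '9')` and decrypting yields "PAY 900". Without a MAC the receiver has no way to notice the change.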

xiamr commented 4 years ago

My test shows a similar trend to yours. Here are my results; CTR mode is worse than CBC:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-ctr       5484.21k     7132.77k     7754.65k     7927.19k     7926.88k     8005.27k

About the system-wide availability problem for openssl: in my opinion, you can add a conditional compilation option or use the dlopen facility to link to openssl optionally, according to demand. The need to reduce package size only arises for equipment with little storage space, such as embedded systems. For high-performance PCs and servers, resources are adequate and any option is OK. Let users choose whether to integrate it into n2n or not.

Another way to improve performance is multi-threading, but it would introduce too much complexity and is not effective on a single-core CPU.

I think n2n is mainly oriented to individuals; enterprises will use hardware VPNs. Thus I really recommend that n2n join the repository of openwrt, which is a Linux distribution for network routers. I think many Linux fans will love to use n2n.

Logan007 commented 4 years ago

I really like the idea of adding a lightweight cipher as an optional additional cipher. If we just want to play it the easy way for now, we should use openSSL. We could just adapt a copy of transform_aes.c into a new transform_chacha20.c, along with some other parts of the code.

It seems that ChaCha20 first got supported starting with openSSL 1.1.x. My guess is that, for full integration into upstream, we probably need to take care of compatibility (Mac?). So, an alternative for use with openSSL 1.0.x (see the various #ifdef OPENSSL_1_1 parts in transform_aes.c) should be provided. Any suggestions? Some separate slim-fit software implementation (maybe from here – not tested)? Maybe that implementation already is speedy enough and we do not need openSSL?
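For a sense of what a slim software implementation involves, the following is a minimal, unoptimized sketch of ChaCha20 as specified in RFC 8439 (256-bit key, 96-bit nonce, 32-bit block counter). It is not taken from n2n or from the implementation linked above, and the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t v, unsigned n) { return (v << n) | (v >> (32 - n)); }

static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ChaCha quarter round on four state words. */
#define QR(a, b, c, d)                   \
    a += b; d ^= a; d = rotl32(d, 16);   \
    c += d; b ^= c; b = rotl32(b, 12);   \
    a += b; d ^= a; d = rotl32(d, 8);    \
    c += d; b ^= c; b = rotl32(b, 7);

/* Produce one 64-byte keystream block. */
void chacha20_block(const uint8_t key[32], uint32_t counter,
                    const uint8_t nonce[12], uint8_t out[64]) {
    uint32_t s[16], w[16];
    /* Constants "expand 32-byte k", then key, counter and nonce. */
    s[0] = 0x61707865; s[1] = 0x3320646e; s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (int i = 0; i < 8; i++) s[4 + i] = load32_le(key + 4 * i);
    s[12] = counter;
    for (int i = 0; i < 3; i++) s[13 + i] = load32_le(nonce + 4 * i);
    memcpy(w, s, sizeof w);
    for (int i = 0; i < 10; i++) {       /* 10 double rounds = 20 rounds */
        QR(w[0], w[4], w[8],  w[12]) QR(w[1], w[5], w[9],  w[13])
        QR(w[2], w[6], w[10], w[14]) QR(w[3], w[7], w[11], w[15])
        QR(w[0], w[5], w[10], w[15]) QR(w[1], w[6], w[11], w[12])
        QR(w[2], w[7], w[8],  w[13]) QR(w[3], w[4], w[9],  w[14])
    }
    for (int i = 0; i < 16; i++) {       /* add the input state, serialize little-endian */
        uint32_t v = w[i] + s[i];
        out[4 * i]     = (uint8_t)(v & 0xff);
        out[4 * i + 1] = (uint8_t)((v >> 8) & 0xff);
        out[4 * i + 2] = (uint8_t)((v >> 16) & 0xff);
        out[4 * i + 3] = (uint8_t)((v >> 24) & 0xff);
    }
}

/* Encrypt or decrypt data in place by XORing it with the keystream. */
void chacha20_xor(const uint8_t key[32], uint32_t counter,
                  const uint8_t nonce[12], uint8_t *data, size_t len) {
    uint8_t block[64];
    for (size_t off = 0; off < len; off += 64, counter++) {
        size_t n = (len - off < 64) ? len - off : 64;
        chacha20_block(key, counter, nonce, block);
        for (size_t i = 0; i < n; i++) data[off + i] ^= block[i];
    }
}
```

A sketch like this trades speed for brevity; openSSL's version is likely to be faster where it uses platform-specific optimizations, which is one argument for keeping openSSL as the preferred backend where it is available.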

xiamr commented 4 years ago

Using conditional compilation is fine for this feature. It means that when compiled with openssl 1.1.x or above, n2n has chacha20 support; otherwise the default cipher is selected. As time goes on, Linux distributions will be bundled with newer openssl versions, or users can update openssl by themselves. Using other encryption libraries may take time to learn, and copying a separate slim-fit software implementation may bring security problems (not tested by the industry). As far as I am concerned, it is fine to use the openssl implementation because it is easy to update openssl (I already did so for the benchmark).
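The conditional compilation idea could look roughly like the sketch below. `OPENSSL_1_1` is the guard mentioned for transform_aes.c; how it gets defined (for example by the build system) is an assumption here, and `N2N_HAVE_CHACHA20` and `pick_cipher` are hypothetical names used only for illustration.

```c
#include <string.h>

/* Assumption: the build defines OPENSSL_1_1 when openSSL 1.1.x or newer is found. */
#ifdef OPENSSL_1_1
#define N2N_HAVE_CHACHA20 1   /* hypothetical feature macro */
#else
#define N2N_HAVE_CHACHA20 0
#endif

/* Return the cipher to use: the requested one if it was compiled in,
   otherwise the always-available default. */
const char *pick_cipher(const char *requested) {
    if (strcmp(requested, "chacha20") == 0 && !N2N_HAVE_CHACHA20)
        return "twofish";
    return requested;
}
```

Whether a silent fallback or a hard error is the better reaction to an unavailable cipher is a design question this sketch does not settle.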

Logan007 commented 4 years ago

Alright, I see. Before proceeding, we should get an opinion from the maintainers.

Logan007 commented 4 years ago

However, I was not able to hold myself back from starting to code... :smile: I will publish as patch or pull request during the next days for further consideration, discussion and review.

Logan007 commented 4 years ago

Please see pull request #235.

Compared to the AES code which obviously served as a basis, the following changes were applied:

I have just performed some very quick tests in a limited test environment without encountering any problems so far. Does it work for you?

Logan007 commented 4 years ago

I ran the numbers using tools/n2n-benchmark -d and want to share the results:

| CPU | AES Hw | evp_aes_128_cbc | evp_chacha20 |
|---|---|---|---|
| Cortex A53s | ⁿ/ₐ | 24.7 MB/s | 46.8 MB/s |
| Core i5 M430 | ⁿ/ₐ | 86.6 MB/s | 260.6 MB/s |
| Celeron 3865U | | 241.4 MB/s | 252.9 MB/s |
| i7 2860QM | | 311.4 MB/s | 389.7 MB/s |
| i7 4770T | | 403.7 MB/s | 705.3 MB/s |
| i7 5775C | | 405.9 MB/s | 719.1 MB/s |
| i7 7500U | | 501.9 MB/s¹ | 715.8 MB/s² |

What about the MIPS 24Kc?

¹ ² Omitting -d delivers 756.1 MB/s and 1,311.9 MB/s respectively. While the ratio of the ChaCha20 rates (≈ 0.55, i.e. slightly above ½) might result from some benefit caused by cache hits, the ratio of the AES-CBC rates (≈ 0.66) should also reflect CBC's opportunities for faster decryption (compared to encryption only).

xiamr commented 4 years ago

The benchmark results from my MIPS 24Kc platform (running tools/n2n-benchmark -d) are:

Run enc/dec[transop_null] for 3s (512 bytes):   
          204530 packets        68.2 Kpps       34.9 MB/s
Run enc/dec[transop_twofish] for 3s (512 bytes):   
            8675 packets         2.9 Kpps        1.5 MB/s
Run enc/dec[transop_aes] for 3s (512 bytes):   
           19231 packets         6.4 Kpps        3.3 MB/s
Run enc/dec[transop_cc20] for 3s (512 bytes):   
           34298 packets        11.4 Kpps        5.9 MB/s

After omitting the -d option:

Run enc[transop_null] for 3s (512 bytes):   
          579123 packets       193.0 Kpps       98.8 MB/s
Run enc[transop_twofish] for 3s (512 bytes):   
           18043 packets         6.0 Kpps        3.1 MB/s
Run enc[transop_aes] for 3s (512 bytes):   
           40364 packets        13.5 Kpps        6.9 MB/s
Run enc[transop_cc20] for 3s (512 bytes):   
           70740 packets        23.6 Kpps       12.1 MB/s

It seems the twofish cipher is the slowest one.

Logan007 commented 4 years ago

Great to see that it seems to work out for you. I hope that 12.1 MByte (≈ 100 MBit) per second is sufficient for your router's needs – at least in one direction.
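The benchmark figures can be cross-checked with a little arithmetic. The sketch below assumes that 1 MB means 10⁶ bytes, which is what the posted numbers are consistent with; the type and function names are made up for illustration.

```c
/* Throughput derived from a packet count over a timed benchmark run. */
typedef struct {
    double kpps;         /* thousand packets per second */
    double mbyte_per_s;  /* 10^6 bytes per second */
    double mbit_per_s;   /* 10^6 bits per second */
} throughput_t;

throughput_t throughput(unsigned long packets, unsigned packet_bytes, double seconds) {
    throughput_t t;
    double bytes_per_s = (double)packets * packet_bytes / seconds;
    t.kpps = (double)packets / seconds / 1e3;
    t.mbyte_per_s = bytes_per_s / 1e6;
    t.mbit_per_s = bytes_per_s * 8.0 / 1e6;
    return t;
}
```

For the transop_cc20 encryption-only run, `throughput(70740, 512, 3.0)` gives about 23.6 Kpps and 12.1 MB/s, i.e. roughly 97 Mbit/s, so "≈ 100 MBit" is a slight round-up.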

Now, question is: How to enable it via cli? Not to interfere with the well known and existent cli-options -A and -k, we could just use an optional extension to -A:

¹ ...and also no other -A_ option around (see questions below)

A would be for "ciphA"... The numbering corresponds to typedef enum n2n_transform as found in n2n_transforms.h.

This scheme needs some more thought as I am not yet happy with how combinations of -A2, -A3 or -A4 without -k are covered. Should we allow the use of an empty zero-key, as seems to have been the case with AES up to now? Or should we fall back to nulltransform? Or should n2n refuse any `-A` without a key? That would be a behavioral change.

As this might turn out too complicated to explain to users, another alternative would be to completely change how to choose the cipher. Maybe some new -C_ option deprecating -A? That definitely would require maintainers' guidance.

Any thoughts?

As for Twofish, I have not had the opportunity to take a closer look at the implementation actually used in n2n – I mostly use AES anyway. There may be faster implementations out there or some optimization potential left in the code. I am not really able to say. I have just found some older speed comparison which suggests that Twofish does not necessarily have to be slower than software AES on packets sized around 1024 bytes – strongly implementation-dependent.

xiamr commented 4 years ago

About how to add new cli-options, I think some form of vote needs to be held. In my opinion, adding a new -C_ option is a better alternative than the -A_ option. The old approach must remain for compatibility until it is no longer needed.

Logan007 commented 4 years ago

I updated pull request #235. Notably, a command-line option to enable ChaCha20 was added (-A4). Along with it, the scheme for using the -A option was extended, maintaining n2n's behavioural compatibility with current dev:

- `-A1`: do not use any payload encryption (transform_null; this is the default if neither a key nor any -A_ option is given)
- `-A2`: use Twofish (default if a key is given and none of the -A_ options is given)
- `-A3` or `-A`: use AES-CBC
- `-A4`: use ChaCha20

Solitary use of -A is marked deprecated and should be removed in some future release. Note: It is not the AES-CBC cipher that gets removed, only the way it gets enabled: just use -A3 instead. With a broader choice of ciphers, the need for a more general approach to explicitly choose the cipher becomes evident.
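The selection rules for -A can be summarized in a small sketch. The enum and function names are hypothetical (the real IDs are the ones in n2n_transforms.h), and how -A2 to -A4 behave without a key is deliberately left out because the thread leaves that open.

```c
#include <stddef.h>

/* Hypothetical IDs that follow the -A numbering described above. */
typedef enum {
    TRANSFORM_INVALID  = 0,
    TRANSFORM_NULL     = 1,
    TRANSFORM_TWOFISH  = 2,
    TRANSFORM_AES_CBC  = 3,
    TRANSFORM_CHACHA20 = 4
} transform_id_t;

/* a_arg:    NULL if no -A option was given, "" for a bare -A,
             otherwise the characters following -A (e.g. "4" for -A4).
   have_key: nonzero if a key was supplied with -k. */
transform_id_t select_transform(const char *a_arg, int have_key) {
    if (a_arg == NULL)                       /* no -A_ option at all */
        return have_key ? TRANSFORM_TWOFISH : TRANSFORM_NULL;
    if (a_arg[0] == '\0')                    /* deprecated bare -A means AES-CBC */
        return TRANSFORM_AES_CBC;
    if (a_arg[0] >= '1' && a_arg[0] <= '4' && a_arg[1] == '\0')
        return (transform_id_t)(a_arg[0] - '0');
    return TRANSFORM_INVALID;                /* anything else is rejected */
}
```

With this mapping, `select_transform(NULL, 1)` yields Twofish, matching the existing default, while `select_transform("4", 1)` yields ChaCha20.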

As for Poly1305: n2n will soon get a mechanism to verify payload integrity. Until then, just be aware that this is not covered yet. Stay tuned.

A final note: Up to now, n2n uses C's rand() function for random numbers (so initialization vectors are not random enough for proper use with ChaCha20). A new random number generator will be implemented very soon. Until then, ChaCha20 should not be used in production networks.
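As an illustration of what "random enough" could mean, the sketch below fills a buffer from the operating system's random source instead of rand(). It is only one possible approach on Unix-like systems and not the generator planned for n2n; `fill_random` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fill buf with len bytes read from /dev/urandom.
   Returns 0 on success and -1 if the device cannot be opened or read. */
int fill_random(uint8_t *buf, size_t len) {
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return (got == len) ? 0 : -1;
}
```

A caller would check the return value and refuse to send encrypted packets if no random bytes could be obtained, since a repeated initialization vector under the same key makes a stream cipher reuse its keystream.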

Logan007 commented 4 years ago

@xiamr May I ask you off-topic: What compiler do you use to build n2n for your 32-bit MIPS CPU? And also, do you experience any difficulties compiling the (still optional) uint64_t type, which gets more and more commonly used (also in that ChaCha20)?

xiamr commented 4 years ago

I use gcc 7.3, which is the cross toolchain of the openwrt SDK. It is fine to use any standard feature of C/C++, and I have no difficulty compiling the program.

lucaderi commented 4 years ago

Implemented via pull request