xiamr closed this issue 4 years ago.
Could you support it? Otherwise, I will consider implementing my own version.
I plan to implement a lightweight cipher for header encryption (#198). If supported, I could try to extend it into a packet encryption transform of its own.
For now, I have chosen SPECK, which is extremely simple to implement.
As for Poly1305, I am not sure whether n2n can already accommodate a Message Authentication Code, as this seems to be planned as part of a more general approach.
You do not need to implement the cipher on your own; libraries such as libsodium provide a friendly API for application development. Choosing a well-known cipher benefits security. For header data, you could use authentication only, without encryption. Authentication is essential when packets cross untrusted networks, where a firewall may change their content.
I just realized that openSSL supports the ChaCha20 cipher as `const EVP_CIPHER *EVP_chacha20(void)`.
Now that `transform_aes.c` supports openSSL's `evp_*` interface, testing should be as easy as changing the cipher in lines 348, 355, and 364 accordingly. ChaCha20's fixed key size of 256 bits might require some additional, slight changes to the code. I would be very interested in your results, please share!
A performance comparison running `openssl speed -evp chacha20` and `openssl speed -evp aes-128-cbc` makes ChaCha20 look promising even on AES-NI-accelerated CPUs.
Concerning header encryption, I am guided by the idea of implementing it as a feature that is always available (not necessarily enabled) and thus independent of external libraries – just like Twofish is the always-available option for payload encryption. So, I want to go with a lightweight cipher that is nearly effortless to implement and whose implementation is easy to review. Judging by the official implementation guide of SPECK128 (chapter VI, p. 14), it seems extremely acceptable in this regard. Also, the NSA recommends SPECK even for official use in case AES (also NSA-recommended) is either not available or just not feasible.
I tested on my OpenWrt router by running `openssl speed`; the CPU is a single-core MIPS 24Kc without AES-NI support. The results are:

| type (1000s of bytes/s) | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes | 16384 bytes |
|---|---|---|---|---|---|---|
| chacha20 | 11294.12k | 18231.96k | 20661.49k | 21504.00k | 21650.72k | 21722.61k |
| chacha20-poly1305 | 6626.00k | 11122.48k | 13421.92k | 14270.42k | 14258.34k | 14382.20k |
| aes-128-cbc | 5689.85k | 7922.32k | 8709.90k | 8934.97k | 8983.89k | 9047.61k |
| aes-128-gcm | 3708.05k | 4329.54k | 4575.62k | 4618.66k | 4693.46k | 4644.19k |

So ChaCha20 is considerably faster than AES-128 here: at 1024-byte blocks, it delivers about 2.4 times the throughput of AES-128-CBC.
Thank you for sharing your results. It is the chaining in CBC encryption (not so much in decryption) that offers fewer opportunities for vectorization or parallel processing.
I am surprised that GCM performs worse than CBC. What about pure CTR mode (`openssl speed -evp aes-128-ctr`)?
My curiosity got triggered... I have just had my Raspberry 3B+ (w/o any AES hardware acceleration) run the openSSL speed tests¹:

| type (1000s of bytes/s) | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes | 16384 bytes |
|---|---|---|---|---|---|---|
| chacha20 | 58390.24k | 123716.17k | 252347.76k | 282826.35k | 294059.25k | 294196.25k |
| aes-128-cbc | 42390.82k | 57049.96k | 62856.53k | 64483.33k | 64984.41k | 65011.71k |
| aes-128-ctr | 32710.49k | 41929.27k | 45543.94k | 46500.18k | 46787.24k | 46637.25k |

Now I feel your pain! Indeed, it is interesting to me that CTR seems to perform² worse than CBC. That is completely different from the ranking I saw on an i7 7500U CPU with AES-NI support:

| type (1000s of bytes/s) | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes | 16384 bytes |
|---|---|---|---|---|---|---|
| aes-128-ctr | 585483.37k | 1875635.80k | 3695942.72k | 4944596.31k | 5488765.61k | 5535012.18k |
| chacha20 | 376218.05k | 678838.39k | 1400891.65k | 2872871.59k | 2985631.74k | 3003631.05k |
| aes-128-cbc | 906948.80k | 1278785.63k | 1313959.85k | 1319719.59k | 1321743.70k | 1321768.28k |

Given these numbers, integrating some lightweight cipher into n2n as an additional option is definitely worth considering. Now, the question is whether to use openSSL or to make it an integral part of n2n (just like Twofish). On the one hand, the use of openSSL does not sound lightweight from a system-wide view. On the other hand, if openSSL is around anyway... Codewise, the integrated approach would surely turn out more complicated for ChaCha20 than for SPECK...³
¹ I only compared the pure ciphers, without the different MAC mechanisms, although I am aware of the possibility of bit-flipping attacks on stream ciphers – due to padding checks, CBC already provides (very) light protection. However, n2n seems to be heading towards putting a MAC in a more general place, maybe the header, at a later point in time.
² The 1024 bytes column should matter most to n2n.
³ Neither pun nor political discussion intended.
My test shows a similar trend to yours. Here are my results; CTR mode performs worse than CBC:

| type (1000s of bytes/s) | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes | 16384 bytes |
|---|---|---|---|---|---|---|
| aes-128-ctr | 5484.21k | 7132.77k | 7754.65k | 7927.19k | 7926.88k | 8005.27k |

As for openssl not being available system-wide, in my opinion you can add a conditional compilation option or use the dlopen facility to link openssl optionally, on demand. Reducing the package size only matters for equipment with little storage space, such as embedded systems. For high-performance PCs and servers, resources are adequate and any option is OK. Let users choose whether to integrate it into n2n or not.
Another way to improve performance is multi-threading, but it would introduce too much complexity and is not effective on single-core CPUs.
I think n2n is mainly oriented towards individuals; enterprises will use hardware VPNs. Thus I really recommend that n2n join the repository of OpenWrt, a Linux distribution for network routers. I think many Linux fans would love to use n2n.
I really like the idea of adding a lightweight cipher as an optional additional cipher. If we just want to play it the easy way for now, we should use openSSL. We could just adapt a copy of `transform_aes.c` into a new `transform_chacha20.c`, along with some other parts of the code.
It seems that ChaCha20 was first supported in openSSL 1.1.x. My guess is that, for full integration into upstream, we probably need to take care of compatibility (Mac?). So an alternative for use with openSSL 1.0.x (see the various `#ifdef OPENSSL_1_1` parts in `transform_aes.c`) should be provided. Any suggestions? Some separate slim-fit software implementation (maybe from here – not tested)? Maybe that implementation is already speedy enough and we do not need openSSL?
Using conditional compilation is fine for this feature: when compiled with openssl 1.1.x or above, n2n has ChaCha20 support; otherwise, the default cipher is selected. As time goes on, Linux distributions will bundle newer openssl versions, or users can update openssl themselves. Using other encryption libraries may take time to learn, and copying a separate slim-fit software implementation may introduce security problems (it has not been tested by the industry). As far as I am concerned, it is fine to use the openssl implementation because openssl is easy to update (I have already done so for the benchmark).
Alright, I see. Before proceeding, we should get an opinion from the maintainers.
However, I was not able to hold myself back from starting to code... :smile: I will publish it as a patch or pull request during the next days for further consideration, discussion and review.
Please see pull request #235.
Compared to the AES code which obviously served as a basis, the following changes were applied:
I have just performed some very quick tests in a limited test environment without encountering any problems so far. Does it work for you?
I ran the numbers using `tools/n2n-benchmark -d` and want to share the results:

| CPU | AES HW | evp_aes_128_cbc | evp_chacha20 |
|---|---|---|---|
| Cortex A53s | ⁿ/ₐ | 24.7 MB/s | 46.8 MB/s |
| Core i5 M430 | ⁿ/ₐ | 86.6 MB/s | 260.6 MB/s |
| Celeron 3865U | ✓ | 241.4 MB/s | 252.9 MB/s |
| i7 2860QM | ✓ | 311.4 MB/s | 389.7 MB/s |
| i7 4770T | ✓ | 403.7 MB/s | 705.3 MB/s |
| i7 5775C | ✓ | 405.9 MB/s | 719.1 MB/s |
| i7 7500U | ✓ | 501.9 MB/s¹ | 715.8 MB/s² |

What about the MIPS 24Kc?
¹ ² Omitting `-d` delivers 756.1 MB/s and 1,311.9 MB/s, respectively. While the ratio of the ChaCha20 rates (≈ 0.55, i.e. slightly above ½) might result from some benefit of cache hits, the ratio of the AES-CBC rates (≈ 0.66) should also reflect CBC's opportunity for faster decryption (compared to encryption only).
The benchmark results from my MIPS 24Kc platform (running `tools/n2n-benchmark -d`) are:
Run enc/dec[transop_null] for 3s (512 bytes):
204530 packets 68.2 Kpps 34.9 MB/s
Run enc/dec[transop_twofish] for 3s (512 bytes):
8675 packets 2.9 Kpps 1.5 MB/s
Run enc/dec[transop_aes] for 3s (512 bytes):
19231 packets 6.4 Kpps 3.3 MB/s
Run enc/dec[transop_cc20] for 3s (512 bytes):
34298 packets 11.4 Kpps 5.9 MB/s
After omitting the `-d` option:
Run enc[transop_null] for 3s (512 bytes):
579123 packets 193.0 Kpps 98.8 MB/s
Run enc[transop_twofish] for 3s (512 bytes):
18043 packets 6.0 Kpps 3.1 MB/s
Run enc[transop_aes] for 3s (512 bytes):
40364 packets 13.5 Kpps 6.9 MB/s
Run enc[transop_cc20] for 3s (512 bytes):
70740 packets 23.6 Kpps 12.1 MB/s
It seems the Twofish cipher is the slowest one.
Great to see that it seems to work out for you. I hope that 12.1 MByte (≈ 100 MBit) per second are sufficient for your router's needs – at least in one direction.
Now, the question is: how to enable it via the CLI? To not interfere with the well-known, existing CLI options `-A` and `-k`, we could just use an optional extension to `-A`:
- `-A1` [or] no `-k` option at all¹: Null
- `-A2` [or] `-k` only: Twofish
- `-A3` [or] `-A`: AES
- `-A4`: ChaCha20

¹ ...and also no other `-A_` option around (see questions below)

`A` would be for "ciphA"... The numbering corresponds to `typedef enum n2n_transform` as found in `n2n_transforms.h`.

This scheme needs some more thought, as I am not yet happy with how combinations of `-A2`, `-A3`, or `-A4` without `-k` are covered. Should we allow an empty zero-key, as seems to have been the case with AES up to now? Or should we fall back to the null transform? Should n2n refuse to accept any `-A` without a key? That would be a behavioral change.
As this might turn out too complicated to explain to users, another alternative would be to completely change how the cipher is chosen. Maybe a new `-C_` option deprecating `-A`? That would definitely require the maintainers' guidance.
Any thoughts?
As for Twofish, I have not had the opportunity to take a closer look at the implementation actually used in n2n – I mostly use AES anyway. There may be faster implementations out there, or some optimization potential left in the code; I am not really able to say. I have just found an older speed comparison which suggests that Twofish does not necessarily have to be slower than software AES on packets sized around 1024 bytes – strongly implementation-dependent.
As for adding new CLI options, I think some form of vote needs to be held. In my opinion, adding a new `-C_` option is a better alternative than extending the `-A_` option. The old approach must remain for compatibility until it is no longer needed.
I updated pull request #235. Notably, a command-line option to enable ChaCha20 was added (`-A4`). Along with it, the scheme for using the `-A` option was extended, maintaining n2n's behavioural compatibility with current dev:

- `-A1`: do not use any payload encryption (`transform_null`; this is the default if neither a key nor any `-A_` option is given)
- `-A2`: use Twofish (default if a key is given and none of the `-A_` options is given)
- `-A3` or `-A`: use AES-CBC
- `-A4`: use ChaCha20

Solitary use of `-A` is marked deprecated and should be removed in some future release. Note: It is not the AES-CBC cipher that gets removed, only the way it gets enabled: just use `-A3` instead. With a broader choice of ciphers, the need for a more general approach to explicitly choosing the cipher becomes evident.
As for Poly1305: n2n will soon get a mechanism to verify payload integrity. Until then, just be aware that this is not covered yet. Stay tuned.
A final note: Up to now, n2n uses C's `rand()` function for random numbers, so initialization vectors are not random enough for proper use with ChaCha20, where reusing an IV under the same key would be fatal. A new random number generator will be implemented very soon. Until then, ChaCha20 should not be used in productive networks yet.
@xiamr May I ask you something off-topic: What compiler do you use to build n2n for your 32-bit MIPS CPU? And do you experience any difficulties compiling the (still optional) `uint64_t` type, which gets more and more commonly used (also in that ChaCha20 implementation)?
I use gcc 7.3, the cross toolchain of the OpenWrt SDK. Any standard C/C++ feature works fine, and I have no difficulty compiling the program.
Implemented via pull request
There are two encryption algorithms in n2n: Twofish and AES-CBC. But they are not well suited to router CPUs without AES-NI support. ChaCha20-IETF-Poly1305, as well as its variant with an extended nonce, XChaCha20-IETF-Poly1305, are both secure and high-performance in such situations and are recommended by the libsodium library, so they are worth implementing for n2n.
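The practical difference between the two variants is the nonce length: 96 bits for ChaCha20-IETF versus 192 bits for XChaCha20. If nonces are picked at random, the birthday bound approximates the probability of a repeat among n messages under the same key for a b-bit nonce as shown below, and a repeated nonce exposes the XOR of the two plaintexts. Taking n = 2^32 packets as an illustrative figure:

```math
p_{\text{repeat}} \approx \frac{n^2}{2^{b+1}}, \qquad
n = 2^{32}:\quad
\frac{2^{64}}{2^{97}} = 2^{-33}\ (b = 96), \qquad
\frac{2^{64}}{2^{193}} = 2^{-129}\ (b = 192)
```

This is why the extended nonce allows random nonces to be used safely, whereas with the 96-bit nonce a counter or another uniqueness guarantee is usually preferred.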