rockchip-linux / kernel

BSP kernel source

RK3288 Crypto performance #87

Open rfrht opened 6 years ago

rfrht commented 6 years ago

I have built a kernel with the RK3288 crypto accelerator driver backported (using a TinkerBoard), and I'm puzzled about how to take advantage of the crypto engine.

I ran some OpenSSL benchmarks, and they show no difference between running with and without the module. See:

=== TESTING WITH THE ENGINE ===

[root@tinkerboard cron.d]# modprobe rk_crypto
[root@tinkerboard cron.d]# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 7134953 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 2990888 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 778812 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 195865 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 24452 aes-256 cbc's in 3.00s
OpenSSL 1.0.1t  3 May 2016
built on: Thu Mar 29 12:42:31 2018
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      38180.35k    63805.61k    66458.62k    66855.25k    66770.26k

=== REMOVING THE ENGINE ===

[root@tinkerboard cron.d]# modprobe -r rk_crypto
[root@tinkerboard cron.d]#
[root@tinkerboard cron.d]#
[root@tinkerboard cron.d]# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 7014728 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 64 size blocks: 2975590 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 773596 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 195292 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 24386 aes-256 cbc's in 2.99s
OpenSSL 1.0.1t  3 May 2016
built on: Thu Mar 29 12:42:31 2018
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      37662.97k    63479.25k    66013.53k    66659.67k    66812.75k

=== NOW TRYING A DIFFERENT OPENSSL TEST. STILL WITHOUT ENGINE ===

[root@tinkerboard cron.d]#
[root@tinkerboard cron.d]#
[root@tinkerboard cron.d]# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 6525393 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 2932580 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 770173 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 195867 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 24496 aes-256-cbc's in 3.00s
OpenSSL 1.0.1t  3 May 2016
built on: Thu Mar 29 12:42:31 2018
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      34802.10k    62561.71k    65721.43k    66855.94k    66890.41k

=== RE-INSERTING THE ENGINE ===

[root@tinkerboard cron.d]#
[root@tinkerboard cron.d]# modprobe rk_crypto
[root@tinkerboard cron.d]#
[root@tinkerboard cron.d]# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 6918935 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 2939849 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 771348 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 195315 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 24436 aes-256-cbc's in 3.00s
OpenSSL 1.0.1t  3 May 2016
built on: Thu Mar 29 12:42:31 2018
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      36900.99k    62716.78k    65821.70k    66667.52k    66726.57k

The results are essentially the same with and without the module: between 66,726k and 66,890k bytes per second at 8192-byte blocks in all four runs.

Module information:

[root@tinkerboard cron.d]# modinfo rk_crypto
filename:       /lib/modules/4.4.120-beleza-10-rockchip/kernel/drivers/crypto/rockchip/rk_crypto.ko
license:        GPL
description:    Support for Rockchip's cryptographic engine
author:         Zain Wang <zain.wang@rock-chips.com>
alias:          of:N*T*Crockchip,rk3288-crypto*
depends:        
intree:         Y
vermagic:       4.4.120-beleza-10-rockchip SMP mod_unload ARMv7 p2v8

Any thoughts?
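One thing worth checking before drawing conclusions: loading rk_crypto only registers its algorithms with the kernel crypto API, while a plain openssl speed run uses OpenSSL's own software AES by default and never calls into the kernel, so identical numbers are expected. /proc/crypto lists every registered implementation with its driver name, module and priority. Below is a minimal C sketch of a filter over that file, assuming the usual "key : value" records separated by blank lines; list_drivers is a hypothetical helper, not an existing tool, and the rk_crypto driver name you will see is not confirmed here.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX.1-2008 stdio, e.g. fmemopen() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The few /proc/crypto fields this sketch looks at. */
struct crypto_entry {
    char name[128];
    char driver[128];
    char module[128];
    int priority;
};

static void entry_reset(struct crypto_entry *e)
{
    memset(e, 0, sizeof(*e));
    e->priority = -1; /* -1 = no priority line seen */
}

/* Hypothetical helper: print every implementation registered under
   alg_name (e.g. "cbc(aes)") and return how many were found. */
int list_drivers(FILE *f, const char *alg_name)
{
    char line[256], key[64], val[128];
    struct crypto_entry e;
    int found = 0;

    entry_reset(&e);
    for (;;) {
        char *got = fgets(line, sizeof(line), f);
        if (got == NULL || line[0] == '\n') {
            /* End of a record: report it if the algorithm name matches. */
            if (strcmp(e.name, alg_name) == 0) {
                printf("%-24s module=%-12s priority=%d\n",
                       e.driver, e.module, e.priority);
                found++;
            }
            entry_reset(&e);
            if (got == NULL)
                break;
            continue;
        }
        /* Lines look like "driver       : cbc(aes-generic)". */
        if (sscanf(line, "%63s : %127[^\n]", key, val) != 2)
            continue;
        if (strcmp(key, "name") == 0)
            snprintf(e.name, sizeof(e.name), "%s", val);
        else if (strcmp(key, "driver") == 0)
            snprintf(e.driver, sizeof(e.driver), "%s", val);
        else if (strcmp(key, "module") == 0)
            snprintf(e.module, sizeof(e.module), "%s", val);
        else if (strcmp(key, "priority") == 0)
            e.priority = atoi(val);
    }
    return found;
}
```

Calling list_drivers on a FILE* opened on /proc/crypto with "cbc(aes)" after modprobe rk_crypto, and again after modprobe -r rk_crypto, shows whether the module added an entry and whether its priority is high enough for the kernel to prefer it over the software implementation.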

kernle32dll commented 6 years ago

I just want to weigh in here with some of my findings (using OpenSSL 1.0.2o and a Rock64 with RK3328).

For the past few weeks, I have also struggled to access this module. I can confirm @rfrht's numbers: whether or not the module is loaded makes no difference.

After digging through some ancient forum posts, it seemed that OpenSSL in particular would (should?) not be able to access the module, as it resides in a different space (kernel space vs. user space). So I gave cryptodev a shot: I built its kernel module and rebuilt OpenSSL with cryptodev support. Still nothing (at least, no difference with or without rk_crypto). However, cryptodev did something funny that is still worth considering:

Running openssl speed -evp aes-128-cbc -engine cryptodev yielded worse results for 16- and 64-byte blocks, but much better results for block sizes of 256 bytes and up (we are talking a 2500% improvement at 8192 bytes here). Still, since tools like OpenSSH tend to process smaller blocks, overall performance was worse with cryptodev. So what exactly is cryptodev doing here?
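A rough cost model may help explain this pattern. Assume, as a simplification rather than a measurement, that every request through /dev/crypto pays a fixed overhead t_0 (system call, context switch, DMA setup), after which the engine streams data at bandwidth B, while OpenSSL's software AES runs at a roughly constant throughput S. The throughput for an n-byte block, and the condition for the offload to win, are then:

```latex
\begin{aligned}
T(n) &= \frac{n}{t_0 + n/B} \\
T(n) > S &\iff n > S\,t_0 + \frac{S\,n}{B} \\
         &\iff n\left(1 - \frac{S}{B}\right) > S\,t_0 \\
         &\iff n > \frac{S\,t_0\,B}{B - S} \qquad (\text{assuming } B > S)
\end{aligned}
```

For small n, T(n) is roughly n/t_0, so a per-call cost that is negligible at 8192 bytes can dominate at 16 bytes; as n grows, T(n) approaches B. None of t_0, B or S were measured in this thread.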

I then tried OpenSSL 1.1.0, as I had read that it includes some ARMv8 improvements that should render the crypto module superfluous. But no: with or without cryptodev, performance was again worse (this time at every block size).
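Whether OpenSSL's ARMv8 code paths can help at all depends on whether the CPU exposes the ARMv8 AES instructions to user space. A minimal sketch, assuming Linux and glibc's getauxval(): on AArch64 the kernel reports them through the HWCAP_AES bit of AT_HWCAP, and on 32-bit ARM through HWCAP2_AES in AT_HWCAP2. The fallback constants are the Linux UAPI values, and cpu_has_aes is a hypothetical helper name.

```c
#include <sys/auxv.h>

/* Fallback values from the Linux UAPI headers, in case the libc
   headers do not define them. */
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
#if defined(__aarch64__) && !defined(HWCAP_AES)
#define HWCAP_AES (1UL << 3)
#endif
#if defined(__arm__) && !defined(HWCAP2_AES)
#define HWCAP2_AES (1UL << 0)
#endif

/* Hypothetical helper: 1 if the kernel reports the ARMv8 AES instructions,
   0 if it does not, -1 if this sketch has no check for the architecture. */
int cpu_has_aes(void)
{
#if defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_AES) ? 1 : 0;
#elif defined(__arm__)
    /* 32-bit kernels (including AArch32 userlands) report it in AT_HWCAP2. */
    return (getauxval(AT_HWCAP2) & HWCAP2_AES) ? 1 : 0;
#else
    return -1;
#endif
}
```

This is the same information as the "aes" flag in the Features line of /proc/cpuinfo. The RK3288's Cortex-A17 cores are ARMv7 and lack these instructions, so on the TinkerBoard the check returns 0.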

This leaves me with a lot of questions. The kicker is that, even after several weeks of experimenting, I was unable to properly access the rk_crypto module. Does anyone have any idea how to do so? Did I miss anything?

noloader commented 5 years ago

I'm in the research stage of crypto acceleration for a Tinkerboard, and I am interested in using it from user space, too. I believe the manual for the RK3288 is the Rockchip RK3288 Technical Reference Manual.

The current driver is having problems: it produces incorrect results. @kernle32dll, this may explain the performance problems you are seeing (the driver is borked). Also see "[Bug] Rockchip crypto driver sometimes produces wrong ciphertext" on the linux-crypto mailing list.

Can anyone confirm we cannot use the crypto from userspace? Or even better, can anyone find a userland C/C++ example of using the crypto acceleration? I'm happy to drop into inline assembly, if needed.


> After digging through some ancient forum posts, it seemed that OpenSSL in particular would (should?) not be able to access the module, as it resides in a different space (kernel space vs. user space).

Ping Andy Polyakov (@dot-asm). He may already know the limitations on the SoC.
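On the user-space question: the in-tree way to reach a kernel crypto driver from user space is the AF_ALG socket interface (cryptodev's /dev/crypto is the out-of-tree alternative). Below is a minimal C sketch, assuming a kernel built with CONFIG_CRYPTO_USER_API_SKCIPHER and the <linux/if_alg.h> UAPI header; afalg_aes_cbc is a hypothetical helper. The kernel binds "cbc(aes)" to the highest-priority registered implementation, so whether this reaches the RK3288 engine or a software fallback depends on what is registered. Given the wrong-ciphertext report above, compare its output against a software reference before trusting it.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/if_alg.h>

#ifndef SOL_ALG
#define SOL_ALG 279 /* Linux UAPI value; some older libcs lack the define */
#endif
#define AES_BLOCK 16

/* Hypothetical helper: AES-CBC encrypt (encrypt != 0) or decrypt len bytes
   through the kernel crypto API. Returns 0 on success, -1 on failure.
   Meant for small buffers: a short sendmsg/read is treated as an error. */
int afalg_aes_cbc(const uint8_t *key, size_t keylen, const uint8_t iv[AES_BLOCK],
                  const uint8_t *in, uint8_t *out, size_t len, int encrypt)
{
    struct sockaddr_alg sa = {
        .salg_family = AF_ALG,
        .salg_type = "skcipher", /* symmetric cipher interface */
        .salg_name = "cbc(aes)", /* kernel picks the highest-priority driver */
    };
    /* Control buffer for two messages (operation + IV), aligned for cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(uint32_t)) +
                 CMSG_SPACE(sizeof(struct af_alg_iv) + AES_BLOCK)];
        struct cmsghdr align;
    } cbuf;
    struct iovec iov = { .iov_base = (void *)in, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct af_alg_iv *alg_iv;
    uint32_t op_type = encrypt ? ALG_OP_ENCRYPT : ALG_OP_DECRYPT;
    int tfm, op = -1, ret = -1;

    if (len == 0 || len % AES_BLOCK != 0)
        return -1; /* CBC without padding needs whole blocks */

    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1; /* AF_ALG not available on this kernel */
    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, (socklen_t)keylen) < 0)
        goto out;
    op = accept(tfm, NULL, 0); /* per-request operation socket */
    if (op < 0)
        goto out;

    memset(&msg, 0, sizeof(msg));
    memset(&cbuf, 0, sizeof(cbuf));
    msg.msg_control = cbuf.buf;
    msg.msg_controllen = sizeof(cbuf.buf);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* First control message: encrypt or decrypt. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_ALG;
    cmsg->cmsg_type = ALG_SET_OP;
    cmsg->cmsg_len = CMSG_LEN(sizeof(op_type));
    memcpy(CMSG_DATA(cmsg), &op_type, sizeof(op_type));

    /* Second control message: the IV. */
    cmsg = CMSG_NXTHDR(&msg, cmsg);
    cmsg->cmsg_level = SOL_ALG;
    cmsg->cmsg_type = ALG_SET_IV;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + AES_BLOCK);
    alg_iv = (struct af_alg_iv *)CMSG_DATA(cmsg);
    alg_iv->ivlen = AES_BLOCK;
    memcpy(alg_iv->iv, iv, AES_BLOCK);

    /* Send the input with its parameters, then read back the result. */
    if (sendmsg(op, &msg, 0) != (ssize_t)len)
        goto out;
    if (read(op, out, len) != (ssize_t)len)
        goto out;
    ret = 0;
out:
    if (op >= 0)
        close(op);
    close(tfm);
    return ret;
}
```

Each call here opens sockets and makes several system calls, which is exactly the kind of fixed per-request overhead that makes small blocks slow through any kernel offload path; a benchmark would keep the sockets open and reuse them.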

cracket commented 1 year ago

A small off-topic question: has anyone succeeded in benchmarking the RK3288 crypto engine with a mainline (5.15.64) kernel? I tried both AF_ALG and cryptodev, and both crash the kernel/machine.