Closed noloader closed 6 years ago
Thanks! The initial research you've done here really helps with supporting this. Hopefully I will be able to finish this off before 2.3 release #1156
BTW it looks like in the last couple of months patches have been added to GCC for vec_xl_be and vec_reve, so this code can eventually become simpler. Though it will be a while yet before GCC 8 is released, much less before we can assume it. :/
@randombit,
I updated the patch and added some comments to VectorLoad and VectorLoadKey. VectorLoadKey is the former VectorLoadAligned. The comments and the rename are the update. Nothing else changed.
I also tested swapping VectorLoad and VectorLoadKey. They are not interchangeable because VectorLoad performs an endian swap as necessary, while VectorLoadKey assumes the key is already in the correct endianness. Sorry about that.
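To illustrate why the two loads cannot be swapped, here is a minimal scalar sketch. It is not the patch's code: load_data_block and load_key_block are hypothetical stand-ins for VectorLoad and VectorLoadKey, and it assumes the "endian swap" is a plain reversal of the 16 bytes on little-endian hosts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Block = std::array<std::uint8_t, 16>;

// Hypothetical helper: true when the host stores integers little-endian.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Stand-in for a data load: copy 16 bytes, then reverse them when the
// caller says the host is little-endian (assumed meaning of "endian swap").
inline Block load_data_block(const std::uint8_t* src, bool little_endian) {
    Block b{};
    std::memcpy(b.data(), src, b.size());
    if (little_endian) {
        std::reverse(b.begin(), b.end());
    }
    return b;
}

// Stand-in for a key load: the bytes are taken exactly as stored, because
// the key schedule is assumed to be in the right byte order already.
inline Block load_key_block(const std::uint8_t* src) {
    Block b{};
    std::memcpy(b.data(), src, b.size());
    return b;
}
```

On a little-endian host the two functions return different vectors for the same input bytes, so using one in place of the other would feed the cipher a byte-reversed key or a byte-reversed data block.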
@randombit,
... it looks like in the last couple of months patches have been added to GCC for vec_xl_be and vec_reve, so this code can eventually become simpler.
You were right about some of the built-in functions. I was able to refactor and clean them up a bit; see the aes-p8.c proof of concept. The demo is compatible going back to GCC 4.8 on GCC112 from the test farm. GCC 4.8 on GCC112 is what I have been using to test Botan :)
Done! Thanks again for the help jww
Attached and below is a patch for AES using Power8 built-ins. It's another partial patch, and it hijacks the C++ implementation. Others will have to complete it.
This patch only provides the forward transformation, that is, encryption. Botan and Crypto++ both use the "Equivalent Inverse Cipher" (FIPS-197, Section 5.3.5, p.23), and it is not compatible with IBM hardware. Both libraries will need to rework the decryption key scheduling routines. (It could be as simple as using the encryption key table for decryption. I have not investigated it yet.)
The patch looks awful because there are two abstraction layers. The first deals with GCC, xlC and platform endianness. GCC and xlC have different datatypes and built-ins. GCC only has the 64-bit types, while xlC has all the types, including the 8-bit types. For GCC we have to do a fair amount of work for endian conversions.
xlC does not have the endian problems because we can load the buffer as an array of 8-bit elements using either vec_xl_be or vec_xl. When needed with xlC, we can perform one permute using vec_reve. GCC lacks that sort of ease of use.
The second layer abstracts the higher level operations, like VectorEncrypt, VectorDecrypt, VectorEncryptLast and VectorDecryptLast.
The abstraction layers were a lot like trying to put lipstick on a pig. There's really no way to make the code look elegant when xlC had one set of primitives and operations, and GCC had another which was merely a subset.
The VMX unit is very sensitive to buffer alignments. For Power7 and VMX, buffers must be aligned. Power8 and VSX are a little more tolerant, and you can load unaligned buffers using different instructions. The different built-ins are vec_xl_be and vec_xl for xlC, and vec_vsx_ld for GCC. GCC has to permute after the load on little-endian systems.
The code assumes an aligned key buffer because the library controls it. The code does not assume aligned buffers for in and out because the user controls them. The code asserts at runtime in debug builds:
If the key buffers are not aligned, then you can change to the following:
I believe AIX and xlC behave differently than Linux and GCC, so be sure to test both systems. I was seeing 2-byte and 4-byte alignments under AIX and xlC.
To reiterate, if you cannot guarantee aligned buffers, then use VectorLoad instead of VectorLoadAligned.
Here are the numbers from GCC112 (compile farm), which is a 3.4 GHz IBM POWER System S822. Botan was configured with ./configure.py --cc=gcc --cc-abi="-mcpu=power8 -maltivec".
If my calculations are correct, Botan is pushing data at about 1 cpb for AES-128 on the machine. Andy Polyakov has OpenSSL running at about 0.7 cpb, so I think it is a very respectable number.
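The cycles-per-byte arithmetic is just clock rate divided by throughput. The sketch below shows the conversion in both directions; the function names are made up for illustration, and the only figures used are the 3.4 GHz clock and the roughly 1 cpb estimate quoted above.

```cpp
#include <cassert>

// Cycles per byte = clock frequency (Hz) / throughput (bytes per second).
inline double cycles_per_byte(double clock_hz, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return clock_hz / bytes_per_second;
}

// Inverse: throughput in MiB/s implied by a given cpb at a given clock.
inline double mib_per_second_at(double clock_hz, double cpb) {
    assert(cpb > 0.0);
    return clock_hz / cpb / (1024.0 * 1024.0);
}
```

At 3.4 GHz, 1 cpb corresponds to 3.4e9 bytes per second, or a little over 3242 MiB/s, so a benchmark figure in that neighborhood is what the estimate implies.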
On AIX you can call the following to get the L1 data cache line size. On GCC112 it returns 128!
Glibc prior to 2.24 does not offer AT_HWCAP or AT_HWCAP2 to signal Power8 in-core crypto, so it's fudged in the patch. The define of interest is PPC_FEATURE2_VEC_CRYPTO on Linux. I also don't know how to query it on AIX.
Here's the ZIP of the diff above: aes-p8.diff.zip.
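Where the C library does expose the auxiliary vector, the Linux query for the capability bit mentioned above might look roughly like this. It is a sketch, not part of the patch: the helper names are hypothetical, and the fallback value 0x02000000 for PPC_FEATURE2_VEC_CRYPTO is my recollection of the Linux kernel's PowerPC cputable header, so verify it against your headers. On anything other than Linux/PowerPC the runtime query simply reports false.

```cpp
#include <cassert>
#if defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP2
#endif

// Fallback definition (assumed value; normally provided by <asm/cputable.h>
// on PowerPC Linux).
#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

// Pure bit test, separated out so it can be checked on any platform.
inline bool hwcap2_has_vec_crypto(unsigned long hwcap2) {
    return (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) != 0;
}

// Runtime query: only meaningful on Linux/PowerPC with AT_HWCAP2 available.
inline bool cpu_has_power8_crypto() {
#if defined(__linux__) && defined(__powerpc__) && defined(AT_HWCAP2)
    return hwcap2_has_vec_crypto(getauxval(AT_HWCAP2));
#else
    return false;
#endif
}
```

Keeping the bit test separate from the getauxval call means the library can still fall back to another detection method when the auxiliary vector entry is unavailable.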