Open drskalman opened 3 years ago
GF-Complete has clmul implentation. Investigated that. Have run some speed test and check what kind of region clmul accept. Submitted the result of the speed test:
1 billion multiplication bench mark:
GF-Complete Carry free: 0m15.012s
GF-Complete Split-table: 0m25.512s
Our GF((2^8)^2) implementation: 0m10.711s
I read "S. Gueron and M. E. Kounavis, Intel (r) carry-less multiplication instruction and its usage for computing the gcm mode (rev. 2.02), April 2014." and I tried to get some mathematical insight into the reduction algorithm to no avail. It looks like a mathematically sound witch craft but it is working so should be able to implement it with no problem.
Implemented multiplication with reduction using clmul. It was twice slower in rust than c. Investigated, resolved using features sse2 sse4.1.
generated needed log and exp table for GF(2^16) to perform division.
The first step is to add the module which implements the reduction by recyling it from current galois_16.