Open sinui0 opened 1 year ago
you can take the implementation from emp-rust https://github.com/pado-labs/emp-rust/blob/main/emp-tool/src/block.rs
@xiangxiecrypto thanks will definitely borrow from there.
I quickly hacked together something locally for x86 and found only an 8% improvement for garbling AES. So I suspect my napkin math/profiling above was incorrect. Nonetheless this feature should be pursued.
Right now the backing type of
Block
is[u8; 16]
as it was simple to start with. However, core arrays can not always take advantage of auto-vectorization/SIMD for important operations.For example, on my machine, it currently takes ~3ns to compute
BitXor
between two blocks. If usingstd::arch::x86_64::__mm128i
this would be <1ns. We can of course just convert the[u8; 16]
to/from__mm128i
for the operation, but the overhead of doing so erases any performance gains.Block
would ideally use anu8x16
SIMD backing type depending on target architecture. Unfortunatelystd::simd
is not stable yet so we will have to provide implementations for the various architectures.To briefly motivate doing this, after some quick profiling and napkin math it seems that garbling a circuit is currently bottlenecked on
BitXor
, not the CCR hashing!For example, the AES circuit has 6500 AND and 30,163 XOR gates. Garbling this circuit currently takes ~350 microseconds on my machine. Garbling an AND gate requires up to 9 XOR ops. In total that is 88,663 XOR ops to garble AES, which adds up to 266 microseconds or roughly 75% of CPU time.