intel / isa-l_crypto

Use VPTERNLOG for 3-operands boolean functions. #148

Closed Shark64 closed 3 months ago

Shark64 commented 3 months ago

Here are the changes for SM3. I've switched the boolean macros to vpternlog, and I also use it for the 3-way xor in the body. Two more minor changes ;) : there's no need to "jump the jmp" at the end of the main loop; just loop back if count != 0. Also, a couple of vprold reg1, reg1, IMM instructions immediately followed by vmovups reg2, reg1 can be encoded simply as vprold reg2, reg1, IMM. On my PC this made sm3_mb_vs_ossl_perf go from ~4.4 GB/s to ~5.1 GB/s :)
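For readers unfamiliar with the instruction, the following is a minimal Python sketch (not code from this PR) of how a vpternlog imm8 can be derived for the SM3 boolean functions. It assumes the usual truth-table convention in which the first operand selects bit 2 of the table index, the second bit 1, and the third bit 0. The helper names `ternlog_imm8` and `ternlog` are hypothetical, and the operand order actually used by the patch may differ.

```python
# Truth-table columns for the three inputs (assumed convention:
# first operand -> index bit 2, second -> bit 1, third -> bit 0).
A, B, C = 0xF0, 0xCC, 0xAA


def ternlog_imm8(f):
    """Return the imm8 that makes vpternlog compute f(a, b, c) bitwise."""
    return f(A, B, C) & 0xFF


def ternlog(imm8, a, b, c, width=32):
    """Software model of one vpternlog lane: look up each bit in imm8."""
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> idx) & 1) << i
    return result


# SM3 boolean functions:
#   rounds 0..15 : FF = GG = X ^ Y ^ Z
#   rounds 16..63: FF = (X & Y) | (X & Z) | (Y & Z)
#                  GG = (X & Y) | (~X & Z)
XOR3 = ternlog_imm8(lambda x, y, z: x ^ y ^ z)
MAJ = ternlog_imm8(lambda x, y, z: (x & y) | (x & z) | (y & z))
CH = ternlog_imm8(lambda x, y, z: (x & y) | (~x & z))

if __name__ == "__main__":
    print(f"{XOR3:#04x} {MAJ:#04x} {CH:#04x}")  # prints "0x96 0xe8 0xca"
```

Each of these functions would otherwise take two or more two-operand logic instructions per round; with the right immediate, one vpternlog covers it, which is presumably where part of the speedup comes from.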

pablodelara commented 3 months ago

Which CPU are you using? That throughput is pretty high! :)

Shark64 commented 3 months ago

Rocket Lake, an i7-11700K. Perhaps it's the low-latency DDR4 that helps more than the CPU core itself.

pablodelara commented 3 months ago

No, these tests use warm data. You must be using turbo boost, so your CPU frequency goes to 5GHz.

Shark64 commented 3 months ago

Yeah, you're right. I hadn't checked the frequency with turbostat, but now I see that a single core goes up to 5.1 GHz for a brief time. So it's the CPU after all :)

pablodelara commented 3 months ago

Code is now merged, thanks for the work @Shark64!