intel / isa-l_crypto

Use embedded broadcast to replicate constants #123

Closed Shark64 closed 8 months ago

Shark64 commented 1 year ago

Hi. I've tried to make a minimal version of my patch for using the AVX512 embedded broadcast feature. I've kept only three changes: the embedded broadcast, SIMD instructions to update the data pointers in place of the unrolled scalar code, and loop alignment to maximize uop-cache utilization. In SM3_MB I've also switched two macros to use VPTERNLOG instead of a sequence of separate logical instructions. The patch passes `make tests` on my PC (Linux, Rocket Lake CPU). Sorry for the new pull request, but I haven't found a way to edit the other pull request in git so that it applies only part of my changes.
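For readers unfamiliar with the VPTERNLOG change mentioned above, here is a minimal scalar sketch in C of how a sequence of logical instructions collapses into a single 8-bit truth table. The comment does not say which two SM3_MB macros were switched, so the three SM3 boolean functions used here (3-way XOR, majority, choose) are only assumed candidates, and the helper names `ternlog32` and `imm8_of` are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of VPTERNLOGD on one 32-bit lane: at every bit position the
 * three input bits form an index (a = MSB, b = middle, c = LSB) into imm8. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2) |
                       (((b >> i) & 1u) << 1) |
                        ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* SM3 boolean functions written as separate logical operations
 * (assumed candidates; the patch's actual macros are not shown). */
static uint32_t sm3_xor3(uint32_t x, uint32_t y, uint32_t z) { return x ^ y ^ z; }
static uint32_t sm3_maj(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) | (x & z) | (y & z); }
static uint32_t sm3_ch(uint32_t x, uint32_t y, uint32_t z)   { return (x & y) | (~x & z); }

typedef uint32_t (*bool3_fn)(uint32_t, uint32_t, uint32_t);

/* Derive the immediate for any 3-input bitwise function: feeding the
 * patterns 0xF0, 0xCC, 0xAA enumerates all eight input combinations,
 * so the low byte of the result is the truth table itself. */
static uint8_t imm8_of(bool3_fn f)
{
    return (uint8_t)(f(0xF0u, 0xCCu, 0xAAu) & 0xFFu);
}

/* Check that the one-instruction form matches the multi-instruction form. */
static void check_equivalence(uint32_t x, uint32_t y, uint32_t z)
{
    assert(ternlog32(x, y, z, imm8_of(sm3_xor3)) == sm3_xor3(x, y, z));
    assert(ternlog32(x, y, z, imm8_of(sm3_maj))  == sm3_maj(x, y, z));
    assert(ternlog32(x, y, z, imm8_of(sm3_ch))   == sm3_ch(x, y, z));
}
```

Under this model the immediates come out as 0x96 for the 3-way XOR, 0xE8 for majority, and 0xCA for choose, so each function would need one VPTERNLOGD rather than two to five logical instructions.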

pablodelara commented 8 months ago

Closing this PR, as there is no further activity. @Shark64, feel free to reopen it if you want to keep going.

Shark64 commented 7 months ago

OK, should I make a new pull request with only the broadcast for constants, or can I keep the minor optimizations, like using a base register instead of RIP-relative addressing to make instructions shorter? Thanks!

pablodelara commented 7 months ago

I suggest either separate PRs or a single PR with multiple commits (especially if there are dependencies between the changes).