Closed Validark closed 1 year ago
This happens to help the compiler directly emit `vperm2i128 ymm1, ymm1, ymm3, 0x21`.
Also made the code a bit more resilient to change by basing the conversion on the vector bit size divided by 4 rather than on an `if (chunk_len == 32)` check.
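For readers unfamiliar with the `0x21` immediate, here is a minimal scalar sketch of how `vperm2i128` selects 128-bit lanes. It is not code from this PR: the function name `vperm2i128`, the `(low, high)` tuple representation, and the `prev`/`cur` labels are assumptions made for illustration only.

```python
def vperm2i128(src1, src2, imm8):
    """Scalar model of VPERM2I128 lane selection.

    Each source is a (low, high) pair standing in for the two 128-bit
    lanes of a 256-bit register. This representation is hypothetical and
    only meant to illustrate the immediate encoding.
    """
    # Selector values 0..3 map to: src1 low, src1 high, src2 low, src2 high.
    lanes = (src1[0], src1[1], src2[0], src2[1])

    def pick(control):
        # Bit 3 of each 4-bit control field zeroes the destination lane;
        # bits 1:0 choose which source lane is copied.
        if control & 0x8:
            return 0
        return lanes[control & 0x3]

    # The low nibble controls the low destination lane, the high nibble
    # controls the high destination lane.
    return (pick(imm8 & 0xF), pick((imm8 >> 4) & 0xF))


# Hypothetical labels standing in for the contents of ymm1 and ymm3.
prev = ("prev_lo", "prev_hi")
cur = ("cur_lo", "cur_hi")
result = vperm2i128(prev, cur, 0x21)
```

With `0x21`, the low nibble `1` picks the high lane of the first source and the high nibble `2` picks the low lane of the second source, so `result` is `("prev_hi", "cur_lo")`: a 256-bit window straddling the two inputs.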
This looks good to me. I'm going to merge, but @sharpobject, let us know if you have any objections, as this overwrites some of your changes. Thanks! :+1:
Seems great!