aqrit closed 6 years ago
Small speed gain by despacing larger chunks.
128-bit chunks require more operations than 64-bit chunks. However, these extra operations don't increase the length of the critical path. The critical path is actually shortened because fewer extract operations are needed.
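For readers unfamiliar with chunked despacing, here is a minimal scalar sketch of the per-chunk structure, assuming 16-byte (128-bit) chunks. It is not the code from this PR: the function names are hypothetical, the set of whitespace characters (space, LF, CR) is an assumption, and the loops stand in for the compare, movemask, and shuffle steps a SIMD version would use. It says nothing about instruction counts or the critical path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bit i is set when chunk[i] is a space, LF, or CR.
   A SIMD version would typically get this from a byte compare plus a movemask. */
static unsigned whitespace_mask16(const unsigned char *chunk) {
    unsigned mask = 0;
    for (int i = 0; i < 16; i++) {
        unsigned char c = chunk[i];
        if (c == ' ' || c == '\n' || c == '\r') mask |= 1u << i;
    }
    return mask;
}

/* Compact one 16-byte chunk into out; returns the number of bytes kept.
   A SIMD version would replace this loop with a table-driven shuffle. */
static size_t despace_chunk16(unsigned char *out, const unsigned char *chunk) {
    unsigned mask = whitespace_mask16(chunk);
    size_t kept = 0;
    for (int i = 0; i < 16; i++) {
        if (!(mask & (1u << i))) out[kept++] = chunk[i];
    }
    return kept;
}

/* Despace buf in place, 16 bytes at a time, and return the new length. */
size_t despace_scalar_model(unsigned char *buf, size_t len) {
    size_t in = 0, out = 0;
    unsigned char chunk[16];
    for (; in + 16 <= len; in += 16) {
        memcpy(chunk, buf + in, 16); /* copy first so in-place output is safe */
        out += despace_chunk16(buf + out, chunk);
    }
    for (; in < len; in++) { /* tail shorter than one chunk */
        unsigned char c = buf[in];
        if (c != ' ' && c != '\n' && c != '\r') buf[out++] = c;
    }
    return out;
}
```

The output pointer advances by the number of bytes kept per chunk, so a larger chunk means fewer mask computations and fewer pointer updates per input byte.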
For despacing dwords, the vpermd method is likely not optimal.