Often, the input message does not fill a whole number of SIMD vectors (for 128-bit SIMD, this means the message size in bits is not a multiple of 128). Because of this, we currently often finish by reading a "partial" vector. To do this at high speed, a small function uses the page size to verify that we can read beyond the bounds of the input message without risking a segfault or some other dramatic exception.
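As a rough illustration of the kind of check described above, here is a minimal sketch. The function name `read_stays_in_page` and the 4096-byte page size are assumptions made for this example, not taken from the actual code.

```rust
/// Assumed page size in bytes. 4096 is an assumption for this sketch;
/// real code would pick a value that is safe on every supported platform.
const PAGE_SIZE: usize = 4096;

/// Width of a 128-bit SIMD vector in bytes.
const VECTOR_SIZE: usize = 16;

/// Hypothetical helper: returns true if a full vector read starting at
/// `address` ends inside the memory page that contains `address`.
/// In that case the read cannot touch an unmapped page, even if it runs
/// past the end of the message.
fn read_stays_in_page(address: usize) -> bool {
    // PAGE_SIZE is a power of two, so the mask extracts the offset in the page.
    let offset_in_page = address & (PAGE_SIZE - 1);
    offset_in_page <= PAGE_SIZE - VECTOR_SIZE
}
```

A caller would pass the address of the partial vector (for example `ptr as usize`) and take a slower path, such as copying the remaining bytes into a zeroed buffer, when the function returns false. This test runs for every message that ends with a partial vector, which is the overhead in question.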
In practice, this works well, but this check has a small overhead. It turns out we can instead start by reading the partial vector, and only then read a whole number of blocks. This allows us to skip the safety check in some situations, because the read no longer goes beyond the bounds of the message (at least one full message block follows the partial vector).
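The idea can be sketched in portable Rust, using `u128` in place of a real SIMD register. Everything here is illustrative: `mix`, `load_block`, `load_partial` and `hash_partial_first` are hypothetical names, and `mix` is a placeholder, not the actual compression step.

```rust
/// Width of a 128-bit vector in bytes.
const VECTOR_SIZE: usize = 16;

/// Placeholder mixing step (an assumption, not the real algorithm).
fn mix(state: u128, block: u128) -> u128 {
    (state ^ block)
        .rotate_left(29)
        .wrapping_mul(0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835)
}

/// Loads the first 16 bytes of `bytes` as one little-endian vector.
fn load_block(bytes: &[u8]) -> u128 {
    let mut buf = [0u8; VECTOR_SIZE];
    buf.copy_from_slice(&bytes[..VECTOR_SIZE]);
    u128::from_le_bytes(buf)
}

/// Loads a full vector but keeps only its first `len` bytes (0 < len < 16);
/// the remaining bytes are zeroed by the mask.
fn load_partial(bytes: &[u8], len: usize) -> u128 {
    let mask = (1u128 << (8 * len)) - 1;
    load_block(bytes) & mask
}

/// Hashes `message` by reading the partial vector first, then whole blocks.
/// Returns None for messages shorter than one vector: those have no full
/// block after the partial one, so they still need the guarded read.
fn hash_partial_first(message: &[u8]) -> Option<u128> {
    if message.len() < VECTOR_SIZE {
        return None;
    }
    let remainder = message.len() % VECTOR_SIZE;
    let mut state = 0u128;
    if remainder != 0 {
        // This 16-byte read starts at offset 0 and is fully inside the
        // message, so no page check is needed.
        state = mix(state, load_partial(message, remainder));
    }
    // The rest of the message is now an exact number of full blocks.
    for block in message[remainder..].chunks_exact(VECTOR_SIZE) {
        state = mix(state, load_block(block));
    }
    Some(state)
}
```

Note that the partial read overlaps the first full block: the overlapping bytes are loaded twice, but the mask discards them the first time, so each message byte contributes exactly once.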
This change is about reading the partial vector first. It should improve performance, but it will break stability: from this change onward, generated hashes will differ from the hashes generated by previous versions of the algorithm for the same inputs.
Todo
Read partial vector first
Remove the "range" logic, which is no longer necessary with this approach