AlexGuteniev opened 4 days ago
The previous attempt was #4654, which ended up being only a memcmp removal; see https://github.com/microsoft/STL/pull/4654#issuecomment-2159504811
:warning: Note to self: check the benchmarks on my machine.
Large needles are expected to perform not much worse, but they would need a separate branch with a somewhat different approach. I can do them in this PR rather than in a subsequent one, if you prefer.
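The large-needle branch is not spelled out in the thread. One possible shape, offered only as a hypothetical sketch and not necessarily what this PR does, is to probe the haystack with the needle's first 16 bytes using SSE4.2 (the _mm_cmpestri intrinsic in equal-ordered mode) and then verify each candidate's remainder with memcmp. The names search_large_needle and not_found, and the std::search fallback for short needles and non-x86-64 targets, are illustrative assumptions; only 1-byte elements are covered.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#if defined(__x86_64__) || defined(_M_X64)
#include <nmmintrin.h>  // SSE4.2 intrinsics such as _mm_cmpestri
#define SKETCH_X64 1
#if defined(__GNUC__) || defined(__clang__)
#define SSE42_TARGET __attribute__((target("sse4.2")))
#else
#define SSE42_TARGET
#endif
#else
#define SSE42_TARGET
#endif

// Hypothetical sentinel, mirroring std::string::npos.
constexpr std::size_t not_found = static_cast<std::size_t>(-1);

// Hypothetical large-needle branch for 1-byte elements (assumes m > 16 for the
// SIMD path): the first 16 needle bytes act as an SSE4.2 probe, and every
// candidate position is then confirmed with memcmp on the remaining bytes.
SSE42_TARGET
std::size_t search_large_needle(const unsigned char* hay, std::size_t n,
                                const unsigned char* needle, std::size_t m) {
    if (m == 0) return 0;  // an empty needle matches at the start
    if (m > n) return not_found;
    std::size_t p = 0;
#ifdef SKETCH_X64
    if (m > 16) {
        const std::size_t last = n - m;  // last start position that can still match
        // m > 16, so reading 16 bytes from the needle stays in bounds.
        const __m128i head = _mm_loadu_si128(reinterpret_cast<const __m128i*>(needle));
        while (p <= last && p + 16 <= n) {
            const __m128i hv = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hay + p));
            const int idx = _mm_cmpestri(head, 16, hv, 16,
                                         _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED);
            if (idx == 16) {
                p += 16;  // no full or partial head match anywhere in this window
            } else if (idx != 0) {
                p += idx;  // move the candidate to the window start and probe again
            } else if (std::memcmp(hay + p + 16, needle + 16, m - 16) == 0) {
                return p;  // head matched at p and the remainder matches too
            } else {
                ++p;  // false candidate: resume right after it
            }
        }
    }
#endif
    // Fallback: short needles, non-x86-64 targets, and whatever the loop left over.
    const unsigned char* it = std::search(hay + p, hay + n, needle, needle + m);
    return it == hay + n ? not_found : static_cast<std::size_t>(it - hay);
}
```

Probing with a fixed 16-byte head keeps each SIMD step cheap regardless of needle length; the cost moves to the memcmp verification, which only runs when the whole head already matches.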
This PR isn't too massive; I think it would make sense to add the large needle logic here.
Updated. I can raise the stakes even further by adding find_end, or search_n, or by trying to support 4- and 8-byte elements (although I'm skeptical about larger elements).
/azp run
This uses a different approach for both the search and the inner comparison (SSE4.2 instead of AVX2). This time the results are better.
For now, only 1- and 2-byte elements are handled. A slightly modified version of the same approach could be used for 4- and 8-byte elements, but I need to test whether there would still be a performance gain.
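To illustrate the general SSE4.2 idea, here is a minimal sketch (not the PR's code) of a small-needle search over 1-byte elements, where the needle fits in one 16-byte register. The names search_small_needle and not_found, and the std::search fallback, are assumptions for illustration. In _SIDD_CMP_EQUAL_ORDERED mode, _mm_cmpestri also reports a needle prefix that runs off the end of the 16-byte window, so the loop re-probes from that position instead of skipping past it. For 2-byte elements, the same scheme would presumably use _SIDD_UWORD_OPS with 8 elements per register.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#if defined(__x86_64__) || defined(_M_X64)
#include <nmmintrin.h>  // SSE4.2 intrinsics such as _mm_cmpestri
#define SKETCH_X64 1
#if defined(__GNUC__) || defined(__clang__)
#define SSE42_TARGET __attribute__((target("sse4.2")))
#else
#define SSE42_TARGET
#endif
#else
#define SSE42_TARGET
#endif

// Hypothetical sentinel, mirroring std::string::npos.
constexpr std::size_t not_found = static_cast<std::size_t>(-1);

// Hypothetical sketch: index of the first occurrence of needle[0, m) in
// hay[0, n), or not_found. Needles of at most 16 bytes take the SSE4.2 path;
// longer needles and non-x86-64 targets fall back to std::search.
SSE42_TARGET
std::size_t search_small_needle(const unsigned char* hay, std::size_t n,
                                const unsigned char* needle, std::size_t m) {
    if (m == 0) return 0;  // an empty needle matches at the start
    if (m > n) return not_found;
    std::size_t p = 0;
#ifdef SKETCH_X64
    if (m <= 16) {
        // Copy the needle into a zeroed buffer so the 16-byte load never reads past it.
        alignas(16) unsigned char nbuf[16] = {};
        std::memcpy(nbuf, needle, m);
        const __m128i nv = _mm_load_si128(reinterpret_cast<const __m128i*>(nbuf));
        // Only full 16-byte haystack loads are used; the tail is handled below.
        while (p + 16 <= n) {
            const __m128i hv = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hay + p));
            const int idx = _mm_cmpestri(nv, static_cast<int>(m), hv, 16,
                                         _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED);
            if (idx == 16) {
                p += 16;  // no full or partial match in this window
            } else if (static_cast<std::size_t>(idx) + m <= 16) {
                return p + idx;  // the whole needle matched inside the window
            } else {
                p += idx;  // a prefix matched at the window's end: reload from there
            }
        }
    }
#endif
    // Scalar tail, plus the fallback for large needles and non-x86-64 targets.
    const unsigned char* it = std::search(hay + p, hay + n, needle, needle + m);
    return it == hay + n ? not_found : static_cast<std::size_t>(it - hay);
}
```

Restricting the loop to full 16-byte haystack loads avoids reading past the end of the buffer; the last partial window is left to std::search, which only ever sees fewer than 16 remaining bytes on the SIMD path.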
In the benchmark results, 0 denotes a small needle and 1 a large needle.