Open mcy opened 2 years ago
(I should mention that this used the newest stable rustc and clang as of writing: 1.59 and 13, resp.)
OK, I godbolted this and am working through the assembler code. It's definitely using SSE on -C opt-level=2
and above, though it doesn't look much like the implementation that I would expect; I would have thought it would just count the number of bytes equal to b'-'
, but I can't wrap my head around it at the moment. I'm certainly not seeing any subtraction or comparisons going on there.
Those functions seem different, am I missing something? C++ version returns true when there are at least 2 dashes, while Rust version returns true when there are at least 3 dashes.
Also, when first sv.find_first_of('-')
returns npos (no dashes), +1
overflows and the whole string is searched again, which is interesting, but it should still return the correct result.
Note that count
will fully consume the iterator, while you probably want it to stop when the second b'-'
is found. This should a tiny bit better, though it's still not as fast as it should be.
fn has_more_than_two_dashes(s: &[u8]) -> bool {
sl.iter().filter(|&&c| c == b'-').nth(1).is_some()
}
I think C++'s find_first_of
is specialized to use an optimized memchr
, while the Rust version can't do that because filter
takes a closure that could do anything, not just a plain equality check. This needs either a specialized method that takes an element to search for, as opposed to a closure (see also #84058 and #56345 for some past attempts), or something that acts like a plain equality closure and can be specialized upon.
Edit: not to say this is what you should do, but I find interesting that you can manually inline memchr
in Rust, without using unsafe
code, and slightly beat the C++ version.
There's also memchr crate, which was created to address similar performance issues.
I am not convinced this should be considered a bug; like @panstromek said, if performance matters you can switch in memchr
. And the Rust code is not quite equivalent to the C++ anyway, it should be something like s1[s1.iter().position('-')?..].iter().position('-').is_some()
.
A colleague provided me with this benchmark, which compares the following equivalent C++ and Rust:
C++ performs 20x better in the microbenchmark. We haven't tried very hard to figure out why, but our suspicion is that Rust's implementation of
memchr
is worse than the one provided by the system that ran the benchmark. C++ also bails out early, and Rust does not, but that appears not to matter because the dashes are at the far end of the strings.We're not actually sure what CPU quick-bench ran this with, but I can run it on my (icelake, I think) Xeon later. I think the difference in memchr implementations is worth investigating regardless.