rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.84k stars 12.51k forks source link

Poor codegen for string search (i.e., `memchr`) with iterators #94573

Open mcy opened 2 years ago

mcy commented 2 years ago

A colleague provided me with this benchmark, which compares the following equivalent C++ and Rust:

bool HasMoreThanTwoDashes(std::string_view sv) {
    return sv.find_first_of('-', sv.find_first_of('-')+1) != std::string_view::npos;
}
fn has_more_than_two_dashes(s: &[u8]) -> bool {
  sl.iter().filter(|&&c| c == b'-').count() > 2
}

C++ performs 20x better in the microbenchmark. We haven't tried very hard to figure out why, but our suspicion is that Rust's implementation of memchr is worse than the one provided by the system that ran the benchmark. C++ also bails out early, and Rust does not, but that appears not to matter because the dashes are at the far end of the strings.

We're not actually sure what CPU quick-bench ran this with, but I can run it on my (icelake, I think) Xeon later. I think the difference in memchr implementations is worth investigating regardless.

mcy commented 2 years ago

(I should mention that this used the newest stable rustc and clang as of writing: 1.59 and 13, resp.)

gskt17 commented 2 years ago

OK, I godbolted this and am working through the assembler code. It's definitely using SSE on -C opt-level=2 and above, though it doesn't look much like the implementation that I would expect; I would have thought it would just count the number of bytes equal to b'-', but I can't wrap my head around it at the moment. I'm certainly not seeing any subtraction or comparisons going on there.

panstromek commented 2 years ago

Those functions seem different, am I missing something? C++ version returns true when there are at least 2 dashes, while Rust version returns true when there are at least 3 dashes.

panstromek commented 2 years ago

Also, when first sv.find_first_of('-') returns npos (no dashes), +1 overflows and the whole string is searched again, which is interesting, but it should still return the correct result.

SkiFire13 commented 2 years ago

Note that count will fully consume the iterator, while you probably want it to stop when the second b'-' is found. This should a tiny bit better, though it's still not as fast as it should be.

fn has_more_than_two_dashes(s: &[u8]) -> bool {
  sl.iter().filter(|&&c| c == b'-').nth(1).is_some()
}

I think C++'s find_first_of is specialized to use an optimized memchr, while the Rust version can't do that because filter takes a closure that could do anything, not just a plain equality check. This needs either a specialized method that takes an element to search for, as opposed to a closure (see also #84058 and #56345 for some past attempts), or something that acts like a plain equality closure and can be specialized upon.

Edit: not to say this is what you should do, but I find interesting that you can manually inline memchr in Rust, without using unsafe code, and slightly beat the C++ version.

panstromek commented 2 years ago

There's also memchr crate, which was created to address similar performance issues.

jyn514 commented 1 year ago

I am not convinced this should be considered a bug; like @panstromek said, if performance matters you can switch in memchr. And the Rust code is not quite equivalent to the C++ anyway, it should be something like s1[s1.iter().position('-')?..].iter().position('-').is_some().