Support the full 16 bytes

bluss commented 9 years ago

Expand AsciiChars to support the full 16 byte capability of the underlying instructions. This leads to the need of using two u64 words to store the data.

Using 16 bytes requires an extra instruction, but performance seems to be unaffected. Please let me know if you see any performance change in your configuration!

Also change to using const fn, since the crate is using several gated features already. The const fn is also safer than the explicit struct literal, because matching non-ascii bytes in Searcher breaks the utf-8 safety of str (hence the unsafe on the Searcher impl).

However, with this thinking around unsafe I also discovered two things about the fallbacks that I haven't addressed:

They are unsafe. You can match on arbitrary bytes, and the searcher will output those as matches, even if they fall between character boundaries. Searcher really is the unsafe trait it says it is, it's just easy to forget -- and it doesn't feel really right!
I experimented automatically implemented fallbacks that didn't use the closures. It worked really well, I didn't use the closures, and I got the same performance. However the moment you remove the use of the closure type parameter something happens -- rust is no longer forced to compile each use of the AsciiCharsWithFallback into a unique type with unique specializations, and then the performance drops again. No amount of inline(always) could fix that. Pretty interesting! It means that closures are necessary for good specialization, but it's not the code in the closures that matters.

bluss commented 9 years ago

Feel free to consider the direction you want to take this crate. I'm working on it because it's interesting and I wanted to learn about pcmpestri. I've got some substring search prototypes in the works, but I think I need to implement a full two-way algorithm (current Rust's substring algorithm) using them.

shepmaster commented 9 years ago

Expand AsciiChars to support the full 16 byte capability of the underlying instructions. This leads to the need of using two u64 words to store the data.

This is something I had been planning on, but was hoping that Rust would gain a semi-native type for this, perhaps from the SIMD package. However, I'm more than happy to see it added now. I think that a lot of the constructor logic is going to fall into the very unstable realm for a while.

Also change to using const fn

Yay! I'd been following const fn stuff, but evidently the wrong issue. I think that this crate is a perfect match for a const fn. My original thought was that the constructor could take a &'static [u8] and build up the 8 (now 16) bytes, verify they are UTF-8, and maybe even build the fallback (I might be reaching on the last point).

things about the fallbacks

One of my reservations about the fallback is that you can easily disconnect it from the packed-byte representation. See me point above about hoping that the const fn could help with it somehow. Maybe a combination of a macro and a const fn...

rust is no longer forced to compile each use of the AsciiCharsWithFallback into a unique type with unique specializations

Ah, you got more useful information out of your investigation than I did! I spent about a week trying to get the automatically generated fallback performance to match the closure. Eventually, I gave up and decided that it was the closure being inlined and specialized which really provided the performance. You are saying that if we could put a dummy type parameter that was unique for each call site, we could use the same code. Perhaps another case for a macro...

Feel free to consider the direction you want to take this crate.

I'm actually super excited to have the substring matching added! The name "jetscii" might not apply as well for something that actually supports UTF-8, but I think we can live with that :sweat_smile:

implement a full two-way algorithm

Would you foresee that living in this crate? I've only skimmed over the algorithm a bit, but it did seem... complex. I'd wonder about it being bigger than the rest of the code combined :smile:

bluss commented 9 years ago

I don't know so much about this huge realm of simd and asm, maybe https://crates.io/crates/llvmint can be used? There are simd types and intrinsics that can replace expicit inline asm.

bluss commented 9 years ago

two way would be complex. It's the algorithm I've just implemented / re-added to rust itself. I guess the main work would be to write kind of the equivalents of memchr and memcmp with pcmpestri, and then use those in the algorithm.

The prototype is relatively simple and I learned a lot from looking at your alignment fixup, it seems like a sound solution.

I think it should start by focusing on simple &[u8] in &[u8] search first -- any restriction to UTF-8 is not really a concern of the algorithm itself.

I'm doubtful if it should live in jetscii, maybe it could. Regardless we need to get cpuid sorted (best in a separate crate, so all users of it link to the same static?) and then the best way to use inline asm / platform specific intrinsics. Ideally it should all have a stable compiling fallback.

shepmaster commented 9 years ago

I think it should start by focusing on simple &[u8] in &[u8] search first -- any restriction to UTF-8 is not really a concern of the algorithm itself.

A good point; I keep thinking in terms of strings and characters, but the instruction itself doesn't really care. Maybe I need to revisit that for the existing code, but I would love to think up a usecase for it too...

That being said, we can always provide a nice little wrapper that takes in &str.

I'm doubtful if it should live in jetscii, maybe it could

Which "it" do you refer to here? Finding &[u8] in &[u8] or the two way search? The former seems a good fit here, the latter probably not.

need to get cpuid sorted (best in a separate crate, so all users of it link to the same static?)

Yeah, that sounds reasonable. I can start a new crate if it makes sense - would you like to be a collaborator? Do we have a fun name suggestion, or just something like cpuid-rs?

stable compiling fallback.

I think this mostly depends on the Pattern API stabilizing.

shepmaster commented 9 years ago

Oh, I just found https://github.com/gz/rust-x86 which has cpuid stuff, but also a bunch more.

bluss commented 9 years ago

Finding &[u8] in &[u8] or the two way search?

The two way algorithm is the thing I'm exploring to do this with, so they are the same thing. :-) Of course, other algorithms could apply tpo. A naive search with pcmpestri is very fast, but still O(n²) on certain input. For a needle of less than 16 bytes, I don't think it matters.

shepmaster commented 9 years ago

I'm down to merge this, once I understand the removal of the if index > len { bit. Thanks!

bluss commented 9 years ago

How does it bench on your configuation? I was thinking that it should be no change

shepmaster commented 9 years ago

I finally had some free time to run the benchmarks and see that they are the same. Thanks again!

shepmaster / jetscii

Support the full 16 bytes #10