llogiq / bytecount

Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast
Apache License 2.0

Fancy new algorithm on stable SIMD and a bunch of other stuff #44

Closed: Veedrac closed this 6 years ago

Veedrac commented 6 years ago

Sigh. I'm not a fan of Rust's fragmentation around SIMD. Let's see, we have

  1. An AVX2 implementation, using Intel's intrinsics,
  2. An SSE2 implementation, using Intel's intrinsics,
  3. A generic version, using packed_simd,
  4. A fake-integer-SIMD version, using bit magic,
  5. A trivial fallback.

Why? Well,

  1. the AVX2 implementation exists because it's the fastest,
  2. the SSE2 implementation exists because the generic version requires nightly,
  3. the generic version exists because other architectures exist, plus it's strictly better than the SSE2 version, plus it works with no_std,
  4. the fake-integer-SIMD version exists for the same reason the generic one does; it isn't as good as any of the above, but it doesn't require nightly,
  5. the fallback version exists for short vectors and for naive_ operations.

If the generic version worked on stable Rust, we could throw out 2. and 4., and I could make 1. a simple extension of it, like it used to be. It might make sense to just throw some of this out now for maintainability reasons; this clearly isn't a sane way of doing things.
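For readers unfamiliar with the "fake-integer-SIMD" idea in item 4, here is a minimal sketch of the general technique: treat a u64 as eight byte lanes and count matches with bit tricks. It is an illustration only, not the code in this PR; the names naive_count, count_in_word and swar_count are hypothetical, and the real implementation may accumulate counts differently.

```rust
/// Trivial fallback (item 5): compare one byte at a time.
fn naive_count(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

/// Count how many of the eight bytes in `word` equal `needle`.
fn count_in_word(word: u64, needle: u8) -> u32 {
    // Broadcast the needle to all lanes; matching lanes become 0x00 after XOR.
    let x = word ^ (LO * u64::from(needle));
    // Per lane: adding 0x7f to the low seven bits sets the high bit iff any
    // of those bits was set. The sum is at most 0xfe, so no carry crosses lanes.
    let t = (x & !HI).wrapping_add(!HI);
    // The high bit of (t | x) is set iff the lane is nonzero; invert to flag zero lanes.
    let zero_lanes = !(t | x) & HI;
    zero_lanes.count_ones()
}

/// Hypothetical SWAR counter: eight bytes per step, naive loop for the tail.
fn swar_count(haystack: &[u8], needle: u8) -> usize {
    let mut chunks = haystack.chunks_exact(8);
    let mut total = 0usize;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        total += count_in_word(u64::from_le_bytes(buf), needle) as usize;
    }
    total + naive_count(chunks.remainder(), needle)
}
```

Calling swar_count(haystack, needle) should agree with naive_count on any input. The sketch uses only safe code and stable Rust, which is the property that keeps this variant around despite being slower than real SIMD.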

Good luck reviewing this ¬_¬. Did I mention it's all unsafe?

llogiq commented 6 years ago

What's with the appveyor build?

Veedrac commented 6 years ago

@llogiq I #[cfg]'d things incorrectly, plus what @mati865 said. I should see if setting up a cross-compiler allows me to test this locally.
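To make the #[cfg] issue concrete, here is a rough sketch of how an intrinsics path can be gated per architecture and selected at runtime, with a portable fallback for everything else. It is an assumption-laden illustration, not the code in this PR: fallback_count, count_sse2 and count_dispatch are hypothetical names, and this version sums movemask bits rather than whatever accumulation scheme the PR actually uses.

```rust
/// Portable fallback, compiled on every target.
fn fallback_count(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

// Everything touching x86 intrinsics sits behind the same cfg gate, so other
// targets (and cross-compiled CI builds) never see these imports.
#[cfg(target_arch = "x86_64")]
mod x86 {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };

    /// Safety: the caller must ensure the CPU supports SSE2.
    #[target_feature(enable = "sse2")]
    pub unsafe fn count_sse2(haystack: &[u8], needle: u8) -> usize {
        let needles = _mm_set1_epi8(needle as i8);
        let mut chunks = haystack.chunks_exact(16);
        let mut total = 0usize;
        for chunk in &mut chunks {
            // Unaligned load of 16 bytes; the chunk is exactly 16 bytes long.
            let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
            // One mask bit per matching byte lane.
            let mask = _mm_movemask_epi8(_mm_cmpeq_epi8(v, needles));
            total += mask.count_ones() as usize;
        }
        total + chunks.remainder().iter().filter(|&&b| b == needle).count()
    }
}

/// Hypothetical entry point: pick the best available path at runtime.
pub fn count_dispatch(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse2") {
            // Sound because the feature check above succeeded.
            return unsafe { x86::count_sse2(haystack, needle) };
        }
    }
    fallback_count(haystack, needle)
}
```

The point of the layout is that the cfg attribute on the module and the cfg attribute on the call site must agree; if one is gated and the other is not, the build breaks only on the targets that lack the module, which is easy to miss when testing locally on x86_64.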

Veedrac commented 6 years ago

Fix is up. Sorry for the wait (^_^;).

llogiq commented 6 years ago

Thanks! I'll prepare a new release shortly.

Veedrac commented 6 years ago

I'll need to update the README since the flags have changed; give me 24h and I'll have a PR. Might also make sense to have a limited beta or somesuch since there's a lot of fresh unsafe code.