llogiq / bytecount

Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast
Apache License 2.0

Add aarch64 SIMD specialization #82

Closed llogiq closed 11 months ago

llogiq commented 1 year ago

I'm still without a good ARM CPU (but will hopefully get one soon), but here's what I think the aarch64 intrinsics version of bytecount should look like.

@Veedrac if you have a bit of time, I'd appreciate a review.

llogiq commented 1 year ago

There are still failing tests on aarch64. Notably, the overflow_many tests fail for both count and num_chars. I'll look into that.

Veedrac commented 1 year ago

Am I good to wait for the tests to be fixed before reviewing, or would you prefer a review sooner?

llogiq commented 1 year ago

Yeah, I need to find a few hours. I think I know what's wrong, just need to fix things. I'll ping you.

llogiq commented 1 year ago

@Veedrac if all goes well, CI should be green soon. I've checked that perf matches the packed_simd variant on an M2 MacBook and my mobile phone (using the bootstrap trick).

llogiq commented 11 months ago

@Veedrac CI is most certainly green. I'm tempted to just bump the version, push and publish it.

llogiq commented 11 months ago

I'm going to merge this now; it's fared well in all my tests, and if there's anything wrong, we can fix it in a followup PR.