klauspost / crc32

CRC32 hash with x64 optimizations
BSD 3-Clause "New" or "Revised" License

Go 1.5 already includes the SSE 4.2 updateCastagnoli() function #3

Closed (wscott closed 8 years ago)

wscott commented 9 years ago

I was playing with this repo mainly as a way to teach myself how to write code that includes optional assembly versions of some routines. Very nice.

So I extended your benchmarks to test the Castagnoli version as well, and found that it wasn't any faster than the standard library hash. Sure enough, the Go 1.5 version already includes SSE 4.2 code that uses the new crc32c opcode. The IEEE crc32 in the standard library is still slow.

It might be a good idea to note that in the README.
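For readers unfamiliar with the two variants discussed here, the following is a minimal sketch using only the standard library's `hash/crc32`. The helper name `checksums` is hypothetical and not part of either package. It shows that IEEE and Castagnoli are different polynomials producing different checksums for the same input, which is why hardware support for one says nothing about the speed of the other.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// checksums is a hypothetical helper: it returns the CRC-32 of data
// under the IEEE polynomial and under the Castagnoli polynomial.
func checksums(data []byte) (ieee, castagnoli uint32) {
	ieee = crc32.ChecksumIEEE(data)
	castagnoli = crc32.Checksum(data, crc32.MakeTable(crc32.Castagnoli))
	return ieee, castagnoli
}

func main() {
	// "123456789" is the conventional check input for CRC algorithms.
	ieee, c := checksums([]byte("123456789"))
	fmt.Printf("IEEE:       %08x\n", ieee) // cbf43926
	fmt.Printf("Castagnoli: %08x\n", c)    // e3069283
}
```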

klauspost commented 9 years ago

I am not sure what you mean. I have adjusted the documentation, since your CPU must support carry-less multiplication, not only SSE 4.2.

wscott commented 9 years ago

For example, I added some benchmarks like this:

// Castagnoli benchmark using this package's New and MakeTable.
func BenchmarkCCrc1KB(b *testing.B) {
    benchmark(b, New(MakeTable(Castagnoli)), 1024)
}

// The same benchmark using the standard library's hash/crc32, for comparison.
func BenchmarkCStdCrc1KB(b *testing.B) {
    benchmark(b, crc32.New(crc32.MakeTable(Castagnoli)), 1024)
}

I found that for Castagnoli the standard library ran at the same speed. Looking at the source, I see that it already includes that support: https://golang.org/src/hash/crc32/crc32_amd64.s

I assumed that you wrote all of the assembly files and the std library just copied part of them. Now I am guessing they always had the fast Castagnoli version and you extended it to include the SSE code for the IEEE crc as well.

klauspost commented 9 years ago

Yes, I started this as a copy of the standard library.

This is the current "tip" version, which includes my code: https://tip.golang.org/src/hash/crc32/crc32_amd64.s

wscott commented 9 years ago

BTW, another optimization I noticed is missing: the slicingBy8 optimization can still be used in the Castagnoli case.

klauspost commented 9 years ago

Oh yes, that might be nice for other platforms.

klauspost commented 8 years ago

Added in PR #5