wscott closed this issue 8 years ago
I am not sure what you mean. I have adjusted the documentation: your CPU must support carryless multiplication (the PCLMULQDQ instruction), not only SSE 4.2.
For example, I added some benchmarks like this:
func BenchmarkCCrc1KB(b *testing.B) {
	benchmark(b, New(MakeTable(Castagnoli)), 1024)
}
func BenchmarkCStdCrc1KB(b *testing.B) {
	benchmark(b, crc32.New(crc32.MakeTable(Castagnoli)), 1024)
}
I found that for Castagnoli the standard library ran at the same speed, and on looking I see that it already includes that support: https://golang.org/src/hash/crc32/crc32_amd64.s
I assumed that you wrote all of the assembly files and the std library just copied part of them. Now I am guessing they always had the fast Castagnoli version and you extended it to include the SSE code for the IEEE crc as well.
Yes, I started this as a copy of the standard library.
This is the current "tip" version, which includes my code: https://tip.golang.org/src/hash/crc32/crc32_amd64.s
BTW, another optimization I notice isn't included: the slicingBy8 optimization can still be used in the Castagnoli case.
Oh yes, that might be nice for other platforms.
Added in PR #5
I was playing with this repo mainly as a way to teach myself how to write code that includes optional assembly versions of some routines. Very nice.
So I extended your benchmarks to test the Castagnoli version as well, and found that it wasn't any faster than the standard library hash. Sure enough, the Go 1.5 version already includes SSE 4.2 code that uses the hardware crc32 instruction, which computes CRC-32C. The IEEE crc32 in the standard library is still slow.
It might be a good idea to note that in the README.