tpircher-zz / pycrc

Free, easy to use Cyclic Redundancy Check (CRC) calculator and source code generator
https://pycrc.org
MIT License

speed improvement using SSE4 crc32 cpu instruction? #15

Open ThomasWaldmann opened 8 years ago

ThomasWaldmann commented 8 years ago

Intel/AMD CPUs have had special support for CRC computation for quite some years:

http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411

https://en.wikipedia.org/wiki/SSE4#Supporting_CPUs

The Dr. Dobb's article says that this yields a performance of about 1.17 cycles per 64-bit word (measured with a loop that repeatedly computes over a small amount of data, so I guess one can assume the data sits in the CPU's L1 or L2 cache).

At 2.4 GHz, this could mean up to about 16 GB/s (or whatever lower limit your RAM bandwidth imposes).
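The arithmetic behind that estimate, as a small sketch. The function name `peak_throughput_gb_per_s` is made up for illustration, and the result is only an upper bound under the assumption that all data is already in cache:

```python
def peak_throughput_gb_per_s(clock_hz, cycles_per_word, word_bytes=8):
    """Upper-bound throughput if every word costs `cycles_per_word` cycles."""
    words_per_second = clock_hz / cycles_per_word
    return words_per_second * word_bytes / 1e9

# 2.4 GHz clock, 1.17 cycles per 64-bit (8-byte) word, as quoted above.
estimate = peak_throughput_gb_per_s(2.4e9, 1.17)
print(f"{estimate:.1f} GB/s")  # prints "16.4 GB/s"
```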

tpircher-zz commented 8 years ago

Hmm, this is architecture-specific and only works for one specific polynomial (0x1EDC6F41). I think it's unlikely to be implemented in pycrc any time soon.
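For reference, 0x1EDC6F41 is the polynomial of the CRC commonly called CRC-32C (Castagnoli). A minimal bit-by-bit software sketch of it is below. This is not pycrc output and not the hardware path; the function name `crc32c` is chosen for illustration, and the initial value and final XOR of 0xFFFFFFFF are the usual CRC-32C conventions, which the hardware instruction leaves to the caller.

```python
# Bit-reversed (reflected) form of 0x1EDC6F41, used by LSB-first algorithms.
CRC32C_POLY_REFLECTED = 0x82F63B78

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; slow, but easy to check against other implementations."""
    crc ^= 0xFFFFFFFF                      # conventional initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                 # process one bit at a time, LSB first
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                # conventional final XOR

# Standard check input for CRC catalogues.
print(hex(crc32c(b"123456789")))  # prints "0xe3069283"
```

Passing a previous result back in as `crc` continues the checksum over a further chunk of data.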

ThomasWaldmann commented 8 years ago

Pity.

Considering that Intel/AMD CPUs less than 5 years old are quite common and many people just need some CRC (not a specific one), I can imagine a lot of people could use this.

I ran test/performance.sh and the maximum I got from that was 0.806 GB/s (crc32, table-driven sb4) on a Core i5-4200U.