powturbo / Turbo-Base64

Turbo Base64 - Fastest Base64 SIMD:SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!
GNU General Public License v3.0

The library is not fast #19

Closed: alexey-milovidov closed this issue 1 year ago

alexey-milovidov commented 1 year ago

https://github.com/ClickHouse/ClickHouse/issues/41957

powturbo commented 1 year ago

It is not only fast, it is the fastest. Your benchmark compares extremely short strings of length 4. This is not a typical length for base64 strings, and in this case Turbo-Base64 is using its scalar function against AVX512 base64 functions. Additionally, you are running ClickHouse queries, with all of their overhead, in a volatile environment. A benchmark must compare a very wide range of string lengths. In your case, it is better to think about dropping storage in base64 format completely: convert the base64 strings at insert and store the strings as raw data.
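To illustrate the point about string lengths, here is a minimal sketch of a length-sweep micro-benchmark in C. It is not the Turbo-Base64 benchmark and does not use the library's API: `b64_encode_scalar`, `bench_encode_mbps`, and `sweep` are hypothetical names, and the encoder is a plain scalar stand-in for whichever function is under test. The list of lengths and the byte budget are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Encoded length of n raw bytes with padding: 4 * ceil(n / 3). */
static size_t b64_encoded_len(size_t n) { return (n + 2) / 3 * 4; }

/* Hypothetical stand-in encoder (plain scalar); swap in the function under test. */
static size_t b64_encode_scalar(const unsigned char *in, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;
    for (; i + 2 < n; i += 3) {               /* full 3-byte groups */
        unsigned v = (unsigned)in[i] << 16 | (unsigned)in[i + 1] << 8 | in[i + 2];
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    if (i < n) {                              /* 1 or 2 trailing bytes, padded with '=' */
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < n) v |= (unsigned)in[i + 1] << 8;
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    return o;
}

/* Encode len-byte inputs until roughly `total` raw bytes have been processed.
   Returns raw-input throughput in MB/s, or 0 if the clock did not advance. */
static double bench_encode_mbps(size_t len, size_t total)
{
    unsigned char *in = malloc(len ? len : 1);
    char *out = malloc(b64_encoded_len(len) + 1);
    assert(in && out);
    for (size_t i = 0; i < len; i++) in[i] = (unsigned char)(i * 31 + 7);

    size_t reps = total / (len ? len : 1) + 1, sink = 0;
    clock_t t0 = clock();
    for (size_t r = 0; r < reps; r++) {
        in[0] = (unsigned char)r;             /* vary the input between iterations */
        sink += b64_encode_scalar(in, len, out);
    }
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
    assert(sink == reps * b64_encoded_len(len));
    free(in);
    free(out);
    return sec > 0 ? (double)reps * (double)len / 1e6 / sec : 0.0;
}

/* Report throughput across a wide range of input lengths, not just one. */
static void sweep(void)
{
    static const size_t lens[] = {4, 16, 64, 256, 1024, 10000, 1000000};
    for (size_t i = 0; i < sizeof lens / sizeof lens[0]; i++)
        printf("%8zu bytes: %10.2f MB/s\n", lens[i],
               bench_encode_mbps(lens[i], (size_t)1 << 26));
}
```

Calling `sweep()` from `main` prints one throughput figure per length. With very short inputs such as 4 bytes, per-call overhead tends to dominate, which is why a single short length says little about a SIMD implementation.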

alexey-milovidov commented 1 year ago

We have found a bug in some code paths in this library and removed dynamic CPU dispatching: https://github.com/ClickHouse/ClickHouse/pull/31797/files

powturbo commented 1 year ago

The new AVX512 implementation has finally been benchmarked. See the Readme.

powturbo commented 1 year ago

New AVX512 benchmark on an AMD 7840HS 3.8-5.1GHz (ideapad pro 5): Turbo-Base64 decodes more than 3 times faster than aklomp/base64.

E Size  ratio%       E MB/s    D MB/s  Name
 13336 133.36%    120035.22  97068.10  8:tb64v512vbmi  (turbo-base64)
 13336 133.36%     89264.00  31715.94 16:b64avx512     (aklomp/base64)
 10000 100.00%     77716.64  78157.76 10:memcpy
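The figures above can be cross-checked with a little arithmetic. The sketch below assumes that the third and fourth numeric columns are encode and decode throughput in MB/s, which is consistent with the "more than 3 times faster" decoding claim, and that the first column is the encoded size of a 10000-byte input. The helper names `padded_b64_len`, `speedup`, and `report` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Padded base64 length of n raw bytes: 4 * ceil(n / 3). */
static size_t padded_b64_len(size_t n) { return (n + 2) / 3 * 4; }

/* Ratio of two throughput figures given in the same unit. */
static double speedup(double a, double b) { return a / b; }

static void report(void)
{
    /* 10000 raw bytes encode to 13336 characters, i.e. the 133.36% in the table. */
    assert(padded_b64_len(10000) == 13336);

    /* Assumed column meaning: 3rd = encode MB/s, 4th = decode MB/s. */
    printf("decode, tb64v512vbmi vs b64avx512: %.2fx\n", speedup(97068.10, 31715.94));
    printf("encode, tb64v512vbmi vs b64avx512: %.2fx\n", speedup(120035.22, 89264.00));
    printf("decode, tb64v512vbmi vs memcpy:    %.2fx\n", speedup(97068.10, 78157.76));
}
```

Under that reading of the columns, the decode ratio comes out just above 3, matching the statement in the comment.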