alexey-milovidov closed this 1 year ago
It is not only fast, it is the fastest. In your benchmark you are comparing extremely short strings of length 4. This is not a typical length for base64 strings, and in this case Turbo-Base64 is using its scalar function against AVX512 base64 functions. Additionally, you are using ClickHouse queries, with all of their overhead, in a volatile environment. A benchmark must compare a very wide range of string lengths. In your case, it is better to think about dropping storage in base64 format completely: convert the base64 strings at insert and store the strings as raw data.
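To illustrate what sweeping over string lengths looks like, here is a minimal sketch. It uses Python's standard `base64` module purely as a stand-in for the library under test, so the absolute numbers say nothing about Turbo-Base64 or aklomp/base64. The function names `bench_decode` and `sweep`, the chosen lengths, and the 1 MiB work budget per length are all assumptions made for this example.

```python
import base64
import os
import time


def bench_decode(length, total_bytes=1 << 20):
    """Return decode throughput in MB/s (of raw bytes) for inputs of `length` bytes."""
    raw = os.urandom(length)
    encoded = base64.b64encode(raw)
    # Sanity check before timing: the round trip must be lossless.
    assert base64.b64decode(encoded) == raw
    # Decode roughly the same amount of data at every length,
    # so short strings are repeated many more times than long ones.
    iterations = max(1, total_bytes // length)
    start = time.perf_counter()
    for _ in range(iterations):
        base64.b64decode(encoded)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (iterations * length) / elapsed / 1e6


def sweep(lengths=(4, 16, 64, 256, 1024, 16384, 1 << 20)):
    """Measure decode throughput across a wide range of string lengths."""
    return {n: bench_decode(n) for n in lengths}


if __name__ == "__main__":
    for n, mbps in sweep().items():
        print(f"{n:>8} bytes  {mbps:10.1f} MB/s")
```

With very short inputs such as length 4, per-call overhead dominates the measurement, which is why a single length is not enough to rank implementations.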
We have found a bug in some code paths in this library and removed dynamic CPU dispatching: https://github.com/ClickHouse/ClickHouse/pull/31797/files
New AVX512 benchmark on AMD 7840HS 3.8-5.1GHz (ideapad pro 5): Turbo-Base64 decodes more than 3 times faster than aklomp/base64.
13336 133.36% 120035.22 97068.10 8:tb64v512vbmi (turbo-base64)
13336 133.36% 89264.00 31715.94 16:b64avx512 (aklomp/base64)
10000 100.00% 77716.64 78157.76 10:memcpy
https://github.com/ClickHouse/ClickHouse/issues/41957