Closed alexey-milovidov closed 4 years ago
I've optimized all the Turbo Base64 functions for short inputs as well. It is also possible to disable checking for faster decoding by compiling Turbo Base64 with
make NCHECK=1
It is now also possible to call the architecture-dependent functions directly instead of going through tb64enc+tb64dec. Use _tb64e+_tb64d (after calling the initialization function tb64ini at program start) instead of tb64enc+tb64dec. This saves a check + a non-inlined function call.
There are now optimized functions for short strings, currently only for AVX2. Just call the new tb64ini with the parameter isshort = 1 at the start of the program. See turbob64.h
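The saving described above (one check plus one non-inlined call) can be illustrated with a generic dispatch pattern. This is only a sketch: codec_init, codec_checked, codec_fast and copy_scalar are hypothetical names invented for illustration, the "kernel" is a plain byte copy, and none of this is Turbo Base64's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not Turbo Base64's actual code. */
typedef size_t (*codec_fn)(const unsigned char *in, size_t inlen, unsigned char *out);

/* Stand-in for an architecture-specific kernel (here just a byte copy). */
static size_t copy_scalar(const unsigned char *in, size_t inlen, unsigned char *out) {
    memcpy(out, in, inlen);
    return inlen;
}

/* Resolved once by codec_init(); NULL until then. */
static codec_fn codec_direct = NULL;

/* One-time initialization: a real library would inspect CPU features here. */
static void codec_init(void) {
    codec_direct = copy_scalar;
}

/* Convenience wrapper: pays for a NULL check and an extra call on every use. */
static size_t codec_checked(const unsigned char *in, size_t inlen, unsigned char *out) {
    if (codec_direct == NULL)
        codec_init();
    return codec_direct(in, inlen, out);
}

/* Direct path: the caller promises that codec_init() has already run. */
static inline size_t codec_fast(const unsigned char *in, size_t inlen, unsigned char *out) {
    assert(codec_direct != NULL); /* debug-only guard, compiled out with NDEBUG */
    return codec_direct(in, inlen, out);
}
```

For very short inputs the per-call check and the extra call are a noticeable fraction of the total work, which is presumably why the direct path matters most there.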
Perfect! I will try right now...
We run all our tests with UBSan, and it reported that it is better to replace unaligned stores with memcpy:
Unaligned access is used only for 32-bit integers, via the ctou32 macro. As you can see in "conf.h", all recent CPUs (Intel/AMD, ARM 32/64-bit, PowerPC, ...) support unaligned 32-bit access. If necessary, I can replace this macro with memcpy using a preprocessor switch in "turbob64c.c" and "turbob64d.c".
Yes, it's 100% safe from the CPU standpoint but it isn't according to the C or C++ standard.
Replacing it with memcpy would not impact performance in any way.
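For readers unfamiliar with the idiom, here is a minimal sketch of memcpy-based unaligned 32-bit access. The helper names load_u32, store_u32 and unaligned_demo are hypothetical and are not the library's ctou32 macro. Mainstream optimizing compilers typically lower a fixed-size memcpy like this to a single load or store instruction on targets that support unaligned access, which is why no slowdown is expected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not the library's ctou32 macro. */

/* Reads 32 bits from any address without relying on alignment. */
static inline uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes 32 bits to any address without relying on alignment. */
static inline void store_u32(void *p, uint32_t v) {
    memcpy(p, &v, sizeof v);
}

/* Round-trips a value through an odd (misaligned) offset in a byte buffer. */
static void unaligned_demo(void) {
    unsigned char buf[8] = {0};
    store_u32(buf + 1, 0xDEADBEEFu);
    assert(load_u32(buf + 1) == 0xDEADBEEFu);
}
```

By contrast, casting buf + 1 to uint32_t * and dereferencing it is undefined behavior in C, and that kind of access is what UBSan's alignment check flags.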
I've changed the unaligned access to memcpy and made decoding 5% faster. I'll optimize SSE, AVX and ARM for short strings later.
Thank you! Let's see what our CI will show...
You must always include turbob64sse in your builds (cmake files), not only for amd64; see the turbobase64 makefile. Please update to the latest version, which is faster for very short strings.
Ok. BTW, the performance test has finished running and we see a significant performance improvement!
(the queries with base64 are near the top)
I'll close this issue as it is completely resolved. And I would like to especially thank you for your help!
I will try to finish the integration of Turbo-Base64 into our product in the next few days...
One minor issue remains before this library can be integrated: https://github.com/ClickHouse/ClickHouse/pull/8444#issuecomment-573323386 If you are interested, you can finish it and open another PR.
Hi, I've added checking to the short-string decoding functions. Due to other overheads in ClickHouse, you can't see the full speed advantage of turbobase64, but in my short-string tests it is 3 times faster in encoding and 3.5-4 times faster in decoding (with checking) than your current base64; see the short strings benchmark. I don't know the origin of the base64 strings in ClickHouse, but if they come from an external source, a database should normally do a one-time check at insertion. If you are generating the base64 strings yourself, additional checking is pointless. Please reopen the PR; I'm very interested in this subject.
Turbobase64 has now been extended to do full checking by default in the short-string functions. Functional tests with full checking are successful.
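The "check once at insertion" idea can be sketched as a standalone validator that runs when data arrives, so that later decoding can skip validation. This is an illustrative assumption, not code from Turbo Base64 or ClickHouse; is_b64_char and b64_is_valid are hypothetical names, and the check covers only padded standard base64 (RFC 4648 alphabet), without verifying that unused trailing bits are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of any library's API. */

/* True for the 64 characters of the standard base64 alphabet (RFC 4648). */
static bool is_b64_char(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Validates padded standard base64: the length must be a multiple of 4,
   only alphabet characters may appear, and at most two '=' may end the input. */
static bool b64_is_valid(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    if (len % 4 != 0)
        return false;
    size_t pad = 0;
    if (len > 0 && s[len - 1] == '=') {
        pad = 1;
        if (s[len - 2] == '=') /* safe: len >= 4 here */
            pad = 2;
    }
    for (size_t i = 0; i < len - pad; i++)
        if (!is_b64_char(s[i]))
            return false;
    return true;
}
```

A database could run such a check once per inserted value and then use an unchecked decoder on reads, trading one validation pass at write time for faster repeated decoding.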
I have updated the library and also added -DB64CHECK, but it didn't help.
You must update the library with the latest changes from 3 hours ago, as mentioned in my last comment. The short-string functions now do the check by default. I've tested this and it's working.
Now it looks all right, thank you!
Congratulations, it's merged!
PS. You can add a reference, tweet, whatever you want!
Great, thank you. Your team might also have a look at TurboPFor. It has not only the best/fastest integer compression functions, but also interesting lossless and lossy floating-point compression in fp.c. See also the error-controlled lossy FP conversion functions in bitutil.c. You can build or download icapp to test all functions with your data.
Yes, we have high demand for efficient codecs. We will try to integrate your libraries, and I would appreciate your help!
(It will not be done by me; this is a task for intern students to try. We will see how it goes...)
BTW, this is possible for libraries with a compatible license (not GPL).
Nice to see you're evaluating the integration of other components. I can probably make a separate non-GPL package with the functions you're interested in.
https://github.com/ClickHouse/ClickHouse/issues/8397#issuecomment-568909002
The library behaves worse than https://github.com/aklomp/base64 on strings of average length 77 bytes:
You can download the test data here: https://clickhouse.yandex/docs/en/getting_started/example_datasets/metrica/