oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh

Use/implement a faster base64 encoder/decoder #10269

Open joadnacer opened 6 months ago

joadnacer commented 6 months ago

What is the problem this feature would solve?

There are base64 libraries that may be faster than aklomp/base64, which Bun currently uses.

Benchmark results of various base64 libraries (on x86_64 with AVX512):

Time to encode/decode 1000KB of unencoded data:
============
time=    336.38us test=std-encode
time=     44.62us test=fastb64-encode
time=    281.79us test=aklomp-encode
time=     43.94us test=tb64-encode
time=    352.03us test=std-decode-validating
time=     87.64us test=fastb64-decode-validating
time=     81.55us test=fastb64-decode-fast
time=   5522.64us test=aklomp-decode
time=     58.69us test=tb64-decode

With:

std = zig std (I improved performance significantly in https://github.com/ziglang/zig/pull/17502)
fastb64 = my zig vector api lib - https://github.com/joadnacer/fastb64z
aklomp = https://github.com/aklomp/base64 (currently used by bun)
tb64 = powturbo base64 - https://github.com/powturbo/Turbo-Base64

aklomp's decoding time looks suspiciously slow here; I'm not sure why.
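
To make the gaps in the table easier to compare, the timings can be turned into ratios and approximate throughput. The sketch below only re-uses the numbers from the benchmark output; the helper names (`speedup`, `throughputGBps`) are illustrative, and the throughput figure assumes that "1000KB" means 1,000,000 input bytes, which the output does not state.

```typescript
// Timings in microseconds, copied from the benchmark output above.
const timingsUs = {
  "std-encode": 336.38,
  "fastb64-encode": 44.62,
  "aklomp-encode": 281.79,
  "tb64-encode": 43.94,
  "std-decode-validating": 352.03,
  "fastb64-decode-validating": 87.64,
  "fastb64-decode-fast": 81.55,
  "aklomp-decode": 5522.64,
  "tb64-decode": 58.69,
} as const;

type TestName = keyof typeof timingsUs;

// Ratio of wall times: how many times faster `candidate` ran than `baseline`.
function speedup(baseline: TestName, candidate: TestName): number {
  return timingsUs[baseline] / timingsUs[candidate];
}

// Approximate throughput in GB/s.
// ASSUMPTION: "1000KB" is taken to mean 1e6 bytes of unencoded input.
function throughputGBps(test: TestName, inputBytes: number = 1e6): number {
  const seconds = timingsUs[test] * 1e-6;
  return inputBytes / seconds / 1e9;
}

const encoders: TestName[] = ["std-encode", "fastb64-encode", "tb64-encode"];
for (const test of encoders) {
  console.log(`${test}: ${speedup("aklomp-encode", test).toFixed(2)}x vs aklomp-encode`);
}

const decoders: TestName[] = [
  "std-decode-validating",
  "fastb64-decode-validating",
  "fastb64-decode-fast",
  "tb64-decode",
];
for (const test of decoders) {
  console.log(`${test}: ${speedup("aklomp-decode", test).toFixed(2)}x vs aklomp-decode`);
}
```

On these figures, fastb64 and tb64 encode roughly 6.3x faster than aklomp, while the decode ratio against tb64 is roughly 94x, which is the gap flagged as suspicious above. These are ratios from a single run, so they are best read as rough indications.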

Steps to reproduce

git clone https://github.com/joadnacer/fastb64z.git
cd fastb64z
git checkout full-bench
git clone https://github.com/aklomp/base64 && git clone https://github.com/powturbo/Turbo-Base64
cd base64 && cmake . && make && cd ../Turbo-Base64 && make && cd ..
zig build-exe src/benchmarks.zig -O ReleaseFast $PWD/base64/libbase64.a $PWD/Turbo-Base64/libtb64.a
./benchmarks

Run with Zig 0.12.0-dev.3522+b88ae8dbd

What is the feature you are proposing to solve the problem?

Best option would likely be to implement vector-based base64 within bun to reduce external dependencies. I'm happy to do this if there is interest. fastb64's decoding performance can likely be improved to match Turbo-Base64's.

Wrapping Turbo-Base64 is also a good option.
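
For context on what a vector-based implementation would speed up, here is a minimal scalar sketch of standard base64 encoding, in which each group of 3 input bytes becomes 4 output characters. It is a reference illustration only, not code from fastb64z, aklomp/base64, or Turbo-Base64; the function name `encodeBase64Scalar` is made up for this example.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Scalar reference encoder (hypothetical helper, standard alphabet with '=' padding).
function encodeBase64Scalar(input: Uint8Array): string {
  let out = "";
  let i = 0;

  // Full groups: 3 bytes (24 bits) are split into four 6-bit indices.
  for (; i + 2 < input.length; i += 3) {
    const n = (input[i] << 16) | (input[i + 1] << 8) | input[i + 2];
    out +=
      ALPHABET.charAt((n >> 18) & 63) +
      ALPHABET.charAt((n >> 12) & 63) +
      ALPHABET.charAt((n >> 6) & 63) +
      ALPHABET.charAt(n & 63);
  }

  // Tail: 1 or 2 leftover bytes are zero-extended and padded with '='.
  const rest = input.length - i;
  if (rest === 1) {
    const n = input[i] << 16;
    out += ALPHABET.charAt((n >> 18) & 63) + ALPHABET.charAt((n >> 12) & 63) + "==";
  } else if (rest === 2) {
    const n = (input[i] << 16) | (input[i + 1] << 8);
    out +=
      ALPHABET.charAt((n >> 18) & 63) +
      ALPHABET.charAt((n >> 12) & 63) +
      ALPHABET.charAt((n >> 6) & 63) +
      "=";
  }
  return out;
}

// Usage: the bytes of "hello".
console.log(encodeBase64Scalar(new Uint8Array([104, 101, 108, 108, 111])));
```

A SIMD implementation performs the same bit regrouping and alphabet lookup, but on many bytes per instruction rather than one 3-byte group per loop iteration.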

What alternatives have you considered?

No response

lemire commented 6 months ago

> Best option would likely be to implement vector-based base64 within bun to reduce external dependencies

The simdutf library, which is already present in Bun, has full support for WHATWG forgiving base64 decoding, as well as accelerated base64 encoding functions. In Node.js, the base64 encoding and decoding is currently done with simdutf.
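
To illustrate what "forgiving" means here, the sketch below follows the forgiving-base64 decode steps from the WHATWG Infra Standard in plain scalar TypeScript: ASCII whitespace is stripped, padding is optional, and invalid input yields a failure. It is not simdutf's implementation or API; the function name `forgivingBase64Decode` and the use of `null` to signal failure are choices made for this example.

```typescript
const B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Scalar sketch of forgiving-base64 decoding (hypothetical helper).
// Returns the decoded bytes, or null on failure.
function forgivingBase64Decode(input: string): Uint8Array | null {
  // Step 1: remove ASCII whitespace (tab, LF, FF, CR, space).
  let data = input.replace(/[\t\n\f\r ]/g, "");

  // Step 2: if the length is a multiple of 4, drop one or two trailing '='.
  if (data.length % 4 === 0) {
    if (data.endsWith("==")) data = data.slice(0, -2);
    else if (data.endsWith("=")) data = data.slice(0, -1);
  }

  // Step 3: a length with remainder 1 can never encode whole bytes.
  if (data.length % 4 === 1) return null;

  // Steps 4-5: reject characters outside the alphabet and accumulate 6 bits
  // per character, emitting a byte whenever at least 8 bits are buffered.
  const out = new Uint8Array(Math.floor((data.length * 6) / 8));
  let buffer = 0;
  let bits = 0;
  let pos = 0;
  for (let i = 0; i < data.length; i++) {
    const value = B64_CHARS.indexOf(data.charAt(i));
    if (value === -1) return null;
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[pos++] = (buffer >> bits) & 0xff;
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  // Leftover bits (2 or 4) are discarded, as the algorithm specifies.
  return out;
}

// Usage: embedded whitespace and missing padding are both accepted.
console.log(forgivingBase64Decode(" aGVs bG8=\n"));
console.log(forgivingBase64Decode("aGVsbG8"));
```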