grpc / grpc-web

gRPC for Web Clients

base64 encode/decode is really slow #1187

Open yasushi-saito opened 2 years ago

yasushi-saito commented 2 years ago

Background: we have a grpc-web-based app that performs bulk data uploads, and it can sustain only ~15MB/s regardless of the underlying network speed. The reason is that the base64 encoder used by grpc-web (goog.crypt.base64) is quite slow. I ran a quick benchmark of goog.crypt.base64.encodeByteArray against Base64.fromUint8Array (from https://www.npmjs.com/package/js-base64), and the former is about 50x slower. For example, encoding a random 16MB Uint8Array payload takes ~1000ms with goog.crypt.base64 versus ~19ms with js-base64. Note that 16MB in ~1000ms works out to roughly 16MB/s, which matches the observed throughput ceiling.
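
For reference, the two APIs being compared are drop-in equivalents for converting between a Uint8Array and a base64 string. A minimal round trip (using the same packages as the benchmark script in the next comment; this snippet is illustrative and not part of the original benchmark):

require('google-closure-library');
goog.require('goog.crypt.base64');
const {Base64} = require('js-base64');

const bytes = new Uint8Array([104, 105]); // the ASCII bytes of "hi"

// Closure Library: Uint8Array -> base64 string and back.
const s1 = goog.crypt.base64.encodeByteArray(bytes); // "aGk="
const b1 = goog.crypt.base64.decodeStringToUint8Array(s1);

// js-base64: the same round trip.
const s2 = Base64.fromUint8Array(bytes); // "aGk="
const b2 = Base64.toUint8Array(s2);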

yasushi-saito commented 2 years ago

FYI, here's a node script that I used.

require('google-closure-library');
goog.require('goog.crypt.base64');
const {Base64} = require('js-base64');

// Create a Uint8Array with random contents.
function randomArray(len) {
    const a = new Uint8Array(len);
    for (let i = 0; i < len; i += 1) {
        a[i] = Math.floor(Math.random() * 256); // floor gives uniform 0..255; ceil could yield 256, which wraps to 0 in a Uint8Array
    }
    return a;
}

function runBench(label, len, encode, decode) {
    // Run an encode/decode round trip once and verify that the decoded
    // bytes match the original input.
    const bin = randomArray(len);
    const str = encode(bin); // base64 string
    const bin1 = decode(str); // should equal bin
    if (bin1.length !== bin.length) {
        throw Error(`wrong length: got ${bin1.length} want ${bin.length}`);
    }
    for (let i = 0; i < bin.length; i += 1) {
        if (bin[i] !== bin1[i]) {
            throw Error(`wrong data at ${i}: got ${bin1[i]} want ${bin[i]}`);
        }
    }

    // Run the given function in batches of 10 for at least one second
    // and return the mean runtime per call, in milliseconds.
    const run = (cb) => {
        const startTime = Date.now();
        let rep = 0;
        for (;;) {
            for (let i = 0; i < 10; i += 1) {
                cb();
                rep += 1;
            }
            const elapsed = Date.now() - startTime;
            if (elapsed > 1000) {
                return elapsed / rep;
            }
        }
    };

    const encodeTime = run(() => encode(bin));
    console.log(`${label} encode: len=${len}: ${encodeTime}ms`);
    const decodeTime = run(() => decode(str));
    console.log(`${label} decode: len=${len}: ${decodeTime}ms`);
}

const lens = [8 << 10, 1 << 20, 16 << 20];
for (const len of lens) {
    runBench("jsbase64", len,
             Base64.fromUint8Array,
             Base64.toUint8Array);
    runBench("googcrypt", len,
             goog.crypt.base64.encodeByteArray,
             goog.crypt.base64.decodeStringToUint8Array);
}

result:

jsbase64 encode: len=8192: 0.008025978191148172ms
jsbase64 decode: len=8192: 0.08218390804597701ms
googcrypt encode: len=8192: 0.15714285714285714ms
googcrypt decode: len=8192: 0.27424657534246577ms
jsbase64 encode: len=1048576: 0.997029702970297ms
jsbase64 decode: len=1048576: 10.55ms
googcrypt encode: len=1048576: 57.2ms
googcrypt decode: len=1048576: 35.46666666666667ms
jsbase64 encode: len=16777216: 18.333333333333332ms
jsbase64 decode: len=16777216: 178.3ms
googcrypt encode: len=16777216: 966.8ms
googcrypt decode: len=16777216: 581.6ms
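
(For comparison, and not part of the original script: since this runs under Node, the built-in Buffer base64 codec could be dropped into the same harness. A sketch, assuming the runBench and lens definitions above; Buffer is a Uint8Array subclass, so the verification loop works unchanged:)

for (const len of lens) {
    runBench("nodebuffer", len,
             (bin) => Buffer.from(bin).toString('base64'),
             (str) => Buffer.from(str, 'base64'));
}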

sampajano commented 2 years ago

@yasushi-saito Thanks a lot for discovering the performance bottleneck and doing the benchmark! :)

That said, it's unlikely that we'll adopt a third-party library (e.g., js-base64) for encoding/decoding (especially internally).

Maybe you could consider filing this bug against the Closure Library instead? We (and its other dependents) would benefit if they can make a performance improvement there.

Thanks! :)

yasushi-saito commented 2 years ago

Makes sense. I created an issue: https://github.com/google/closure-library/issues/1161

sampajano commented 2 years ago

Thanks for filing the bug! I've subscribed as well :)