protobufjs / protobuf.js

Protocol Buffers for JavaScript & TypeScript.

Use TextDecoder when available #811

Open thomas-jeepe opened 7 years ago

thomas-jeepe commented 7 years ago

TextDecoder is a newer browser feature and supported in a few browsers. From my testing in the browser (reading 100,000 different messages with a few text fields) TextDecoder seems to be 700% faster than the current implementation. Trying it out locally, I edited this method: https://github.com/dcodeIO/protobuf.js/blob/master/src/reader.js#L320

/**
 * Reads a string preceded by its byte length as a varint.
 * @returns {string} Value read
 */
Reader.prototype.string = function read_string() {
    var bytes = this.bytes();
    return utf8.read(bytes, 0, bytes.length);
};

And replaced it with this

let decoder;

if (global.window && global.window.TextDecoder) {
    decoder = new TextDecoder('utf-8');
}

/**
 * Reads a string preceded by its byte length as a varint.
 * @returns {string} Value read
 */
Reader.prototype.string = function read_string() {
    var bytes = this.bytes();
    if (decoder) {
        return decoder.decode(bytes);
    }
    return utf8.read(bytes, 0, bytes.length);
};

This worked well in the browser and in a Node process, with higher performance than the regular implementation. The check is cheap and adds very little overhead.

You could also use TextEncoder when encoding text; however, I haven't tried it out myself.
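A symmetric sketch of the encoding side could look like the following. This is illustrative only, not the actual protobuf.js writer code; `encodeString` and its ASCII-only fallback are hypothetical stand-ins.

```javascript
// Sketch: prefer TextEncoder for string encoding when available,
// falling back to a naive encoder otherwise. (Hypothetical; the real
// protobuf.js fallback is its internal utf8.write.)
var encoder = typeof TextEncoder !== "undefined" ? new TextEncoder() : null;

function encodeString(str) {
    if (encoder)
        return encoder.encode(str); // Uint8Array of UTF-8 bytes
    // Minimal fallback, correct for ASCII-only input (illustration only)
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; ++i)
        bytes[i] = str.charCodeAt(i) & 0x7f;
    return bytes;
}

console.log(Array.from(encodeString("abc"))); // [97, 98, 99]
```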

dcodeIO commented 7 years ago

The last time I checked, the TextDecoder API was still quite slow. Seems they have optimized it, that's good to know!

kuddai commented 2 years ago

At work we see noticeable performance issues when parsing strings due to the lack of TextDecoder support. Would you mind if I open a PR for this issue?

kuddai commented 2 years ago

More benchmarking is needed. TextDecoder seems to be slower than the JS implementation for smaller strings; there is a certain string-length threshold beyond which it becomes faster.
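A rough way to locate that crossover is a micro-benchmark over several string lengths, sketched below. The naive `jsDecodeAscii` loop is a hypothetical stand-in for protobuf.js's internal decoder, and the absolute numbers will vary by engine and version.

```javascript
// Sketch of a micro-benchmark comparing TextDecoder with a plain JS
// decoder across string lengths, to find where TextDecoder pulls ahead.
var decoder = new TextDecoder("utf-8");

function jsDecodeAscii(bytes) { // naive stand-in, ASCII only
    var s = "";
    for (var i = 0; i < bytes.length; ++i)
        s += String.fromCharCode(bytes[i]);
    return s;
}

function bench(fn, bytes, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; ++i)
        fn(bytes);
    var ms = Math.max(1, Date.now() - start); // avoid divide-by-zero
    return iterations / (ms / 1000);          // ops/sec
}

[8, 64, 512, 4096].forEach(function (len) {
    var bytes = new Uint8Array(len).fill(97); // "aaa…"
    console.log(len,
        "TextDecoder:", Math.round(bench(function (b) { return decoder.decode(b); }, bytes, 10000)),
        "JS:", Math.round(bench(jsDecodeAscii, bytes, 10000)));
});
```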

kuddai commented 2 years ago

I have compared performance for different ASCII string lengths on Chromium Version 91.0.4472.101 on Ubuntu using this benchmark.

(Plot: operations per second, higher is better, for different string lengths — ascii-decoding-performance)

It seems that on Chromium-based browsers TextDecoder is a clear winner.

dcodeIO commented 2 years ago

The last time I checked, the break-even point in V8 was 192 chars (but that was UTF-16). Also, TextEncoder/TextDecoder performance varies dramatically between JS engines; iirc, V8 was the slowest. There is also the concern that TextEncoder/TextDecoder either replace or throw on certain JS strings, which would make this a breaking change.
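The replacement concern can be seen with a lone surrogate: it is a valid JS string but not valid UTF-16, so per the WHATWG Encoding Standard TextEncoder substitutes U+FFFD and the string no longer round-trips. A minimal demonstration:

```javascript
// A lone (unpaired) high surrogate is legal in a JS string but has no
// UTF-8 encoding, so TextEncoder replaces it with U+FFFD. Any decoder
// that previously round-tripped such strings would now behave differently.
var lone = "\uD800";                       // unpaired high surrogate
var bytes = new TextEncoder().encode(lone);
var back = new TextDecoder().decode(bytes);
console.log(back === lone);                // false: round-trip is lossy
console.log(back === "\uFFFD");            // true: replacement character
```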

kuddai commented 2 years ago

Do you have any benchmark at hand? I plan to propose a similar change in FlatBuffers, and decoding issues are a serious concern to me.

V8 has been optimized for ASCII-only strings: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/platform/wtf/text/text_codec_utf8.cc;l=326;drc=de68be3f18ba99cc01d75903e167ca09bade253c

I haven't found the relevant spec parts about the types of errors thrown during decoding, but it can definitely throw, judging by the Blink implementation. What do you think about falling back to the current implementation in case of a TextDecoder error?
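That fallback could be sketched as below. A TextDecoder constructed with `{ fatal: true }` throws a TypeError on malformed UTF-8, so the catch branch hands off to a permissive decoder; here a non-fatal TextDecoder stands in for protobuf.js's internal `utf8.read`, which this sketch does not reproduce.

```javascript
// Sketch: strict TextDecoder first, permissive fallback on malformed input.
// decodePermissive is a hypothetical stand-in for the existing utf8.read.
var strictDecoder = new TextDecoder("utf-8", { fatal: true });

function decodePermissive(bytes) {
    // A non-fatal TextDecoder substitutes U+FFFD instead of throwing.
    return new TextDecoder("utf-8").decode(bytes);
}

function readString(bytes) {
    try {
        return strictDecoder.decode(bytes);
    } catch (e) { // TypeError on malformed UTF-8 in fatal mode
        return decodePermissive(bytes);
    }
}

console.log(readString(new Uint8Array([104, 105]))); // "hi"
console.log(readString(new Uint8Array([0xff])));     // "\uFFFD"
```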

jimmywarting commented 2 years ago

Node.js had a memory-leak bug that made its TextDecoder much slower than the browsers' built-in implementations, but it's fixed now in the latest Node versions.