google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0

Javascript Performance Question #4012

Closed sampaioletti closed 8 years ago

sampaioletti commented 8 years ago

Hello,

We have been using FB for a while now in some C++ and Go projects and have had spectacular performance. We now have a project that needs to communicate with those projects from Javascript in the browser. We have been doing some (totally non-scientific) performance testing comparing JSON.parse with FB, and are seeing a bottleneck when accessing values (in Chrome), whereas 'parsing' works as expected. Just wanted to see if anyone had any thoughts. We appreciate the help.

Given an .fbs of

table SendValues {
    values:[SensorValue];
    res:bool;
}
table SensorValue {
    sensorId:string;
    value:int;
}

we build 10000 entities in FB

        var builder = new flatbuffers.Builder(1);
        var values = [];
        for (var i = 0; i < 10000; i++) {
            values[i] = createSensorValue(builder, 'Sensor001', getRandomIntInclusive(3, 10));
        }
        values = schema.SendValues.createValuesVector(builder, values);
        schema.SendValues.startSendValues(builder);
        schema.SendValues.addValues(builder, values);
        var send = schema.SendValues.endSendValues(builder);
        builder.finish(send);
        var buf = builder.dataBuffer();

function createSensorValue(builder, name, value) {
    name = builder.createString(name);
    schema.SensorValue.startSensorValue(builder);
    schema.SensorValue.addSensorId(builder, name);
    schema.SensorValue.addValue(builder, value);
    return schema.SensorValue.endSensorValue(builder);
}

and also build 10000 entities in a JSON string

        json="[";
        for(var i=0;i<10000;i++){

            json+="{\"name\":\"sensor001\",\"value\":"+(getRandomIntInclusive(3,10))"}";
            if(i<(9999)){
                json+=","
            }
        }
        json+="]";

getRandomIntInclusive is used to try and keep the v8 optimizer guessing.
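
(getRandomIntInclusive isn't shown in the post; a typical MDN-style definition, included here so the snippets are runnable:)

    // uniform random integer in [min, max], inclusive of both ends
    function getRandomIntInclusive(min, max) {
        min = Math.ceil(min);
        max = Math.floor(max);
        return Math.floor(Math.random() * (max - min + 1)) + min;
    }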

after which we alternate between the two, building 10000 entities and parsing them (skipping the first two runs to account for JIT compilation).

For JSON

    var values = JSON.parse(json);
    start = performance.now();
    for (var i = 0; i < values.length; i++) {

        var xValue = values[i]['name'];
        var yValue = values[i]['value'];

        point2[i] = { x: xValue, y: yValue }
    }
    end = performance.now();
    jsCount++;
    if (jsCount > 2) {
        jsTotal += end - start;

        jsAve = jsTotal / (jsCount - 2);
        console.log('jsAve:' + jsAve + 'ms' + ' this:' + (end - start) + 'ms');
    }

For FB

    var call = schema.SendValues.getRootAsSendValues(buf);
    start = performance.now();
    for (var i = 0; i < call.valuesLength(); i++) {

        var xValue = call.values(i).sensorId();
        var yValue = call.values(i).value();

        point1[i] = { x: xValue, y: yValue }
    }
    end = performance.now();
    fbCount++;
    if (fbCount > 2) {
        fbTotal += end - start;
        fbAve = fbTotal / (fbCount - 2);
        console.log('fbAve:' + fbAve + 'ms' + ' this:' + (end - start) + 'ms');
    }

and get a console output like

main.js:105 fbAve:12.045000000000982ms this:12.045000000000982ms
main.js:81 jsAve:0.18500000000040018ms this:0.18500000000040018ms
main.js:105 fbAve:19.1550000000002ms this:26.264999999999418ms
main.js:81 jsAve:1.5425000000000182ms this:2.899999999999636ms
main.js:105 fbAve:16.20833333333303ms this:10.31499999999869ms
main.js:81 jsAve:2.504999999999503ms this:4.429999999998472ms
main.js:105 fbAve:14.955000000000155ms this:11.195000000001528ms
main.js:81 jsAve:1.9237499999996999ms this:0.18000000000029104ms
main.js:105 fbAve:14.362000000000444ms this:11.9900000000016ms
main.js:81 jsAve:1.6019999999998618ms this:0.3150000000005093ms
main.js:105 fbAve:13.636666666667375ms this:10.010000000002037ms
main.js:81 jsAve:1.4308333333339458ms this:0.5750000000043656ms

If we instead put the timer around just the 'parsing' step, i.e.

//FB
var call = schema.SendValues.getRootAsSendValues(buf);

//JSON
var values = JSON.parse(json);

FB is orders of magnitude faster (as one would expect), but you lose everything you gained when you have to access the values as in the above code. Does this seem like a fair comparison? After testing the 'parsing' step and getting the results we were expecting, we were a little shocked by the spread (we ran it at different counts with the same basic result).

Sorry for the ugly code; it was just supposed to be a quick test (my JS is weak anyway). We appreciate any thoughts, including ones that start out with 'no no, dummy' (:

quick plunkr to illustrate http://embed.plnkr.co/dDiWvZ/

ghost commented 8 years ago

Yes, FlatBuffers skips the parsing step, but accessing its data is therefore slower than accessing native objects. I don't think it is fair to compare against native objects without including the time to parse/create them, no; it's also not realistic, since you usually can't skip that step.

The only time when native objects are truly faster is when you have to access the data many times. In that case keeping things in FlatBuffers will actually be slower overall. If that is part of the performance-critical part of your code, then you should copy data out of FlatBuffers.
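
(As an illustration of that advice, a minimal sketch of copying the data out once, reusing the schema and variables from the original post; after this loop, the points are plain JS objects with native access speed:)

    // one pass over the buffer: pay the accessor cost once, then use plain objects
    var call = schema.SendValues.getRootAsSendValues(buf);
    var n = call.valuesLength();
    var points = new Array(n);
    for (var i = 0; i < n; i++) {
        var v = call.values(i);                        // table accessor, allocated per call
        points[i] = { x: v.sensorId(), y: v.value() };
    }
    // from here on, points[i].x / points[i].y are native property reads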

sampaioletti commented 8 years ago

Thanks @gwvo. I didn't want to cloud the post with too much more info (it was already too long), but at first I ran the test including the 'parsing' step, and even at 10 items the test showed FB slower overall:

fbAve:0.9600000000000364ms this:0.9600000000000364ms
jsAve:0.035000000000763976ms this:0.035000000000763976ms

It was slower at other item counts in this scenario as well (100, 10000, etc.).

I've updated the plunkr to reflect that.

I was merely trying to isolate the problem and confirm it wasn't in the 'parsing' step.

I only bring it up because we love the library and wanted to see if people far better with JS than us had any thoughts on optimizing (or could point out that our tests were bogus). We have done some CPU profiling, and if we can reduce the number of function calls (maybe some arrays of offsets, so we don't have to call functions to get offsets every time, as sketched below) it might help. I'm not trying to be critical, just trying to open a discussion. The reduction in serialization overhead on our server makes the reduced performance on the client more than worth it (since the clients process for free, haha), and I guess that's a core point of the library.
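
(A hedged sketch of that idea with the generated API of the time: the vector accessor takes an optional object to fill in, so one scratch SensorValue avoids a per-call allocation, and hoisting valuesLength() out of the loop saves a method call per iteration:)

    var call = schema.SendValues.getRootAsSendValues(buf);
    var sv = new schema.SensorValue();          // scratch accessor, reused every iteration
    var n = call.valuesLength();                // hoist the length lookup out of the loop
    for (var i = 0; i < n; i++) {
        var v = call.values(i, sv);             // generated vector accessors accept an
                                                // object to fill in instead of allocating
        point1[i] = { x: v.sensorId(), y: v.value() };
    }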

I'll close this out so you don't have to think about it, since you are aware that the access is less performant than native and it's not really an issue. If we come up with any performance increases in our work, we'll send you a pull request.

Thanks again!

ghost commented 8 years ago

It is possible that the JS implementation is particularly slow. I designed FlatBuffers originally for C++, and it may not fit JS quite as well (i.e. all the indirections confusing the JIT). Maybe the JS implementation can be optimized, or maybe a different API (one that actually unpacks into JS objects) would be better for this language.

I believe @evanw (who implemented the JS portion) at some point experimented with an object based API, though I am not sure where that went. @evanw: can you point us to that? Maybe someone would like to continue it.

sampaioletti commented 8 years ago

Yeah, I looked at the Object API posts you mention, but when I went to @evanw's GitHub all I could find was the normal flatbuffers.js file. If we can get ahold of his API I'd be happy to take a look at it, run the same tests to compare the differences, and look at doing a fork with the alternate API so we can see how it goes (since, as he mentioned, they are using a different library now).

Thanks

sampaioletti commented 8 years ago

@gwvo: had a thought while looking at some other stuff we were working on, based on your C++ comment. If I have some time next week, I might try running your C++ code through Emscripten and see how efficient the generated JS is in this scenario. I did a project a while back where we converted a small utility library, and the performance really surprised us. The API was ugly from a JS perspective, but it worked quite well. I don't think that's a good long-term strategy, but if it proves more efficient it might give us a starting point for optimizing the existing code for 'free'. Now that I say that, we could even try something similar with the generated Go code and the GopherJS project. Both might just provide good test data as a starting point. Just a thought.

sampaioletti commented 8 years ago

Curiosity got the best of me, so I did a quick project with GopherJS (added it to the plunkr); it wasn't faster in any case. However, an interesting thing happened. I was having trouble with GopherJS returning a string, since the Go version returns a []byte, so I decided to just use numbers and eliminate looking up the sensorId string value. Instant difference in the 'official' implementation. Now, for very small collections (say 10) it is barely slower, and on some iterations it is actually faster than the JSON (it is a little slow on the first run, as you would expect, but optimizes quickly). At that level I think I still have some problems with the way I'm running the test, which probably throws in some GC that makes the results inconclusive, but it's night and day versus running with a string. Once you get over 100 items in the collection it becomes faster than the JSON version; go with 1000-100000 and FB is substantially faster and stays that way (meaning combined time; access is still slower, but the tradeoff now works as you would expect). (Updated the plunkr with this.)

So I still want to try Emscripten if I get time next week, but this is excellent news for the application I'm working on, as it mostly consists of sending thousands of numeric values for real-time graphing. We will just avoid strings as much as possible, or break them out into objects on the first pass.

Another test we could run would be looking up multiple numeric values to make sure it scales...but that is another day.

So from a JS optimization standpoint, I would say we should look at how we are handling strings; the cost might be in the UTF-8/16 decision tree or elsewhere, but I think it gives us a starting point. I'll continue to look into it, but I'm pretty pleased with the results. Again, just my 2 cents.

Sorry to continue posting to a closed issue, but at the moment I think it's a safe place to house my ramblings.

ghost commented 8 years ago

Thanks, that's some useful testing! Yes, indeed, it looks like the manual UTF-8 decoding is slow... I wonder how we can speed that up.

sampaioletti commented 8 years ago

Not my wheelhouse, but I'll run some tests; a quick Google gives a couple of ideas. Perhaps http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html?m=1 is worth a try. Glad to be of help; we try to give back to the projects we use. Between the performance and the header-only C++, this project fits us perfectly (tons of real-time data, cross-platform and embedded).

ghost commented 8 years ago

Hmm, using built-in functions would be great. I'm a bit worried that this is a two-step conversion over octets, which might end up generating more garbage... but who knows, it may still be faster than what we have. We would also need to check whether that works on Node.js.
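
(For what it's worth, a hedged sketch of the built-in route using TextDecoder, which browsers ship and Node exposes via util.TextDecoder; whether it actually beats the hand-rolled decoder would need measuring:)

    // decode a UTF-8 slice of the FlatBuffers backing array with the built-in decoder
    var utf8 = new TextDecoder('utf-8');
    function decodeString(bytes, offset, length) {
        // subarray is a view, not a copy, so the only allocation is the resulting JS string
        return utf8.decode(bytes.subarray(offset, offset + length));
    }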

Good to hear FlatBuffers works so well for you :)

gagangoku commented 4 years ago

I second what @sampaioletti is seeing. This is the small loadtest example I've created.

export class FlatbufferDemo extends React.Component {
    constructor(props) {
        super(props);
        this.state = {};
    }

    async componentDidMount() {
        cnsole.info('flatbuffers: ', flatbuffers);
        const serialized = this.createMessage(1, 'Hello my name is Gagan');
        const { index, text } = this.parseMessage(serialized);
        cnsole.info('parsed: ', { index, text });

        const N = 10000;
        this.loadTestFb(N);
        this.loadTestJson(N);
    }

    loadTestFb = (N) => {
        let array;
        {
            const startTime = new Date().getTime();
            array = xrange(0, N).toArray().map(x => this.createMessage(x, this.getStr()));
            cnsole.info('Flatbuffer: Time taken in serializing: ', new Date().getTime() - startTime);
        }

        {
            const startTime = new Date().getTime();
            const parsed = array.map(a => this.parseMessage(a));
            cnsole.info('Flatbuffer: Time taken in parsing: ', new Date().getTime() - startTime);
            cnsole.info(array.length, array[10]);
            cnsole.info(parsed.length, parsed[10]);
        }
    };
    loadTestJson = (N) => {
        let array;
        {
            const startTime = new Date().getTime();
            array = xrange(0, N).toArray().map(x => JSON.stringify({ index: x, str: this.getStr() }));
            cnsole.info('Json: Time taken in serializing: ', new Date().getTime() - startTime);
        }

        {
            const startTime = new Date().getTime();
            const parsed = array.map(a => JSON.parse(a));
            cnsole.info('Json: Time taken in parsing: ', new Date().getTime() - startTime);
            cnsole.info(array.length, array[10]);
            cnsole.info(parsed.length, parsed[10]);
        }
    };

    getStr = () => {
        return 'Hello ' + Math.ceil(1000 * Math.random()) + ' world ' + Math.random();
    };
    createMessage = (index, str) => {
        const builder = new flatbuffers.Builder(100);
        const s = builder.createString(str);

        helo.Message.startMessage(builder);
        helo.Message.addIndex(builder, new flatbuffers.Long(index, index));
        helo.Message.addText(builder, s);
        const orc = helo.Message.endMessage(builder);
        builder.finish(orc);

        return builder.dataBuffer();
    };
    parseMessage = (serialized) => {
        const parsed = helo.Message.getRootAsMessage(serialized);
        const index = parsed.index().high;
        const text = parsed.text();
        return { index, text };
    };

    render() {
        return (
            <View>
            </View>
        );
    }
}

Run it for 10,000 iterations and you get this:

    Flatbuffer: Time taken in serializing: 50
    Flatbuffer: Time taken in parsing: 18
    Json: Time taken in serializing: 18
    Json: Time taken in parsing: 7

Maybe the JSON library is just too damn fast already!

dbaileychess commented 4 years ago

Your example JSON is just two fields, { index, str }, and is not characteristic of typical JSON with nested objects and more complex parsing. So I would advise making the data format a bit more complex to really test the difference. Also, you use a long for the index, which isn't necessary for 10,000 iterations; you create more work for FB because it has to do extra work to process a long in JS.

In the FB parsing, you also parse the buffer and then stick the values into a JS object, so you are doing extra work that you would typically not perform when using FB. Generally you work off the buffer and get the data you want when you need it. Perhaps you would put the data into another JS object, but I wouldn't count that as part of the FB parsing.

In the FB creation, you are recreating the builder each iteration, which incurs the overhead of allocating memory for the builder every time. I would construct one builder, pass it into the createMessage method, and just "clear" it each loop (see the sketch below). That would be more typical of real-world usage.

new flatbuffers.Long(index, index) is also incorrect: the two parameters are the low and high 32-bit halves of the long, and should not both be the same value.
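
(A hedged sketch of both fixes, assuming a flatbuffers JS version whose Builder exposes clear(); note that dataBuffer() is a view over the builder's memory, so the bytes must be copied out if all N messages are kept while the builder is reused:)

    // reuse one builder across iterations, and keep the high word of the long at 0
    const builder = new flatbuffers.Builder(1024);
    const buffers = [];
    for (let i = 0; i < N; i++) {
        builder.clear();                                  // reset write position, keep memory
        const s = builder.createString(getStr());
        helo.Message.startMessage(builder);
        helo.Message.addIndex(builder, new flatbuffers.Long(i, 0));  // low = i, high = 0
        helo.Message.addText(builder, s);
        builder.finish(helo.Message.endMessage(builder));
        buffers.push(builder.asUint8Array().slice());     // copy out: the view would be
                                                          // invalidated by the next clear()
    }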

gagangoku commented 4 years ago

@dbaileychess: thanks for the pointers, I will share the same loadtest with more fields and the long fix.

devnoname120 commented 4 years ago

@gagangoku I'm interested in this. Were you able to do another benchmark?

OpenMIS commented 1 year ago

Sometimes all the business systems use FlatBuffers, and if the browser side wants to talk to them it has no choice but to use FlatBuffers as well.

1] The server can first deserialize the FlatBuffers binary and call the unpack method to obtain the corresponding entity object (the generated class whose name ends in T). Then it calls the JSON serialization library of its own programming language to serialize that entity object to JSON and sends it to the browser. The browser parses the JSON into an object and casts it as the T-suffixed FlatBuffers class, which gives autocompletion for all the properties. 1} Languages that support unpack: C#, Java, Rust, TS, C++, Dart, Go, Python, Swift. 2} Languages that do not support unpack: Kotlin, Lobster, Lua, Nim, PHP.

2] Converting JSON to FlatBuffers is more troublesome: first deserialize the JSON string into the T-suffixed entity class, then use the equivalent of .NET reflection to dynamically call the usual FlatBuffers construction methods, passing in the corresponding properties of the T-suffixed entity instance as arguments.
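
(A minimal sketch of the server-side half of 1], assuming the JS was generated with flatc's --gen-object-api so that unpack() and the T-suffixed classes exist; helo.Message is borrowed from the earlier loadtest and stands in for whatever schema is actually used:)

    // FlatBuffers binary -> T-suffixed plain object -> JSON string for the browser
    function flatbufferToJson(bytes) {                 // bytes: Uint8Array off the wire
        var buf = new flatbuffers.ByteBuffer(bytes);
        var msgT = helo.Message.getRootAsMessage(buf).unpack();  // object-API entity
        return JSON.stringify(msgT);                   // send this string to the browser
    }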