stevendesu / jsindex

1 stars 0 forks source link

Research performance options #13

Open stevendesu opened 5 years ago

stevendesu commented 5 years ago

There are a few places in this library that I utilize Array.prototype.filter

I did the following rudimentary test in the Chrome console (Chrome v70.0.3538.77):

var arr = [];
for (var i = 0; i < 1000000; i++) { arr[i] = i; }

// Filter:
var d0 = new Date(); for (var x = 0; x < 100; x++) { arr.filter(el => el % 2); } new Date() - d0;
// Outputs (on my machine): ~2500-2600

// For-loop:
var d0 = new Date(); for (var x = 0; x < 100; x++) { var out = []; for(var y = 0; y < arr.length; y++) { if(arr[y] % 2) out.push(arr[y]); } } new Date() - d0;
// Outputs (on my machine): ~1000-1300

It looks like removing .filter could improve the performance of some algorithms up to two-fold.

Similar research should be done into:

stevendesu commented 5 years ago

(Breaking out research notes into separate comments)

filter vs for-loop

Every test I do in every browser reveals that a for loop is faster than using filter. This is also confirmed on jsperf.com

Looking at the polyfill for Array.prototype.filter on MDN this makes sense:

All of these add a little bit of overhead.

Although while looking at the polyfill I noticed a way we can improve performance beyond my rudimentary example in the first post:

var arr = [];
for (var i = 0; i < 1000000; i++) { arr[i] = i; }

// Array.prototype.filter: ~2500-2600
var d0 = new Date();
for (var x = 0; x < 100; x++) {
    arr.filter(el => el % 2);
}
new Date() - d0;

// For loop with Array.prototype.push: ~1000-1300
var d0 = new Date();
for (var x = 0; x < 100; x++) {
    var out = [];
    for(var y = 0; y < arr.length; y++) {
        if(arr[y] % 2)
            out.push(arr[y]);
    }
}
new Date() - d0;

// For loop with pre-allocated array: ~600-700
var d0 = new Date();
for (var x = 0; x < 100; x++) {
    var out = new Array(arr.length);
    var cnt = 0;
    for(var y = 0; y < arr.length; y++) {
        if(arr[y] % 2)
            out[cnt++] = arr[y];
    }
}
new Date() - d0;

// While loop with pre-allocated array: ~600-700
var d0 = new Date();
for (var x = 0; x < 100; x++) {
    var len = arr.length;
    var cnt = 0;
    var y = -1;
    var out = new Array(arr.length);
    while(++y < len) {
        if(arr[y] % 2)
            out[cnt++] = arr[y];
    }
}
new Date() - d0;

So by pre-allocating the array we can increase the speed yet again, for a net four-fold increase on our loops. Additional improvements (caching the array length, using a while loop, pre-increment instead of post-increment, etc) had no effect in the Chrome V8 engine. Likely what minor differences in these approaches may have existed 10 years ago have been optimized away by the compiler.

stevendesu commented 5 years ago

Object.keys + for vs for-in

I started out experimenting with something like the following:

var letters = "abcdefghijklmnopqrstuvwxyz";
function ranString(len) { var str = ""; for (var i = 0; i < len; i++) { str += letters[Math.floor(Math.random() * letters.length)]; } return str; }

var obj = {};
for (var i = 0; i < 1000000; i++) { obj[ranString(Math.floor(Math.random() * 5 + 1))] = Math.random(); }

// Object.keys + for: ...
var d0 = new Date();
for (var x = 0; x < 10; x++ ) {
    var keys = Object.keys(obj);
    for( var i = 0; i < keys.length; i++ ) {
        if (obj.hasOwnProperty(keys[i]))
            obj[keys[i]] = Math.random();
    }
}
new Date() - d0;

The first few time I ran this it looked like I was getting valid results: 2022, 1876, 2082, 2031...

But then I ran it a few more times: 2142, 2329, 2960, 3764, 4963...

Even though I'm only overwriting the properties of the objects and not allocating new objects, it seems like I triggered a bug in the V8 engine. I waited a solid minute (without refreshing the page or changing anything else) and ran it again: 2120, 2013, 2024, 1969,...

We're back in the range of reasonable values. This suggests the issue lies in the garbage collector somewhere - because it comes and goes randomly.

I'll have to make a note to look into this later and see how I can reproduce it. It'd be a shame if something happened on a production site that brought an app to a crawl.

Since the issue seemed to clear up, I resumed:

// Object.keys + for: ~1900 - 2200
// ...

// For-in: ...
var d0 = new Date();
for (var x = 0; x < 10; x++ ) {
    for( var key in obj ) {
        obj[key] = Math.random();
    }
}
new Date() - d0;

This time I immediately hit our issue (which I still suspect is garbage collection): 2193, 2835, 3682, 5293,...

I waited a solid minute and tried again: 2201, 2918, 3830, 5080,... I immediately ran into the issue that was harder to reproduce with the Object.keys solution.

I waited another solid minute and tried again: 2079, 2396, 3029, 3632, 4579,...

Whatever the issue is, using for in causes it to happen every time, versus Object.keys it's intermittent. It definitely feels like we should stick to Object.keys for now, but further research must be done to understand this bug and how we can avoid it.

stevendesu commented 5 years ago

for-loop + push vs concat

We have several places where we combine arrays (building indexes, filtering, merging, etc):

var arr1;
var arr2;

function resetArrays() {
    arr1 = new Array(1000000);
    arr2 = new Array(1000000);
    for (var i = 0; i < 1000000; i++) {
        arr1[i] = i;
        arr2[i] = i;
    }
}

// for-loop + push: ~3000-3300
var d0 = new Date();
for (var x = 0; x < 100; x++ ) {
    resetArrays();
    for (var i = 0; i < arr2.length; i++) {
        arr1.push(arr2[i]);
    }
}
new Date() - d0;

// concat: ~1800-1900
var d0 = new Date();
for (var x = 0; x < 100; x++ ) {
    resetArrays();
    arr1 = arr1.concat(arr2);
}
new Date() - d0;

// update length + for-loop + property set: ~2000-2100
var d0 = new Date();
for (var x = 0; x < 100; x++ ) {
    resetArrays();
    var offset = arr1.length;
    arr1.length = arr1.length + arr2.length;
    for (var i = 0; i < arr2.length; i++) {
        arr1[offset + i] = arr2[i];
    }
}
new Date() - d0;

I was surprised in this case to find that concat is very efficient. I expected the for loop with pre-allocation to be the best. Knowing that iteratively calling .push() is so bad, though, I'll make a point to remove any such instances from the code.