50 shades of red - GitHub Issues

leeoniya commented 8 years ago

all red buckets are not created equal, though most green and yellow are. looking at [1], "select row" 13.38ms is the same color as 73.94ms.

it would be useful to have the color scales be tuned to be statistically meaningful within each test. 4300ms and 500ms are very different levels of unacceptable when the baseline is 207.55ms :)

[1] https://cdn.rawgit.com/krausest/js-framework-benchmark/4e47158a10e52122bf4244724cf99b6de4ef7ac1/webdriver-java/table.html

krausest commented 8 years ago

Would you like that better? Factors between 1 and 1.5 are absolutely fine and thus colored green, anything below factor 2.5 is acceptable and thus yellow, and anything else is bad and thus red. Yes, and why not some shading for the values in between ;-) (BTW, all values within 1 frame, i.e. below 16 msecs, are treated as equal since I didn't want to penalise frameworks using RAF for repaint.)
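A minimal sketch of the bucket rule described above, not the benchmark's actual code. The helper names `frameAwareFactor` and `bucketColor` are hypothetical, the boundary at exactly 1.5 is assumed to count as green, and raising both the mean and the baseline to 16 ms is only one possible reading of "treated equal":

```javascript
// Assumed constant: the 16 msec frame budget mentioned in the comment above.
var FRAME_MS = 16;

// Hypothetical helper: slowdown factor relative to the fastest mean, with both
// values raised to one frame so that all sub-frame timings compare as equal.
function frameAwareFactor(mean, base) {
    return Math.max(mean, FRAME_MS) / Math.max(base, FRAME_MS);
}

// Buckets as described: up to 1.5 is green, below 2.5 is yellow, the rest red.
function bucketColor(factor) {
    if (factor <= 1.5) return "green";
    if (factor < 2.5) return "yellow";
    return "red";
}

// Illustrative values: the two "select row" timings against a 3 ms baseline.
console.log(bucketColor(frameAwareFactor(13.38, 3))); // green
console.log(bucketColor(frameAwareFactor(73.94, 3))); // red
```

Under this reading, 13.38 ms and the 3 ms baseline both count as one frame, so the factor is 1, while 73.94 ms is about 4.6 frames and lands in red.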

krausest commented 8 years ago

Or maybe some continuous coloring (factor 2 is yellow, factor 4 and above is red) like here?
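One way such a continuous scale could be computed is by linear interpolation between anchor colors. This is a sketch under assumptions: only "factor 2 is yellow" and "factor 4 and above is red" come from the comment above. Green at factor 1, the specific RGB values, and the names `lerpColor` and `continuousColor` are all hypothetical.

```javascript
// Anchor colors as [r, g, b]; the exact values are an assumption.
var GREEN = [0x4C, 0xAF, 0x50];
var YELLOW = [0xFF, 0xEB, 0x3B];
var RED = [0xF4, 0x43, 0x36];

// Linear blend between colors a and b, with t running from 0 to 1.
function lerpColor(a, b, t) {
    var out = [];
    for (var i = 0; i < 3; i++)
        out.push(Math.round(a[i] + (b[i] - a[i]) * t));
    return "rgb(" + out.join(", ") + ")";
}

// Factor 1 -> green (assumed), factor 2 -> yellow, factor 4 and above -> red.
function continuousColor(factor) {
    if (factor <= 1) return lerpColor(GREEN, GREEN, 0);
    if (factor < 2) return lerpColor(GREEN, YELLOW, factor - 1);
    if (factor < 4) return lerpColor(YELLOW, RED, (factor - 2) / 2);
    return lerpColor(RED, RED, 0);
}

console.log(continuousColor(2)); // rgb(255, 235, 59)
console.log(continuousColor(4)); // rgb(244, 67, 54)
```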

leeoniya commented 8 years ago

last version looks great 👍

EDIT: though i'm not sure it's 100% right. "select row" baseline is 3ms but it doesn't start turning yellow-ish till around 18ms. perhaps this is the rAF threshold you mentioned. it seems like there should be more differentiation between 3ms and 18ms even if they're all within approx 1 frame. this is heavily dependent on the task at hand and having an absolute 16ms "all-green" threshold will give incorrect scales for longer-running tasks. a % factor across the board would be more uniform. perhaps there should be a % factor that's weighted per test so that a 2x factor is pink at 100ms baseline, red at 200ms baseline tasks but green for 10ms tasks. some kind of weight curve should be derived from each test's baseline.
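A rough sketch of what a baseline-weighted factor might look like. Nothing here is from the benchmark: the linear weight curve, the 200 ms saturation point, and the names `weight` and `severity` are all assumptions, chosen only so that a 2x factor comes out near-green at a 10 ms baseline, halfway at 100 ms, and fully red at 200 ms.

```javascript
// Hypothetical: the weight grows linearly with the baseline and saturates at 200 ms.
var FULL_WEIGHT_MS = 200;

function weight(baseMs) {
    return Math.min(1, baseMs / FULL_WEIGHT_MS);
}

// Severity 0 means "as fast as the baseline"; 1 or more means fully red.
function severity(mean, base) {
    var factor = mean / base;
    return (factor - 1) * weight(base);
}

// The same 2x factor at three different baselines:
console.log(severity(20, 10));   // 0.05 (green)
console.log(severity(200, 100)); // 0.5  (pink)
console.log(severity(400, 200)); // 1    (red)
```

The severity value could then be mapped onto any color scale; a different curve (for example a logarithmic one) would shift where the intermediate shades fall.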

leeoniya commented 8 years ago

So here's a quickie that can be tweaked to adjust the scale. You can run it in the console against the generated HTML table to recolor it. It uses a straight factor without a 16ms threshold, which helps highlight the really fast libs but maybe exaggerates the importance of large factors in sub-frame timings.

// generated via http://www.strangeplanet.fr/work/gradient-generator/?c=16:F44336:FFEB3B:4CAF50
var scale = {
    "#4CAF50": 1.1,
    "#65B74D": 1.2,
    "#7FC04A": 1.3,
    "#98C847": 1.4,
    "#B2D144": 1.5,
    "#CBD941": 1.6,
    "#E5E23E": 1.7,
    "#FFEB3B": 1.8,
    "#FDD63A": 1.9,
    "#FCC139": 2.0,
    "#FAAC39": 2.2,
    "#F99738": 2.4,
    "#F88237": 2.8,
    "#F66D37": 3.2,
    "#F55836": 3.6,
    "#F44336": 4.0,
    // plus http://www.strangeplanet.fr/work/gradient-generator/?c=6:F44336:A82222
    "#E43C32": 4.5,
    "#D5352E": 5.0,
    "#C62F2A": 5.5,
    "#B72826": 6.0,
    "#A82222": 6.5,
}

function findColor(factor) {
    for (var col in scale)
        if (factor < scale[col])
            return col;
    return Object.keys(scale).pop();
}

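// Returns the smallest mean in a row; this is the row's baseline.
function findBase(means) {
    // Start from Infinity so that any real timing becomes the baseline.
    var min = Infinity;
    for (var j = 0; j < means.length; j++) {
        var mean = +means[j].textContent;
        if (mean < min)
            min = mean;
    }
    return min;
}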

var trs = document.querySelectorAll("tr");

for (var i = 1; i < trs.length; i++) {
    var means = trs[i].querySelectorAll(".mean");
    var base = findBase(means);
    for (var j = 0; j < means.length; j++) {
        var mean = +means[j].textContent;
        var factor = mean / base;
        means[j].parentNode.style.backgroundColor = findColor(factor);
    }
}

[image: recolor-chart]

leeoniya commented 8 years ago

if you ignore vanillajs and compare only fastest means between frameworks...

[image: recolor-chart2]
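A sketch of how the baseline could be computed while ignoring vanillajs. The row shape (an array of `{ name, mean }` objects), the function name `findFrameworkBase`, and the sample values are hypothetical; the generated table would need its column names mapped onto the cells for this to apply.

```javascript
// Hypothetical row shape: [{ name: "vanillajs", mean: 3 }, ...]
// Returns the smallest mean among cells whose name is not in `excluded`.
function findFrameworkBase(cells, excluded) {
    var min = Infinity;
    for (var i = 0; i < cells.length; i++) {
        if (excluded.indexOf(cells[i].name) !== -1)
            continue;
        if (cells[i].mean < min)
            min = cells[i].mean;
    }
    return min;
}

// Made-up sample row: the baseline becomes the fastest framework, not vanillajs.
var row = [
    { name: "vanillajs", mean: 3 },
    { name: "framework-a", mean: 9 },
    { name: "framework-b", mean: 12 }
];
console.log(findFrameworkBase(row, ["vanillajs"])); // 9
```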

krausest commented 8 years ago

I'm somehow reluctant to do that for sub 16 msec actions. Here's a picture from vidom:

Vidom is a RAF-based framework and would be measured with 11 msecs in that case, since I'm starting with the event and stopping after the paint event has finished. The question is whether that is fair if it actually actively consumes 8.8 msecs and waits for the rest. (To be honest: I had to search for such a large gap, but the principle holds.) When I developed the benchmark it was important to me to measure the actual duration for a user and not just some JavaScript duration. And for the user anything below 16 msecs is simply fast enough. I don't like to penalize RAF-based frameworks because using RAF has its share of advantages. The whole issue is negligible for longer-running test cases, and for sub-16-msec tests the only proposition of this benchmark is "fast enough".

leeoniya commented 8 years ago

fair enough, though it may be useful to also include the alternate version for library authors, since it more clearly shows where there's work to be done.

krausest / js-framework-benchmark

50 shades of red #35