dmonllao closed this 11 years ago
Hi David, it looks great. Just some tiny details (they apply both to this request and to the master one):
1) Why, in the defaults file, are you sometimes "export"-ing the variables and sometimes not? IMO they should all be consistent: either they are needed or they aren't.
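(For context, the practical difference is whether child processes see the value. A minimal sketch; the variable names here are invented for illustration, not taken from the actual defaults file:)

```shell
#!/bin/bash
# Plain assignment: visible in this shell only, NOT passed to child processes.
dbtype="pgsql"

# Exported assignment: also visible to any child process launched from here.
export dbhost="localhost"

# A child shell sees only the exported variable:
bash -c 'echo "dbtype=${dbtype:-unset} dbhost=${dbhost:-unset}"'
# → dbtype=unset dbhost=localhost
```

So variables only read by the script itself don't need `export`; variables that tools invoked by the script (jmeter, php, git…) must read do.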
Other than that… I'm going to give it a run to see it in action… if everything goes smoothly I'll proceed to integrate it, regardless of point 1) above, which is just for your consideration.
Off-topic comment, for your info: I'm planning to add a job to the laptop server (for now) that will run comparisons between branches weekly (over the weekend, surely). For now, the only comparison it will be doing is:
Once we have 26_STABLE I'll be adding new combinations (25 vs 26, 26 vs master…). Just to have another place performing at least weekly runs.
In order to guarantee stability, I'm going to use the "Exclusive execution" plugin that:
Not sure if you are using it in the nightly, but it's nice if everything runs in the same Jenkins and you want to prevent other jobs from being executed at the same time.
Ciao :-)
Ok, apart from the problems commented at https://github.com/moodlehq/moodle-performance-comparison/issues/19, I've been able to complete a compare.sh execution.
All looks good, except the "lang" (wording) used to communicate the results. There are two options IMO:
1) Either you use "improved" (green) and a better alternative to the current "decreases" (red), like "regressed", "deteriorated", "worsened", "ruined"… something better expressing the negativity.
2) Or you phrase it in terms of quantities, like "reduced" (green) & "incremented/grown" (red). Aka, "less is good" and "more is bad" for all the variables.
The rationale is that, at least in my mind, the current "improved" and "decreases" are not proper antonyms in this context. A "decrease", if understood as a reduction, means exactly the opposite of its current red status. Especially if you read it together with the previous line, like "serverload decreases"… At least that's what I did, and I found it not clear enough, or even contradictory.
Also, just tried the client, and I'd recommend to show information like:
1) A (reduced) title with just name, branch and commit: "Comparing: "before - 25 - 1115bb3 weekly release 2.5.2+" with "after - 26 - 34fb354 on demand release 2.6beta""
2) A blank line.
3) Simple lines with the variations over threshold:
improvement: dbreads - All steps data combined -> 3.72% better
improvement: memoryused - All steps data combined -> 14.47% better
regression: memoryused - View course participants -> 4.05% worse
….
….
Just for readability. For example, I passed the runs to the script in the wrong order and got the opposite results, and it was really hard to detect that from the output.
Also, the client is borked right now, always returning exit status = 0, because there is a plural $changes['regressions'] near the end; it should be "regression" if I'm not wrong.
And that's all. Nice stuff! Awaiting the next iteration.
One more little detail… if the "grouped threshold" is set to 4% (the default value)… why am I getting "dbreads - All steps data combined" reported with just a 3.72% difference?
Link: http://ci.stronk7.com:8787/~stronk7/perf/?timestamps[]=1382615534503&timestamps[]=1382614466808
Many thanks for the review Eloy, all done.
I've also changed the way errors are handled. I was checking the exit code of many commands in conditionals; I've changed all those conditionals and now all the scripts work as expected even using set -e. If something fails we provide a friendly error message; if this is not enough for the user, set -e can be uncommented for further info.
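(A sketch of that pattern; the helper name and messages below are invented, not the actual script code. Commands tested inside `if !` don't abort the script under `set -e`, which is what keeps the two approaches compatible:)

```shell
#!/bin/bash
# set -e   # uncomment for raw abort-on-error behaviour instead of friendly messages

# Hypothetical helper: run a command, exit with a friendly message on failure.
run_or_die() {
    local msg="$1"; shift
    # A command tested in a conditional does not trigger set -e,
    # so we can inspect its result and fail with our own message.
    if ! "$@"; then
        echo "Error: $msg" >&2
        exit 1
    fi
}

run_or_die "cannot create the working directory" mkdir -p /tmp/perfcomparison
run_or_die "checkout failed, does the branch exist?" true   # placeholder command
```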
There is also a patch for #17. I haven't really backtraced the issue though; it seems that the previous file couldn't be overwritten. We can leave #17 open just in case we see more strange behaviours.
https://github.com/moodlehq/moodle-performance-comparison/issues/14 is still pending, working on it after this.
I've also updated the MOODLE_25_STABLE pull request.
Hi David, I've looked at the code and run it again, and it's looking great. So I think this can be considered for merging, yay! I suppose that the master branch is 100% the same (please confirm).
Anyway, before pressing the green button (merge pull request), I wanted to share some points, for you to consider if you want to fix/comment on any of them here:
P1) Minor detail: the ordering on the perf main page is different from the order of the dropdowns on the details page. See:
P2) In the details view, same dropdowns... something is going wrong (see the link above). There are some "unknow" values and also, all the runs show the SAME date (Oct 29th) when 2 of them should be Oct 24th.
P3) It seems that, when one of the counters is 0 (for example, DB writes in a lot of test pages under 2.5...), it is ignored for calculations... surely because there is some division by 0 or so. I'd consider changing any 0 to something else, say 0.1, in order to get them considered. Surely that will lead to some regressions going really high but, well, if previously we had 0 writes and now we have 150... it makes no sense to say that the regression is 857%; it's really higher. Later we can consider printing any % > 10,000 as "infinite" or "∞" (just an idea).
P4) Also, not sure... but the new table results should match the results shown for the combined stats... if so... why does the latter show +860.2% for combined db writes (instead of the 857% in the new table)?
P5) Related with that... are we really performing so many extra writes in 2.6 ? Wow.
P6) Apart from that, I understood that you were changing to "improvement" and "regression", but it seems that finally "decrease" and "increment" are the picked ones. No problem with them, just asking to be 100% sure.
P7) The commit messages you are using really suck, really, really, lol. Plz try to improve them a bit. The original one was passable, but the new ones... (#19, "- blah, blah..."). I'd vote for squashing but it's up to you to decide.
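(The zero-clamping idea from P3 could be sketched like this; a standalone illustration, not the actual comparison code:)

```shell
#!/bin/bash
# Sketch: replace 0 with 0.1 before computing the % change, so that
# 0 -> N jumps still produce a (huge but finite) percentage instead of
# being silently dropped by a division-by-zero.
pct_change() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before == 0) before = 0.1
        if (after == 0) after = 0.1
        printf "%.1f\n", (after - before) / before * 100
    }'
}

pct_change 100 97   # → -3.0
pct_change 0 150    # → 149900.0 (finite, and a later cap could print it as "∞")
```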
And that's all. Give me some replies, or create new issues for whatever you consider from the above, and I'll be happy to merge this and its master alternative (#18, I think).
Hi Eloy, thanks again
Thanks, David. I'm going to give it a try and will be merging it. About our chat, some comments; all of them surely should go to new issues to be discussed and agreed:
1) What happens with the old detailed view? My current opinion is that the old one shows the different "tests" in a much clearer way. You can see every one independently, with totals, grouped... Anyway, I'd create an issue about it to decide if we want to keep it or not (if kept, it will need maintenance, of course).
2) Related with the above... how to improve the new page. IMO there are things that should be improved. For example, the legends in the grouped charts when the number of steps grows (and they are going to grow). Or showing the +-% of difference on each graph. I'd create a separate issue about that.
3) Ability to export all data. Just wondering if there is a use case for that: exporting information to a CSV file or so, one line per test (not per test plan).
4) Consider adding some extra information to runs. In order to be able to compare different OSes, web servers, databases, PHP versions... we should be able to grab, on each run, that information. Note it's information to be fetched from the server. That way we could compare performance between mysql and postgres, or PHP 5.3 vs 5.5, or linux vs windows... or whatever. And the only way to do so is to know against which server each run was done; right now we lack that completely. Those new "variables" should not affect bases nor groupings... but perhaps should be searchable (in the form) and also readable (in the list of runs). Again, please file a new issue for that.
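(A sketch of what grabbing that per-run environment information could look like; all the field names and the output format here are invented for illustration, nothing in the tool does this yet:)

```shell
#!/bin/bash
# Collect basic environment facts at run time, to be stored alongside the run.
os=$(uname -s)
# If the php binary is missing we fall back to "unknown" rather than failing.
phpversion=$(php -r 'echo PHP_VERSION;' 2>/dev/null || echo "unknown")
# These two would come from the site config, not from the client machine.
webserver="${WEBSERVER:-unknown}"
dbtype="${DBTYPE:-unknown}"

cat <<EOF
{
  "os": "$os",
  "php": "$phpversion",
  "webserver": "$webserver",
  "database": "$dbtype"
}
EOF
```

Stored per run, fields like these could be made searchable in the comparison form without affecting the existing bases or groupings.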
And those are all my thoughts. Going to give a try to your latest code now... thanks!
About to merge this now... please don't forget the 4 points in my previous comments, and also, for P1), the suggestion to order from older to newer in the main page list.
done, thanks!
Thanks Eloy, I've created related issues
Adds:
New settings to manage it: