pouchdb / pouchdb

:kangaroo: - PouchDB is a pocket-sized database.
https://pouchdb.com/
Apache License 2.0

Introduce Performance / Benchmarking Suite #113

Closed daleharvey closed 10 years ago

daleharvey commented 12 years ago

I don't want to be over-optimising this early in the project, but understanding basic performance characteristics and being able to identify regressions would be very useful. It doesn't need to be complicated.

daleharvey commented 11 years ago

So after a bit of a refactor in how the unit tests are done, and some input from @hassy, I think I know what we want the tests to look like. Performance tests should be single standalone HTML pages; they can expect a pouch.alpha.min.js file and a running CouchDB instance.

They should start when a button with id="start" is clicked, and when they are completed they:

  1. Should store their results into the running CouchDB: `new Pouch('http://127.0.0.1:2020/perf-results')`
  2. Should add the following attributes to the body tag: data-results-id="id_of_results_doc_saved_to_couch" and class="complete"

This lets us instrument the performance tests in CI so we can keep track of regressions etc. I will take a look at getting the performance stuff @hassy wrote into this format (unless you / anyone else wants to beat me to it :))
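The completion protocol described above could be sketched roughly as follows. This is only an illustration: `reportResults` is a hypothetical helper name, the `type` and `timestamp` fields are assumptions, and `db` is assumed to expose a PouchDB-style `post(doc)` that returns a promise resolving to an object with an `id`.

```javascript
// Hypothetical helper, not part of PouchDB.
// `db`   : anything with a PouchDB-style post(doc) returning a promise of { id }
// `body` : document.body in a real page (or a stand-in object in tests)
async function reportResults(db, body, results) {
  // Step 1: store the results document in the running CouchDB.
  const response = await db.post({
    type: 'perf-result',                 // assumed field, for illustration only
    timestamp: new Date().toISOString(), // assumed field, for illustration only
    results: results
  });

  // Step 2: expose the saved document id first, then flag completion, so a
  // CI harness waiting on class="complete" always finds the id present.
  body.setAttribute('data-results-id', response.id);
  body.classList.add('complete');
  return response.id;
}
```

In a test page this might be called from the benchmark's completion handler, passing the database opened at http://127.0.0.1:2020/perf-results and `document.body`.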

hassy commented 11 years ago

I've just updated pouch-perf with the changes you outlined above. It still depends on benchmark.js and rickshaw.js -- the latter is only used to chart the results and can be removed easily.

daleharvey commented 11 years ago

Yeah, sorry, when I said 'standalone' I meant it doesn't need any scripting deps etc. Having some libraries bundled is definitely fine. I tested out that repo last night and it's perfect for what we need. Cheers. When you get time, could you change the src to ../../../pouch.alpha.min.js and do a PR to put it inside /tests/performance/flashcards/

Then I can get the instrumentation setup from there

hassy commented 11 years ago

Before I do this - there's still a dependency on having a dataset in Couch to replicate from to set up the test - is that ok or should test data (a couple of MBs of JSON basically) be inlined in the HTML file?

daleharvey commented 11 years ago

I have added a database at http://pouchdb.iriscouch.com/perf-flashcards_tests. Could you load the data into that and replicate from it? My main concern is that someone should be able to check out the repo, start the dev env, go to localhost/tests/performance/flashcards and run the tests without any more setup on their end

daleharvey commented 11 years ago

I will make it read-only once the data is in there so we don't have people screwing with our test data sets :P

daleharvey commented 11 years ago

We have our first performance tests in there now, needs some work to get running under CI nicely which I will do once the grunt support comes in

briantoth commented 11 years ago

Reviving this in preparation for working on #99. I took a look through the existing flashcard code and it appears to be rather behind (I guess it never got into the CI). Most glaringly, it seems to have a dependency on jQuery. Benchmark.js itself seems to be very easy to work with though, so getting performance testing for Pouch will not be hard.

Any guidance on how to best do this in order to play nicely with Travis?

daleharvey commented 10 years ago

https://github.com/daleharvey/pouchdb/tree/fuzz-replication has some WIP for this

daleharvey commented 10 years ago

So, bumping this with some ideas

The hard part about this isn't the performance tests; we have had 2 or 3 separate suites before. The hard part is making them part of the workflow. Performance tests generally take a long time and aren't useful to general developers unless they are specifically working on performance. We want to keep npm test fast, so to make performance tests useful they need to be very well integrated with our CI system, Travis.

I think we want 2 things. The first is a page with pretty graphs: on each commit in Travis we want to run certain metrics, store them in Couch somewhere, then graph them like http://arewefastyet.com/

The other thing we want is a reporter, either on github issues or likely inside travis, that compares pull requests to the latest commit on master and reports any regressions that the pull request causes
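The comparison step of such a reporter might look something like the sketch below. Everything here is assumed for illustration: the `findRegressions` name, the input shape (benchmark name mapped to a duration in milliseconds, lower is better), and the 10% default threshold.

```javascript
// Hypothetical sketch: flag benchmarks where the pull request is slower than
// master by more than `threshold` (a fraction, e.g. 0.1 means 10%).
// Inputs map benchmark name -> duration in ms (lower is better).
function findRegressions(master, pullRequest, threshold = 0.1) {
  const regressions = [];
  for (const name of Object.keys(master)) {
    // Only compare benchmarks that exist in both runs.
    if (!(name in pullRequest)) continue;
    const before = master[name];
    const after = pullRequest[name];
    // Skip baselines that would make the relative change meaningless.
    if (before <= 0) continue;
    const change = (after - before) / before;
    if (change > threshold) {
      regressions.push({ name, before, after, change });
    }
  }
  return regressions;
}
```

A reporter could then post the returned list as a comment or fail the build when it is non-empty. Real benchmark timings are noisy, so a fixed threshold like this is only a starting point.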

I suggest the first thing we do is try to graph our coverage metrics. If you run COVERAGE=1 npm test you will get:

=============================================================================
Writing coverage object [/Users/daleharvey/src/pouchdb/coverage/coverage.json]
Writing coverage reports at [/Users/daleharvey/src/pouchdb/coverage]
=============================================================================

=============================== Coverage summary ===============================
Statements   : 63.44% ( 2285/3602 )
Branches     : 62.49% ( 1178/1885 )
Functions    : 62.46% ( 351/562 )
Lines        : 63.44% ( 2275/3586 )
================================================================================

We want to run that in Travis, then upload that .json file somewhere. To test, you can just make an instance of https://www.couchappy.com/cloud-based-database-free-nosql-dbaas-provider or something; once we have it working we will set up a secure Couch instance for this. Then we write some code which graphs the JSON :)
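As a rough sketch of the upload step, the document stored per commit might be built like this. The `buildCoverageDoc` name and the document layout are assumptions for illustration; `TRAVIS_COMMIT` and `TRAVIS_BRANCH` are environment variables that Travis sets during a build.

```javascript
// Hypothetical sketch: build one document per Travis build so results can be
// graphed over time. `summary` is whatever metrics object we decide to store;
// `env` is normally process.env.
function buildCoverageDoc(summary, env) {
  return {
    // Keying the id on the commit is an assumption; a rebuild of the same
    // commit would then conflict with the earlier document.
    _id: 'coverage-' + env.TRAVIS_COMMIT,
    type: 'coverage',
    commit: env.TRAVIS_COMMIT,
    branch: env.TRAVIS_BRANCH,
    recordedAt: new Date().toISOString(),
    summary: summary
  };
}
```

The resulting object could then be written to the results database with a normal document write (for example a PouchDB `put`), and the graphing page would query documents of this type ordered by `recordedAt`.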

daleharvey commented 10 years ago

One important thing for testing this: you can enable Travis on your own fork. This repo already has it set up, so you just need to enable it. More details at http://docs.travis-ci.com/user/getting-started/

daleharvey commented 10 years ago

also relevant - http://t.co/asFifG9bWI

nolanlawson commented 10 years ago

Another fine example to emulate: http://lodash.com/benchmarks. I really like the idea of comparing different Pouches against each other (e.g. http vs. local, version 1.1.0 vs 2.0). It's the only way I'll be able to prove that leveldown-based websql and idb won't be fast enough (#1250). :wink:

daleharvey commented 10 years ago

coveralls.io was mentioned for this, but I don't think it's a good plan. The main thing we are looking for is a snapshot + history of performance. I mentioned coverage because we already have the data there, so we can set up the data storage / graphs etc. and then, once that is working, start adding performance tests.

As for server provisioning, I suggest just setting up an instance with Cloudant, pouch-tests or something similar

nolanlawson commented 10 years ago

Agree with Dale, having perf data for each new commit would be sweet.

BTW just as a warning to anyone else: if you try to use the lodash benchmarks I mentioned (and its associated perf.js/benchmark.js libraries), you'll paint yourself into a corner because the library only tests synchronous code and was not designed to test async methods where the CPU cycles are mostly being run outside of JavaScript (e.g. websql/idb).
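For operations that complete asynchronously, one alternative is to time them directly by wall clock. The sketch below is only an illustration under that assumption: `timeAsync` is a hypothetical helper, `fn` is assumed to return a promise, and `Date.now()` gives only millisecond resolution.

```javascript
// Hypothetical sketch: run an async operation several times in sequence and
// record how long each run takes from start to promise resolution.
async function timeAsync(fn, iterations = 10) {
  const durations = [];
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    await fn(); // e.g. a function that writes a batch of docs and resolves
    durations.push(Date.now() - start);
  }
  const total = durations.reduce((sum, d) => sum + d, 0);
  return {
    iterations: iterations,
    meanMs: total / iterations,
    maxMs: Math.max(...durations)
  };
}
```

Because this measures elapsed time rather than CPU time, it also counts work done outside JavaScript, which is the part that matters for websql/idb-backed adapters.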

NickColley commented 10 years ago

Topcoat display their performance data nicely. http://bench.topcoat.io/

daleharvey commented 10 years ago

The coverage.json data is fairly useless for graphing. Does anyone know how to transform it, or get istanbul to spit out an easily readable summary? I get

Statements   : 40.74% ( 1546/3795 )
Branches     : 32.1% ( 643/2003 )
Functions    : 36.88% ( 222/602 )
Lines        : 40.7% ( 1538/3779 )

at the terminal; it would be nice to have that in JSON, preferably without parsing text. @calvinmetcalf, you use istanbul, got an idea?
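One possible approach, sketched here under assumptions, is to derive the totals from the coverage object itself. This assumes each file entry in coverage.json carries hit counts in `s` (statements), `f` (functions) and `b` (branches, as arrays of per-arm counts); `summarizeCoverage` and `countCovered` are hypothetical names, and line coverage is left out.

```javascript
// Hypothetical sketch: count how many entries in a list of hit counts were
// executed at least once, and express that as a percentage.
function countCovered(hits) {
  const total = hits.length;
  const covered = hits.filter((n) => n > 0).length;
  const pct = total === 0 ? 100 : Math.round((covered / total) * 10000) / 100;
  return { total, covered, pct };
}

// Assumed input shape: { "path/to/file.js": { s: {...}, f: {...}, b: {...} } }
function summarizeCoverage(coverage) {
  const statements = [];
  const functions = [];
  const branches = [];
  for (const file of Object.values(coverage)) {
    statements.push(...Object.values(file.s || {}));
    functions.push(...Object.values(file.f || {}));
    // Each branch entry is assumed to be an array with one count per arm.
    for (const arms of Object.values(file.b || {})) {
      branches.push(...arms);
    }
  }
  return {
    statements: countCovered(statements),
    branches: countCovered(branches),
    functions: countCovered(functions)
  };
}
```

The returned object is plain JSON, so it could be stored alongside the commit id and graphed without parsing the terminal output.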

calvinmetcalf commented 10 years ago

d746cf653d1a4d6e7be3491acd87cb0c42ddf554

nolanlawson commented 10 years ago

@calvinmetcalf This issue is about performance, not coverage, right?

daleharvey commented 10 years ago

yup, coverage was just done first since we already had data there, and we would like coverage + the infrastructure is mostly the same

daleharvey commented 10 years ago

So this is like, introduced :)