phetsims / aqua

Automatic QUality Assurance

Run tests in puppeteer #52

Open samreid opened 6 years ago

samreid commented 6 years ago

As suggested in https://github.com/phetsims/binder/issues/8#issuecomment-389249567

It crossed my mind that parallel puppeteer may be significantly better than sequential aqua--maybe at some point we should redo aqua to use puppeteer?

My first motivation is to make tests run in parallel. Puppeteer may be able to do that with client code that parallelizes, or we may benefit from something like: https://github.com/thomasdondorf/puppeteer-cluster
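
For reference, a minimal hand-rolled version of that parallelization (without puppeteer-cluster) might look like the sketch below. The test URLs and the window.testResults completion flag are placeholders, not actual aqua APIs.

```js
// Sketch: run several test pages in parallel within one browser.
const puppeteer = require( 'puppeteer' );

// Hypothetical test URLs -- stand-ins for whatever aqua ends up testing.
const TEST_URLS = [
  'http://localhost/sun/sun-tests.html',
  'http://localhost/axon/axon-tests.html'
];

( async () => {
  const browser = await puppeteer.launch();

  // Each URL gets its own page; Promise.all runs them concurrently.
  const results = await Promise.all( TEST_URLS.map( async url => {
    const page = await browser.newPage();
    page.on( 'pageerror', error => console.error( `${url}: ${error.message}` ) );
    await page.goto( url );

    // Wait until the page sets a hypothetical window.testResults flag.
    await page.waitForFunction( 'window.testResults !== undefined', { timeout: 120000 } );
    const result = await page.evaluate( () => window.testResults );
    await page.close();
    return { url, result };
  } ) );

  console.log( JSON.stringify( results, null, 2 ) );
  await browser.close();
} )();
```

Something like puppeteer-cluster would manage the page pool and retries for you; the above only shows the basic idea.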

The end goal is something that can run easily/automatically, runs a large number of tests quickly, and produces a uniform text report. It should be cross-platform, and something we may switch Bayes CT over to at some point.

samreid commented 6 years ago

I emailed @kathy-phet and @ariel-phet

Kathy & Ariel,

May I please have a little time to investigate a strategy for speeding up our tests by parallelizing? Context is https://github.com/phetsims/aqua/issues/52

Best Regards, Sam

ariel-phet commented 6 years ago

@samreid what is your estimate for an initial investigation?

samreid commented 6 years ago

I am guessing around 2 hours to set up an initial harness that gets unit tests running in puppeteer using async/await. I'll be learning how to set up Promises and run them in parallel, and would benefit from a mentor (<1 hour) if someone has time to help on that step.

If that step goes well, the next step would be to add fuzz tests.
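
On the fuzz side, a rough sketch could be as simple as loading one sim with fuzzing and assertions enabled and failing if the page throws. The ?fuzzMouse and ?ea query parameters follow PhET's usual conventions, but the sim URL and the 30-second budget here are assumptions.

```js
// Sketch of a single fuzz test: load a sim, let the fuzzer run, fail on page errors.
const puppeteer = require( 'puppeteer' );

( async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  let failed = false;
  page.on( 'pageerror', error => {
    failed = true;
    console.error( `pageerror: ${error.message}` );
  } );

  // Hypothetical local sim URL; ?fuzzMouse fuzzes input, ?ea enables assertions.
  await page.goto( 'http://localhost/faradays-law/faradays-law_en.html?fuzzMouse&ea' );

  // Let the fuzzer run for 30 seconds before deciding pass/fail.
  await new Promise( resolve => setTimeout( resolve, 30000 ) );

  await browser.close();
  process.exit( failed ? 1 : 0 );
} )();
```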

ariel-phet commented 6 years ago

@samreid take 4 hours to begin and let's see where you get. Automated testing is certainly quite beneficial to the project, so it is worthy of some investment in improving efficiency.

Check with @chrisklus or @jbphet regarding Promises.

samreid commented 6 years ago

Notes from discussion with @jonathanolson:

Bayes CT: all tests take around 4 hours of serial test time. It runs 9 tests at a time, so one snapshot completes in around 4/9 hour ≈ 27 minutes of wall-clock time.

We considered sending a set of SHAs to Bayes, which could then run tests against those.

Focus on the problems that come up in the day-to-day PhET developer workflow; Bayes CT can catch lower-frequency issues (like building scenery). (0) grunt lint-everything. (1) We should create intelligent subsets of tests that we can run anyway.

A major disadvantage of this approach is that it only tests Chromium. It does not test mobile Safari, IE11, Firefox, etc. Or is there a way to reuse the code above, but make it work with multiple browsers?

But puppeteer could still be good to get started for local testing.

It would be good if it could run the most important tests first, focusing on finding an issue fast. Interleave the tests in case there is a phet-io problem--don't save all the phet-io tests until the end.
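
A tiny utility along these lines could do the interleaving; this is plain JavaScript with nothing aqua-specific assumed.

```js
// Sketch: interleave test lists round-robin so that, e.g., phet-io tests are
// spread throughout the run instead of bunched at the end.
function interleave( ...lists ) {
  const result = [];
  const longest = Math.max( ...lists.map( list => list.length ) );
  for ( let i = 0; i < longest; i++ ) {
    for ( const list of lists ) {
      if ( i < list.length ) {
        result.push( list[ i ] );
      }
    }
  }
  return result;
}

// e.g. interleave( unitTests, fuzzTests, phetioTests ) yields
// [ unit0, fuzz0, phetio0, unit1, fuzz1, phetio1, ... ]
```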

samreid commented 6 years ago

A heads up to @zepumph that some SimTests are moving to aqua/js/local/test.js

mbarlow12 commented 6 years ago

@samreid I noticed the following in https://github.com/phetsims/aqua/issues/52#issuecomment-428324754:

A major disadvantage of this approach is that it only tests Chromium. It does not test mobile Safari, IE11, Firefox, etc. Or is there a way to reuse the code above, but make it work with multiple browsers?

I found a couple references that might be helpful:

Hope it's useful!

samreid commented 6 years ago

Questions about how to integrate this:

  1. Should it run from a grunt task? How will we get something that runs both this and lint/build?
  2. Should it parallelize by default?
  3. How do we choose which tests to run? Should we have different tests for each repo? For instance, running grunt test in sun would run sun tests and its dependencies. Or do we want a more "one size fits all" approach? Should we enumerate a working set of sims that we are careful not to break on commits?
  4. I commented out SimTests.js that were breaking. We need to re-enable those tests for Bayes CT or find another way to run them.

After approximately 4 hours, I'm at a good stopping point and pleased with how this is working out so far. I'd like to share this with the dev team at dev meeting and get recommendations on how to proceed.

jessegreenberg commented 6 years ago

@jonathanolson asked if we should subset tests or order them so we can quit testing early. @samreid said that since Bayes CT is already testing in full, this could be a subset so that we don't interfere with other developers when making common code changes. We could have a short list of sims to test that could be updated regularly. Maybe these sims are ones that are nearing publication.

@samreid ran all tests for faradays-law and graphing-quadratics and the tests completed in about 90 seconds.

@jonathanolson said that for more disruptive changes he still wants to do snapshot tests before/after to compare.

@mbarlow12 said that using a tool like puppeteer is the only way to send proper keyboard events for a11y automated testing.

@jonathanolson and @mbarlow12 gave votes for running the tests with grunt.
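
On @mbarlow12's point about keyboard events: puppeteer's page.keyboard API can synthesize real key presses. A rough sketch is below; the sim URL and the focus check are only illustrative.

```js
// Sketch: drive keyboard navigation in a sim page, the basis of automated a11y testing.
const puppeteer = require( 'puppeteer' );

( async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto( 'http://localhost/friction/friction_en.html?ea' );

  // Tab to the first focusable element, then activate it with Enter.
  await page.keyboard.press( 'Tab' );
  await page.keyboard.press( 'Enter' );

  // Inspect what ended up focused in the page.
  const focusedTag = await page.evaluate( () => document.activeElement.tagName );
  console.log( `focused element: ${focusedTag}` );

  await browser.close();
} )();
```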

jessegreenberg commented 6 years ago

@ariel-phet said @samreid could continue working on this for another day because this seems generally useful for the project.

zepumph commented 6 years ago

Feature request: if you do end up integrating it with grunt, a cool option could be to pass in the sim you are mainly working on; it would then test everything on that sim, while still doing the other tests in whatever smart "pick and choose" way we decide is best. If you omit the option, it just does the pick/choose "random" stuff anyway.
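
If it does end up as a grunt task, the registration might look roughly like the sketch below. The runTests helper and its options are hypothetical placeholders for however the aqua harness ends up being invoked; only the grunt calls themselves (registerTask, option, this.async) are standard API.

```js
// Sketch of a grunt task with an optional --sim flag, e.g. `grunt tests --sim=friction`.
module.exports = function( grunt ) {
  grunt.registerTask( 'tests', 'Run puppeteer unit and fuzz tests', function() {
    const done = this.async();
    const primarySim = grunt.option( 'sim' ); // undefined when the flag is omitted

    // Placeholder for the aqua harness entry point.
    const runTests = require( '../js/local/runTests' );
    runTests( { primarySim: primarySim || null } )
      .then( () => done() )
      .catch( error => {
        grunt.log.error( error.message );
        done( false );
      } );
  } );
};
```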

samreid commented 6 years ago

This may be too complicated, but if you run grunt tests in a repo, it could test that repo and everything downstream of it (its dependents). For instance, if you run grunt tests in a sim repo, it would only run that sim's tests & fuzzing. Likewise, if you run grunt tests in sun, it would test all sims that use sun, but it wouldn't test scenery (since that is upstream of sun). This seems kind of the opposite of what we would expect--testing a repo should test all of its dependencies (to make sure the repo is good), not all of its usages. Maybe we need separate upstream and downstream tests?
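
For what it's worth, given a map from each repo to its direct dependencies, both directions are easy to compute. The dependency data below is made up for illustration; it is not pulled from the real repos.

```js
// Sketch: upstream = everything a repo depends on; downstream = everything that depends on it.
const dependencies = {
  sun: [ 'scenery', 'axon' ],
  scenery: [ 'axon' ],
  friction: [ 'sun', 'scenery', 'axon' ],
  axon: []
};

// Transitive dependencies of a repo (upstream).
function upstreamOf( repo, map, visited = new Set() ) {
  for ( const dep of map[ repo ] || [] ) {
    if ( !visited.has( dep ) ) {
      visited.add( dep );
      upstreamOf( dep, map, visited );
    }
  }
  return visited;
}

// Repos whose upstream set contains this repo (downstream dependents).
function downstreamOf( repo, map ) {
  return Object.keys( map ).filter( other => upstreamOf( other, map ).has( repo ) );
}

// upstreamOf( 'sun', dependencies )   => Set { 'scenery', 'axon' }
// downstreamOf( 'sun', dependencies ) => [ 'friction' ]
```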

samreid commented 5 years ago

After the preceding commits we have:

  1. A pruned list of unit tests (omit ones that take too long)
  2. One randomly selected fuzz test
  3. A script that subdivides the unit tests across multiple processes
  4. Timing so that unit testing and one fuzz complete in about the same time as lint-everything

I've set this up to automatically run on every commit. Putting issue on hold while I use this in practice for a while.
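
The process subdivision mentioned in item 3 above could be sketched roughly as follows; the worker script path and the argument format are placeholders, not the actual aqua wiring.

```js
// Sketch: split the unit-test list into chunks and fork one worker process per chunk.
const { fork } = require( 'child_process' );
const os = require( 'os' );

function runInParallel( tests, workerScript ) {
  const processCount = Math.min( tests.length, os.cpus().length );

  // Distribute tests round-robin across the chunks.
  const chunks = Array.from( { length: processCount }, () => [] );
  tests.forEach( ( test, index ) => chunks[ index % processCount ].push( test ) );

  // One child process per chunk; resolve only if every worker exits cleanly.
  return Promise.all( chunks.map( chunk => new Promise( ( resolve, reject ) => {
    const child = fork( workerScript, chunk );
    child.on( 'exit', code => code === 0 ? resolve() : reject( new Error( `exit code ${code}` ) ) );
  } ) ) );
}

// e.g. runInParallel( [ 'axon', 'dot', 'kite', 'scenery' ], 'js/local/test-worker.js' )
//   .then( () => console.log( 'all unit tests passed' ) );
```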

zepumph commented 5 years ago

Could you please provide basic steps to run this? I tried the following and nothing worked:

  1. cd aqua; ./scripts/auto-tests-local.sh --> Error: Cannot find module 'puppeteer'
  2. cd aqua/scripts; ./auto-tests-local.sh --> Error: Cannot find module 'C:\Users\Michael\PhET\git\aqua\scripts\js\local\test.js'
  3. cd friction; ../aqua/scripts/auto-tests-local.sh --> Error: Cannot find module 'C:\Users\Michael\PhET\git\friction\js\local\test.js'
  4. cd friction; grunt tests
  5. cd aqua; grunt tests

samreid commented 5 years ago

In the preceding commit, I added puppeteer to the aqua package.json, so it will need an npm install. Then your step (1) is worth a try, but I haven't tested it on Windows at all.

To run all unit tests, you can use:

node js/local/test.js 123 1 0 UNIT

To run a fuzz test (randomly picks one sim at the moment), you can use:

node js/local/test.js 123 1 0 FUZZ

Sorry for the wonky syntax--working on something amenable to parallelization.

samreid commented 5 years ago

Not planning more development on this at the moment, but we should revisit it when we spend more time on automated testing. Self-unassigning for now.

samreid commented 4 years ago

I removed the auto-test-local script since we have been focusing on precommit hooks.