samreid opened 6 years ago
I emailed @kathy-phet and @ariel-phet
Kathy & Ariel,
May I please have a little time to investigate a strategy for speeding up our tests by parallelizing? Context is https://github.com/phetsims/aqua/issues/52
Best Regards, Sam
@samreid what is your estimate for an initial investigation?
I am guessing around 2 hours to set up initial harness to get unit tests running in puppeteer using async/await. I'll be learning how to set up Promises and run them in parallel and would benefit from a mentor (<1 hour), if someone has time to help on that step.
If that step goes well, the next step would be to add fuzz tests.
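The async/await parallelization mentioned above might be sketched as follows. This is a minimal hypothetical sketch, not the actual harness: `runTest` here just stands in for launching one real test (e.g. driving a puppeteer page).

```javascript
// Hypothetical sketch: run independent async tests in parallel with Promise.all.
// runTest is a stand-in for launching one real test (e.g. a puppeteer page).
const runTest = async name => {

  // Simulate asynchronous test work.
  await new Promise( resolve => setTimeout( resolve, 10 ) );
  return { name, passed: true };
};

// Every test starts immediately; Promise.all resolves once all of them finish,
// preserving the input order in the results array.
const runTestsInParallel = names => Promise.all( names.map( runTest ) );

runTestsInParallel( [ 'axon', 'dot', 'kite' ] ).then( results => {
  console.log( results.map( r => r.name ).join( ', ' ) ); // → axon, dot, kite
} );
```

The key point is that the `map` kicks off all the tests before any of them is awaited, so total wall-clock time is roughly that of the slowest test rather than the sum.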
@samreid take 4 hours to begin and let's see where you get. Automated testing is certainly quite beneficial to the project, so it is worthy of some investment in improving efficiency.
Check with @chrisklus or @jbphet regarding Promises.
Notes from discussion with @jonathanolson:
bayes CT: all tests take around 4 hours. It runs 9 at a time, so one full snapshot takes around 4/9 hour ≈ 27 minutes.
We considered sending a set of SHAs to bayes, which could then run tests against them.
Focus on problems that could improve the PhET developer workflow. Bayes CT can catch lower-frequency issues (like building scenery). (0) grunt lint-everything. (1) We should create intelligent subsets of tests that we can run anyway.
A major disadvantage of this approach is that it only tests Chromium; it does not test mobile Safari, IE11, Firefox, etc. Or is there a way to reuse the code above but make it work with multiple browsers?
But puppeteer could still be good to get started for local testing.
It would be good if it could run the most important tests first--focus on finding an issue fast. Interleave the tests in case there is a phet-io problem--don't save all the phet-io tests until the end.
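The interleaving could be as simple as a round-robin merge of the per-category test lists. A hypothetical sketch (the list contents are illustrative only):

```javascript
// Hypothetical sketch: round-robin interleave of test lists so that, for
// example, phet-io tests are mixed in rather than all saved until the end.
const interleave = lists => {
  const result = [];
  const longest = Math.max( ...lists.map( list => list.length ) );

  // Take the i-th item from each list in turn, skipping exhausted lists.
  for ( let i = 0; i < longest; i++ ) {
    for ( const list of lists ) {
      if ( i < list.length ) {
        result.push( list[ i ] );
      }
    }
  }
  return result;
};

console.log( interleave( [ [ 'unit-1', 'unit-2' ], [ 'phet-io-1', 'phet-io-2' ] ] ) );
// → [ 'unit-1', 'phet-io-1', 'unit-2', 'phet-io-2' ]
```

A priority-ordered variant could sort each list by importance first, so the most valuable test of each category still runs earliest.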
A heads up to @zepumph that some SimTests are moving to aqua/js/local/test.js
@samreid I noticed the following in https://github.com/phetsims/aqua/issues/52#issuecomment-428324754:
A major disadvantage of this approach is that it only tests Chromium; it does not test mobile Safari, IE11, Firefox, etc. Or is there a way to reuse the code above but make it work with multiple browsers?
I found a couple references that might be helpful:
Hope it's useful!
Questions about how to integrate this:
Running grunt test in sun would run sun's tests and its dependencies. Or do we want a more "one size fits all" approach? Should we enumerate a working set of sims that we are careful not to break on commits?

After approximately 4 hours, I'm at a good stopping point and pleased with how this is working out so far. I'd like to share this with the dev team at dev meeting and get recommendations on how to proceed.
@jonathanolson asks if we should subset tests or if we should order so we can quit tests early. @samreid said that since Bayes CT is already testing in full, this could be a subset so that we don't interfere with other developers when making common code changes. We could have a short list of sims to test that could be updated regularly. Maybe these sims are ones that are nearing publication.
@samreid ran all tests for faradays-law and graphing-quadratics, and the tests completed in about 90 seconds.
@jonathanolson said that for more disruptive changes he still wants to do snapshot tests before/after to compare.
@mbarlow12 said that using a tool like puppeteer is the only way to send proper keyboard events for a11y automated testing.
@jonathanolson and @mbarlow12 gave votes for running the tests with grunt.
@ariel-phet said @samreid could continue working on this for another day because this seems generally useful for the project.
Feature request: if you do end up integrating it with grunt, a cool option could be to pass in the sim you are mainly working on; it would then test everything for that sim, while still running the other tests in whatever smart "pick and choose" way we decide is best. If you omit the option, it just does the pick/choose "random" stuff anyway.
This may be too complicated, but if you run grunt tests in a repo, it could test that repo and all of its downstream dependencies. For instance, if you run grunt tests in a sim repo, it would only run that sim's tests & fuzzing. Likewise, if you run grunt tests in sun, it would test all sims that use sun, but it wouldn't test scenery (since scenery is upstream of sun). This seems kind of opposite to what we would expect--testing a repo should test all of its dependencies (to make sure the repo is good), not all of its usages. Maybe we need upstream and downstream tests?
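To make the upstream/downstream distinction concrete, here is a hypothetical sketch. The dependency map below is illustrative only, not PhET's real dependency data:

```javascript
// Hypothetical sketch: given a map of repo -> direct dependencies, compute a
// repo's upstream dependencies vs its downstream dependents.
// This map is illustrative only.
const dependencies = {
  scenery: [],
  sun: [ 'scenery' ],
  'graphing-quadratics': [ 'sun', 'scenery' ]
};

// Upstream: everything a repo depends on, transitively.
const upstream = repo => {
  const result = new Set();
  const visit = r => {
    for ( const dep of dependencies[ r ] ) {
      if ( !result.has( dep ) ) {
        result.add( dep );
        visit( dep );
      }
    }
  };
  visit( repo );
  return result;
};

// Downstream: every repo whose upstream set includes this repo.
const downstream = repo =>
  new Set( Object.keys( dependencies ).filter( r => upstream( r ).has( repo ) ) );

console.log( [ ...upstream( 'sun' ) ] );   // → [ 'scenery' ]
console.log( [ ...downstream( 'sun' ) ] ); // → [ 'graphing-quadratics' ]
```

"Test my dependencies" would use `upstream`; "test everything my change could break" would use `downstream` plus the repo itself.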
After the preceding commits we have:
lint-everything
I've set this up to automatically run on every commit. Putting issue on hold while I use this in practice for a while.
Could you please provide basic steps to run this? I tried the following and nothing worked:
(1) cd aqua; ./scripts/auto-tests-local.sh
--> Error: Cannot find module 'puppeteer'
(2) cd aqua/scripts; ./auto-tests-local.sh
--> Error: Cannot find module 'C:\Users\Michael\PhET\git\aqua\scripts\js\local\test.js'
(3) cd friction; ../aqua/scripts/auto-tests-local.sh
--> Error: Cannot find module 'C:\Users\Michael\PhET\git\friction\js\local\test.js'
(4) cd friction; grunt tests
(5) cd aqua; grunt tests
In the preceding commit, I added puppeteer to aqua's package.json; it will need an npm install. Then your step (1) is worth a try, but I haven't tested it on Windows at all.
To run all unit tests, you can use:
node js/local/test.js 123 1 0 UNIT
To run a fuzz test (randomly picks one sim at the moment), you can use:
node js/local/test.js 123 1 0 FUZZ
Sorry for the wonky syntax--I'm working on something amenable to parallelization.
Not planning more development on this at the moment, but we should revisit it when we spend more time on automated testing. Self-unassigning for now.
I removed the auto-test-local script since we have been focusing on precommit hooks.
As suggested in https://github.com/phetsims/binder/issues/8#issuecomment-389249567
My first motivation is to make tests run in parallel. Puppeteer may be able to do that with client code that parallelizes, or we may benefit from something like: https://github.com/thomasdondorf/puppeteer-cluster
The end goal is to have something that can run easily/automatically and run a large number of tests quickly, with a uniform text report--cross-platform, and something we may switch Bayes CT over to at some point.