Automatic test harness - Githubissues

samreid commented 9 years ago

I'd like our team to have an automated test harness that launches each simulation, runs a mouse/touch fuzzer (or some other way of exercising some of the sim functionality) on it and summarizes the results. It is getting very difficult to know whether common library changes will cause breaking changes in usage sites, and it is time consuming to manually test. Even knowing if the simulation launches without crashing would, in many cases, be very helpful.

Perhaps this can be done through the iframe API? A parent frame iterates through the simulations: for each one, it launches the sim in an iframe and listens for messages that indicate progress. If the messages stall out, the sim has encountered a bug. The sims should be run with ?ea (assertions enabled).
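A minimal sketch of the parent-frame idea, assuming each sim posts some kind of progress message to its parent. The URL layout, the `StallDetector` and `runSims` names, and the timing options are all hypothetical and not part of any existing PhET API:

```javascript
// Hypothetical helper: URL for one sim with assertions enabled via ?ea.
// The <sim>/<sim>_en.html layout is an assumption for illustration.
function simURL(baseURL, sim) {
  return `${baseURL}/${sim}/${sim}_en.html?ea`;
}

// Remembers when the last progress message arrived and reports a stall
// once more than stallMs has passed without one.
class StallDetector {
  constructor(stallMs, now = Date.now) {
    this.stallMs = stallMs;
    this.now = now;
    this.lastProgress = now();
  }
  progress() { this.lastProgress = this.now(); }
  isStalled() { return this.now() - this.lastProgress > this.stallMs; }
}

// Browser-only wiring: runs each sim in an iframe for runMs, and reports
// 'stalled' if progress messages stop for longer than stallMs.
function runSims(sims, baseURL, { stallMs, runMs }, report) {
  const iframe = document.createElement('iframe');
  document.body.appendChild(iframe);
  let index = -1;
  let detector = null;
  let startedAt = 0;

  const next = () => {
    index++;
    if (index >= sims.length) { clearInterval(timer); return; }
    detector = new StallDetector(stallMs);
    startedAt = Date.now();
    iframe.src = simURL(baseURL, sims[index]);
  };

  // Any message from the current iframe counts as progress (assumed protocol).
  window.addEventListener('message', event => {
    if (detector && event.source === iframe.contentWindow) { detector.progress(); }
  });

  const timer = setInterval(() => {
    if (detector.isStalled()) { report(sims[index], 'stalled'); next(); }
    else if (Date.now() - startedAt >= runMs) { report(sims[index], 'pass'); next(); }
  }, 250);

  next();
}
```

The clock is injectable in `StallDetector` so the stall logic can be checked without a browser or real timers.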

This should be designed so it could be run on any developer machine or a continuous integration machine (the latter may require PhantomJS?).

A way to make this even sweeter would be to add temporary instrumentation into the changed library code so that the parent frame can collect statistics about how many times the modified code was actually hit in each simulation so we can get assurance that it has been tested. Note: the tests described here are mainly to make sure things don't error out and crash--more manual tests would be required to confirm the behavior is correct.
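One way the temporary instrumentation might look, as a rough sketch. `recordHit`, `hitReport`, and `changedLibraryFunction` are made-up names for illustration; the report is a plain object so it could be sent to the parent frame with postMessage:

```javascript
// Hit counts keyed by an arbitrary label naming the instrumented site.
const hitCounts = new Map();

// Call at the top of each modified function while testing a change.
function recordHit(label) {
  hitCounts.set(label, (hitCounts.get(label) || 0) + 1);
}

// Plain-object snapshot of the counts, suitable for postMessage.
function hitReport() {
  return Object.fromEntries(hitCounts);
}

// Example of a changed library function carrying temporary instrumentation.
function changedLibraryFunction(x) {
  recordHit('changedLibraryFunction');
  return x * 2;
}
```

A label that never shows up in a sim's report would indicate that the sim did not exercise the modified code during the run.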

Not sure what repo such a thing should live in (if it should live at all).

jonathanolson commented 9 years ago

It sounds great to put the sims into an iframe with a fuzzer. We'd presumably want to run with a try-catch around events (Scenery integration?) and the animation frame for this, but we don't want that behavior for normal sim runs. This would help a TON for Scenery development. ?eall might be a better option, unless that slows things down too much.
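A sketch of the try-catch idea, assuming the harness can pass in a flag and an error callback. `guarded` and `maybeGuard` are hypothetical names, not Scenery functions:

```javascript
// Wraps a listener so that a thrown error is reported instead of propagating.
function guarded(listener, onError) {
  return function(...args) {
    try {
      return listener.apply(this, args);
    } catch (error) {
      onError(error);
      return undefined;
    }
  };
}

// Wraps only when the harness enables it, so normal sim runs keep
// their usual behavior (errors propagate as before).
function maybeGuard(listener, onError, enabled) {
  return enabled ? guarded(listener, onError) : listener;
}
```

The same wrapper could be applied to both input event listeners and the animation-frame callback, with `onError` forwarding the error to the parent frame.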

Additionally if possible, I'd love a tool like Selenium for this. See one of the website tests:

    public void testSignIn() throws Exception {
        // checks validation

        String signInFailed = SeleniumUtils.getString( "signIn.validation.failed" );
        String signInTitle = SeleniumUtils.getString( "signIn.title" );

        selenium.open( "/" );
        selenium.click( LogInOutPanel.SIGN_IN_ID );
        loadWithoutError();
        selenium.type( "username", "bogus-email" );
        selenium.type( "password", "bogus-password" );
        selenium.click( "submit" );
        loadWithoutError();
        assert ( selenium.getBodyText().contains( signInFailed ) );
        assertEquals( selenium.getTitle(), signInTitle );
        selenium.type( "username", "test@phet.colorado.edu" );
        selenium.click( "submit" );
        loadWithoutError();
        assert ( selenium.getBodyText().contains( signInFailed ) );
        assertEquals( selenium.getTitle(), signInTitle );
        selenium.type( "password", "test-password" );
        selenium.click( "submit" );
        loadWithoutError();
        assert ( !selenium.getBodyText().contains( signInFailed ) );
        assertNotEquals( selenium.getTitle(), signInTitle );
    }

I'd really love to be able to do a functional test on a sim, record the input events, model changes, and screengrabs. Then the input events could be played back at a later time, verifying the model changes and optionally screenshot comparisons (style changes that affect all sims would throw a wrench into that).

samreid commented 9 years ago

> Then the input events could be played back at a later time, verifying the model changes and optionally screenshot comparisons (style changes that affect all sims would throw a wrench into that)

An alternative would be to record a subset of axon Events and Property changes. Then playing those back, we can see how the model and views respond. Should be robust against visual style changes but not against internal rearchitecturing of events/properties.
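To illustrate the record-and-replay idea, here is a small sketch. `TinyProperty` is a stand-in written for this example and is NOT the axon Property API; `recordChanges` and `replay` are likewise hypothetical:

```javascript
// Stand-in for an observable property (not axon): notifies listeners on set.
class TinyProperty {
  constructor(value) { this.value = value; this.listeners = []; }
  link(listener) { this.listeners.push(listener); }
  set(value) {
    this.value = value;
    this.listeners.forEach(listener => listener(value));
  }
}

// Records every change of the named properties as { name, value } entries.
function recordChanges(properties) {
  const log = [];
  for (const [name, property] of Object.entries(properties)) {
    property.link(value => log.push({ name, value }));
  }
  return log;
}

// Replays a recorded log onto another set of properties with the same names.
function replay(log, properties) {
  log.forEach(({ name, value }) => properties[name].set(value));
}
```

Replaying a log onto a fresh model and recording again should produce an identical log; a difference would point at a behavior change, independent of visual styling.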

samreid commented 9 years ago

One way to gauge how a simulation is doing is to record the number of assertions passed vs assertions checked. This could be reported back to the parent frame.
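A sketch of how the counts might be collected, assuming the sim's assert function can be swapped for a counting version. `countingAssert` and `assertionStats` are hypothetical names:

```javascript
// Running totals that could be reported back to the parent frame.
const assertionStats = { checked: 0, passed: 0 };

// Assert replacement that counts every check, and every pass, before
// throwing on failure.
function countingAssert(predicate, message) {
  assertionStats.checked++;
  if (!predicate) {
    throw new Error('Assertion failed: ' + (message || ''));
  }
  assertionStats.passed++;
}
```

Since a failed assertion normally stops the sim, the checked count is probably most useful as a rough measure of how much code a run exercised.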

jonathanolson commented 9 years ago

Below is a screenshot of the current development:

[Screenshot: joist-test-sims-1]

Currently the following sims are throwing assertions or outright failing according to the latest test (which makes it hard to test common-code changes):

Broken due to Scenery https://github.com/phetsims/scenery/issues/354

balancing-chemical-equations
build-an-atom
plinko probability

Broken due to WebGL

neuron

Broken due to sim-specific issues

area-builder
arithmetic
capacitor-lab (doesn't finish loading)
color-vision (doesn't finish loading)
curve-fitting (doesn't finish loading)
fluid-pressure-and-flow (doesn't finish loading)
fraction-comparison (doesn't finish loading)
fraction-matcher
isotopes-and-atomic-mass (doesn't finish loading)
projectile-motion (doesn't finish loading)
protein-synthesis (doesn't finish loading)
sugar-and-salt-solutions (doesn't finish loading)

pixelzoom commented 9 years ago

In case anyone else is wondering how to use this... Here's info from a Skype conversation with @jonathanolson. This is as of 2/24/15.

To run the tests:

Go to joist/tests/ in the console and type node build-server.js
Visit joist/tests/test-sims.html in the browser

All sims that are in chipper/data/active-sims will be tested.

Left to right, the little colored squares are: "run with require.js", "chipper build process/lint", "run built version". It cycles through all require.js versions, then runs the built versions as they are available.

Colors: green = pass, red = fail, orange = failed before finishing loading, gray = test not run

Error messages appear in the right side of the browser window. In order from top to bottom:
• "Sim errors (dev)" corresponds to "run with require.js" test (column 1)
• "Sim errors (build)" corresponds to "run built version" (column 3)
• "Grunt errors" corresponds to "chipper build process/lint" (column 2)
Note that they are in a different order than the columns because grunt errors may be ridiculously long.

pixelzoom commented 9 years ago

Recommended to move the build server (stuff in joist/tests/) to chipper.

jonathanolson commented 9 years ago

That sounds good to me, I'll add it to my list.

jonathanolson commented 9 years ago

On that note, I'm not sure about the most natural location for it. chipper/bin makes the most sense, but it's JS. Doesn't fit in grunt/requirejs-plugins directory, but chipper/js (no sub-directory after) could work.

Thoughts @pixelzoom?

pixelzoom commented 9 years ago

chipper/js/build-server/ ?

jonathanolson commented 9 years ago

Like chipper/js/build-server/server.js?

pixelzoom commented 9 years ago

That would be OK. Not sure what we want to call this testing tool longterm, maybe something more specific than build-server, but I don't have a better name.

jonathanolson commented 9 years ago

My understanding was just moving the node.js server, not the front end. Is that what you were mentioning? (It's only 1 file).

pixelzoom commented 9 years ago

When I think of a 'server', I think of something that is running continuously. Is that the intention of this tool? Or is it intended to be fired up when you want to make sure that all sims build and run?

pixelzoom commented 9 years ago

Where would the front end live?

jonathanolson commented 9 years ago

I was using "server" in the "it serves responses to requests" sense, since it opens up a service on a port. It was written so that it could be used either running continuously or whenever we need.

I'm happy to move everything to chipper if that's convenient (including the HTML/JS).

pixelzoom commented 9 years ago

While relocating to chipper, also please rename this, since there is now something else called build-server.

pixelzoom commented 9 years ago

Raising the priority of this because it's languishing.

jonathanolson commented 9 years ago

@aaronsamuel137, @pixelzoom, thoughts on a new name, since chipper 'build-server' is now taken? "automated-test"?

aaronsamuel137 commented 9 years ago

test-server?

pixelzoom commented 9 years ago

When I hear "server", I think of something that runs continuously, not something that is run on demand.

Since this application builds and fuzz-tests all sims, how about build-and-fuzz? It would also be nice if you could specify a subset of sims to build and fuzz.
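Selecting a subset could be as simple as filtering the sim list. This sketch assumes the list file contains one sim name per line; `parseSimList` and `selectSims` are hypothetical helpers, not existing chipper functions:

```javascript
// Parses the text of a sim list, assuming one sim name per line.
function parseSimList(text) {
  return text.split(/\r?\n/).map(line => line.trim()).filter(line => line.length > 0);
}

// Keeps only the requested sims, in list order. Requested names that are
// not in the list are returned separately so typos can be reported.
// An empty or missing request selects every sim.
function selectSims(allSims, requested) {
  if (!requested || requested.length === 0) {
    return { sims: allSims.slice(), unknown: [] };
  }
  const wanted = new Set(requested);
  return {
    sims: allSims.filter(sim => wanted.has(sim)),
    unknown: requested.filter(sim => !allSims.includes(sim))
  };
}
```

The requested names could come from a query parameter on the test page or from a command-line argument to the server; either choice is open.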

jonathanolson commented 9 years ago

> When I hear "server", I think of something that runs continuously, not something that is run on demand.

I tend to leave it running on my main desktop, and I could imagine leaving it running on a remote server as well.

pixelzoom commented 9 years ago

I see. So build-server.js runs continuously, and refreshing test-sims.html restarts the tests? What happens if there are tests in progress, or if you have test-sims.html open in >1 browser window? Or if you close the browser window while tests are in progress?

jonathanolson commented 9 years ago

> I see. So build-server.js runs continuously, and refreshing test-sims.html restarts the tests?

Yes.

> What happens if there are tests in progress, or if you have test-sims.html open in >1 browser window?

It can handle concurrent requests, but doesn't have logic yet to prevent it from trying to build the same sim at the same time from multiple requests.
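The missing logic could be sketched as follows: concurrent requests for the same sim share one in-progress build. `buildOnce` is a hypothetical name, and `build` is assumed to be a function that takes a sim name and returns a Promise for the npm install plus build:

```javascript
// Builds currently in progress, keyed by sim name.
const buildsInProgress = new Map();

// Starts a build for the sim, or returns the promise of the build that
// is already running for it. The entry is removed when the build settles,
// whether it succeeded or failed.
function buildOnce(sim, build) {
  if (buildsInProgress.has(sim)) {
    return buildsInProgress.get(sim);
  }
  const promise = Promise.resolve()
    .then(() => build(sim))
    .finally(() => buildsInProgress.delete(sim));
  buildsInProgress.set(sim, promise);
  return promise;
}
```

Every caller receives the same promise, so each request still gets the build result, but the sim is built only once per overlapping batch.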

samreid commented 9 years ago

It keeps cycling through the sims over and over? If I have a cron job to pull latest master will it auto-update in the test-server browser or do things get cached and have to be hard-reloaded?

jonathanolson commented 9 years ago

> It keeps cycling through the sims over and over? If I have a cron job to pull latest master will it auto-update in the test-server browser or do things get cached and have to be hard-reloaded?

It doesn't cycle. When you load the test HTML, it just runs through all the sims and stops. You also won't want to pull while it's building a sim. It npm-installs and builds, since that's fairly safe (you can have checked-out changes to test).

jonathanolson commented 9 years ago

Moved to chipper. Anything else to do for this issue?

samreid commented 9 years ago

http://localhost:8080/chipper/tests/test-sims.html no longer works, did the URL change?

jonathanolson commented 9 years ago

http://localhost/phet/git/chipper/tests/test-sims.html (my local URL for this computer) is working fine. Can you clarify what "no longer works" means?

samreid commented 9 years ago

It seems to be working now, not sure what was wrong before. @jonathanolson anything else to do here?

jonathanolson commented 9 years ago

I asked you the same question, so I'll go ahead and close it (we can reopen later if necessary).

phetsims / joist

Automatic test harness #210
