phetsims / aqua

Automatic QUality Assurance

CT is not running some tests #118

Closed · pixelzoom closed this issue 3 years ago

pixelzoom commented 3 years ago

As has been reported numerous times in Slack, CT seems to have a serious problem: it's not running all tests. Some tests are getting run rarely or never. Given how much PhET relies on CT, this is not good.

To give an example... I saw https://github.com/phetsims/ph-scale/issues/212 occur once in CT. It only occurred once over many CT cycles, so my tendency was to assume that it was a rare occurrence that could probably be investigated at low priority, or maybe even ignored. Then I remembered the CT problem, so I decided to test manually. It fails on every manual test, and I suspect it may be a common-code problem in scenery.

I'll start by labeling this as high priority and assigning it to @jonathanolson.

zepumph commented 3 years ago

Yes, I agree. Here's a visual to help understand: if you just loaded CT right now, you would not see the systemic nature of what is actually going on. Here it is at first glance:

[screenshot: the CT report at first glance]

But if you expand everything, you will find that in the majority of error cases, the failing test is the only completed test in its row for the last X columns:

[screenshot: the CT report with all rows expanded]

pixelzoom commented 3 years ago

Thanks @zepumph, that's a nice visual description of what's going on. So the same thing that I discovered for phetsims/ph-scale#212 is happening for all sims.

In addition to figuring out why CT is not able to run all tests, perhaps the list of tests should be reviewed, to see whether all of them are necessary.

jonathanolson commented 3 years ago

In addition to figuring out why CT is not able to run all tests, perhaps the list of tests should be reviewed, to see whether all of them are necessary.

There is generally not enough time to run all tests for every snapshot. I recently calculated that we're at about 35 hours of browser testing per snapshot; with the 9 processes we're running, that comes to roughly 4 hours of wall-clock time, and since we test the last two snapshots, it would probably take an 8-hour gap between commits to run everything in a snapshot, assuming everything is working correctly.
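For reference, a back-of-envelope sketch of that math, using only the figures above (the exact hours and process count will vary over time):

```js
// Back-of-envelope CT capacity math, using the figures quoted above.
const hoursPerSnapshot = 35;  // total browser-testing time needed per snapshot
const processes = 9;          // concurrent test processes
const snapshotsUnderTest = 2; // CT tests the last two snapshots

const wallClock = hoursPerSnapshot / processes;   // ~3.9 hours per snapshot
const gapNeeded = wallClock * snapshotsUnderTest; // ~7.8 hours to cover both

console.log( `~${wallClock.toFixed( 1 )} h per snapshot, ~${gapNeeded.toFixed( 1 )} h for both` );
```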

Also, it seems like CT testing has a variable capacity right now, sometimes potentially stalling out partially (which has been hard to debug).

pixelzoom commented 3 years ago

There is generally not enough time to run all tests for every snapshot ...

What would it take to run all tests for every snapshot? I'm not familiar with CT's implementation. But it seems to me that CT needs to be able to scale beyond what it's capable of now. As PhET continues to grow, there will be a need to run more tests.

jonathanolson commented 3 years ago

What would it take to run all tests for every snapshot?

Generally, pointing more browsers at CT should help speed it up. Those browsers WILL increase the load on the HTTP server, though: if an example sim load of ~700 requests is typical, each snapshot would take about 1.1 million HTTP requests to complete at the current number of tests.
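To put that in perspective, here's the test count implied by those two numbers (a rough sketch; the actual per-test request count varies widely by test type):

```js
// Rough HTTP load estimate from the figures quoted above.
const requestsPerSimLoad = 700;    // a typical sim load, per the comment
const requestsPerSnapshot = 1.1e6; // total requests to complete one snapshot

// Implied number of sim-load-sized tests per snapshot: ~1570
console.log( Math.round( requestsPerSnapshot / requestsPerSimLoad ) );
```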

I'm going to see if this works out by running some browsers from my home.

jonathanolson commented 3 years ago

I've been running one browser tab on https://bayes.colorado.edu/continuous-testing/aqua/html/continuous-loop.html?id=Testing for 30 minutes, and it seems to be operating normally (and recording results in CT).

zepumph commented 3 years ago

I've had success with puppeteerHelpCT, like cd aqua/js/local; node puppeteerHelpCT ZepumphComputer. Perhaps (naively) we could see whether it helps to have 20 of these running to unburden the 9 processes on bayes. A sketch of the idea is below.
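For anyone who wants to point another headless client at CT, a minimal sketch of what such a helper does (puppeteerHelpCT itself may differ in details; the id value here is just a hypothetical label for this client):

```js
// Minimal headless CT helper sketch, in the spirit of puppeteerHelpCT.
// Requires puppeteer to be installed: npm install puppeteer
const puppeteer = require( 'puppeteer' );

( async () => {
  const browser = await puppeteer.launch( { headless: true } );
  const page = await browser.newPage();

  // The continuous-loop page fetches tests from CT and reports results back,
  // so the page just needs to stay open. 'ExampleHelper' is a hypothetical
  // label identifying this client in CT's results.
  await page.goto( 'https://bayes.colorado.edu/continuous-testing/aqua/html/continuous-loop.html?id=ExampleHelper' );

  // Run until the process is killed.
} )();
```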

jonathanolson commented 3 years ago

@zepumph do you have enough puppeteer knowledge to help if I try getting Chrome instances running on bayes via puppeteer?

zepumph commented 3 years ago

I'd be happy to take a look!

jonathanolson commented 3 years ago

Almost all tests were run in the burned-in snapshots. Every single test has run in the last week. I believe this is fixed, closing.