phetsims / perennial

Maintenance tools that won't change with different versions of chipper checked out
MIT License
3 stars 5 forks source link

BUILD ABORTED! TimeoutError: waiting for function failed: timeout 30000ms exceeded #273

Open mattpen opened 2 years ago

mattpen commented 2 years ago

There was a build failure for a gu translation of ESP this week. I verified that the sim file, the screen titles, and the database entry have all been added, so either the translator resubmitted the build successfully or this was somehow eventually correct.

This error likely occured while waiting for puppeteer to parse screen names from the translated sim. If this starts recurring we should consider (a) increasing the timeout for puppeteer or (b) adding in some error handling and retries. This likely just occurred while the server was busy.

For now, I'm just going to wait and see what happens as it seems like this isn't causing significant problems.

mattpen commented 2 years ago

This happened again today, for ESP in rw. It looks like it built and deployed the html file successfully, but did not update energy-skate-park.xml. This is acceptable, since according to babel the screen names and title did not change in this translation.

pixelzoom commented 1 year ago

This happened today for a production build of geometric-optics-basics 1.2.0. This is a new sim, and the first time that grunt production has been run.

mattpen commented 1 year ago

This happened twice on 11/29:

  1. geometric-optics-basics production release (1.2)
  2. energy-skate-park lg translation

@pixelzoom reported on Slack that he redeployed GOB 1.2 and it was successful the second time. I reran the failed translation (ESP in lg) and the build succeeded.

mattpen commented 1 year ago

This happened again today for my-solar-system, for a prototype deploy. A redeploy solved the problem.

mattpen commented 1 year ago

From @zepumph's comment in #290:

This is occurring on the build-server in production ever since we upgraded to 19 over in https://github.com/phetsims/perennial/issues/268.

@jonathanolson recently that my-solar-system failed yesterday. The build-server logs looked like:

Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: error: Could not parse screen names from sim: TimeoutError: Waiting failed: 30000ms exceeded
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: error: Stop status: waitForFunction phet
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: error: TimeoutError: Waiting failed: 30000ms exceeded
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]:     at Timeout.<anonymous> (/data/share/phet/build-server/perennial/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:64:32)
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]:     at listOnTimeout (node:internal/timers:559:17)
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]:     at processTimers (node:internal/timers:502:7)
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: error: BUILD ABORTED! TimeoutError: Waiting failed: 30000ms exceeded
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: error: TimeoutError: Waiting failed: 30000ms exceeded
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]:     at Timeout.<anonymous> (/data/share/phet/build-server/perennial/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:64:32)
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]:     at listOnTimeout (node:internal/timers:559:17)
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]:     at processTimers (node:internal/timers:502:7)
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: info: build aborted: checking out master for every repo in case build shas are still checked out
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: debug: Running grunt checkout-master-all from .
Dec 05 15:00:32 phet-server2.int.colorado.edu build-server[3140307]: debug: stdout: Running "checkout-master-all" task
Dec 05 15:00:33 phet-server2.int.colorado.edu build-server[3140307]: debug: stdout: error in git checkout master for repo number-compare
Dec 05 15:00:33 phet-server2.int.colorado.edu build-server[3140307]: debug: stdout:
Dec 05 15:00:33 phet-server2.int.colorado.edu build-server[3140307]: debug: stdout: Done.
Dec 05 15:00:33 phet-server2.int.colorado.edu build-server[3140307]: debug: Command grunt finished. Output is below.
Dec 05 15:00:33 phet-server2.int.colorado.edu build-server[3140307]: debug: stderr is empty.
Dec 05 15:00:33 phet-server2.int.colorado.edu build-server[3140307]: debug: stdout: Running "checkout-master-all" task

tagging @mattpen.

The first thing I'm going to try is to load parseScreenNames locally. It seems to be the only puppeteer usage that isn't using puppeteerLoad, so perhaps refactoring to that could solve things. I'm just guessing though.

mattpen commented 1 year ago

@zepumph - this has been occurring since well before #268, the first occurence was December 2021, so probably not related to the upgrade unfortunately. It also only happens occasionally, although it has been increasing in frequency the past few weeks so probably worth bumping the priority. I think it is probably more related to some bottleneck on phet-server2 (although it was occuring on phet-server as well).

zepumph commented 1 year ago

I'll take a first pass this afternoon, the history makes me more confident that using pupeeteerLoad will help things.

jonathanolson commented 1 year ago

Also of note: once my maintenance releases get production deployed, it should help out with the translation builds.

zepumph commented 1 year ago

Using this patch I can test parsing screen names locally. After running 5 times I didn't reproduce anything:

```diff Index: js/build-server/createTranslationsXML.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/build-server/createTranslationsXML.js b/js/build-server/createTranslationsXML.js --- a/js/build-server/createTranslationsXML.js (revision a638fbb804560b4fc267a4cfca3a385c7d9c0d6e) +++ b/js/build-server/createTranslationsXML.js (date 1670370725124) @@ -55,37 +55,40 @@ const screenNames = await parseScreenNames.parseScreenNames( simName, stringFiles.map( f => f.locale ) ); - for ( let j = 0; j < stringFiles.length; j++ ) { - const stringFile = stringFiles[ j ]; - const languageJSON = ( stringFile.locale === constants.ENGLISH_LOCALE ) ? englishStrings : - JSON.parse( fs.readFileSync( `../babel/${simName}/${stringFile.name}`, { encoding: 'utf-8' } ) ); - - const simHTML = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}_${stringFile.locale}.html`; - - if ( fs.existsSync( simHTML ) ) { - const localizedSimTitle = ( languageJSON[ simTitleKey ] ) ? languageJSON[ simTitleKey ].value : englishStrings[ simTitleKey ].value; - finalXML = finalXML.concat( `\n` + - `<![CDATA[${localizedSimTitle}]]>\n` ); - if ( screenNames && screenNames[ stringFile.locale ] ) { - finalXML = finalXML.concat( '\n' ); - screenNames[ stringFile.locale ].forEach( ( screenName, index ) => { - finalXML = finalXML.concat( `\n` ); - } ); - finalXML = finalXML.concat( '\n' ); - } - finalXML = finalXML.concat( '\n' ); - } - } - - finalXML = finalXML.concat( '\n' ); - - const xmlFilepath = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}.xml`; - try { - await writeFile( xmlFilepath, finalXML ); - } - catch( err ) { - console.error( 'Error writing xml file', err ); - throw err; - } + console.log( screenNames ); + // + // for ( let j = 0; j < stringFiles.length; j++ ) { + // const stringFile = stringFiles[ j ]; + // const languageJSON = ( stringFile.locale === constants.ENGLISH_LOCALE ) ? englishStrings : + // JSON.parse( fs.readFileSync( `../babel/${simName}/${stringFile.name}`, { encoding: 'utf-8' } ) ); + // + // const simHTML = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}_${stringFile.locale}.html`; + // + // if ( fs.existsSync( simHTML ) ) { + // const localizedSimTitle = ( languageJSON[ simTitleKey ] ) ? languageJSON[ simTitleKey ].value : englishStrings[ simTitleKey ].value; + // finalXML = finalXML.concat( `\n` + + // `<![CDATA[${localizedSimTitle}]]>\n` ); + // if ( screenNames && screenNames[ stringFile.locale ] ) { + // finalXML = finalXML.concat( '\n' ); + // screenNames[ stringFile.locale ].forEach( ( screenName, index ) => { + // finalXML = finalXML.concat( `\n` ); + // } ); + // finalXML = finalXML.concat( '\n' ); + // } + // finalXML = finalXML.concat( '\n' ); + // } + // } + // + // finalXML = finalXML.concat( '\n' ); + // console.log( finalXML ); + // + // const xmlFilepath = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}.xml`; + // try { + // await writeFile( xmlFilepath, finalXML ); + // } + // catch( err ) { + // console.error( 'Error writing xml file', err ); + // throw err; + // } winston.log( 'info', 'wrote XML file' ); }; \ No newline at end of file Index: js/grunt/Gruntfile.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/grunt/Gruntfile.js b/js/grunt/Gruntfile.js --- a/js/grunt/Gruntfile.js (revision a638fbb804560b4fc267a4cfca3a385c7d9c0d6e) +++ b/js/grunt/Gruntfile.js (date 1670370622380) @@ -18,6 +18,7 @@ const assert = require( 'assert' ); const _ = require( 'lodash' ); // eslint-disable-line require( './checkNodeVersion' ); +const createTranslationsXML = require( '../build-server/createTranslationsXML.js' ); /////////////////////////// module.exports = function( grunt ) { @@ -360,6 +361,16 @@ await wrapper( repo, noninteractive, grunt.option( 'message' ) ); } ) ); + + + grunt.registerTask( 'testMe', + 'Deploys a phet-io wrapper\n' + + '--repo : The name of the wrapper repository to deploy\n' + + '--noninteractive : If specified, prompts will be skipped. Some prompts that should not be automated will fail out\n' + + '--message : An optional message that will be appended on version-change commits.', + wrapTask( async () => { + await require( '../build-server/createTranslationsXML.js' )( 'my-solar-system', '1.0.0' ); + } ) ); grunt.registerTask( 'dev', 'Deploys a dev version of the simulation\n' + ```

I would use this to test against my commits below:

```diff Index: js/build-server/createTranslationsXML.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/build-server/createTranslationsXML.js b/js/build-server/createTranslationsXML.js --- a/js/build-server/createTranslationsXML.js (revision a638fbb804560b4fc267a4cfca3a385c7d9c0d6e) +++ b/js/build-server/createTranslationsXML.js (date 1670371383971) @@ -53,39 +53,42 @@ // create xml, making a simulation tag for each language let finalXML = `\n\n`; - const screenNames = await parseScreenNames.parseScreenNames( simName, stringFiles.map( f => f.locale ) ); - - for ( let j = 0; j < stringFiles.length; j++ ) { - const stringFile = stringFiles[ j ]; - const languageJSON = ( stringFile.locale === constants.ENGLISH_LOCALE ) ? englishStrings : - JSON.parse( fs.readFileSync( `../babel/${simName}/${stringFile.name}`, { encoding: 'utf-8' } ) ); - - const simHTML = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}_${stringFile.locale}.html`; - - if ( fs.existsSync( simHTML ) ) { - const localizedSimTitle = ( languageJSON[ simTitleKey ] ) ? languageJSON[ simTitleKey ].value : englishStrings[ simTitleKey ].value; - finalXML = finalXML.concat( `\n` + - `<![CDATA[${localizedSimTitle}]]>\n` ); - if ( screenNames && screenNames[ stringFile.locale ] ) { - finalXML = finalXML.concat( '\n' ); - screenNames[ stringFile.locale ].forEach( ( screenName, index ) => { - finalXML = finalXML.concat( `\n` ); - } ); - finalXML = finalXML.concat( '\n' ); - } - finalXML = finalXML.concat( '\n' ); - } - } + const screenNames = await parseScreenNames.parseScreenNames( simName, [ 'en', 'vi', 'th' ] ); - finalXML = finalXML.concat( '\n' ); - - const xmlFilepath = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}.xml`; - try { - await writeFile( xmlFilepath, finalXML ); - } - catch( err ) { - console.error( 'Error writing xml file', err ); - throw err; - } + console.log( screenNames ); + // + // for ( let j = 0; j < stringFiles.length; j++ ) { + // const stringFile = stringFiles[ j ]; + // const languageJSON = ( stringFile.locale === constants.ENGLISH_LOCALE ) ? englishStrings : + // JSON.parse( fs.readFileSync( `../babel/${simName}/${stringFile.name}`, { encoding: 'utf-8' } ) ); + // + // const simHTML = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}_${stringFile.locale}.html`; + // + // if ( fs.existsSync( simHTML ) ) { + // const localizedSimTitle = ( languageJSON[ simTitleKey ] ) ? languageJSON[ simTitleKey ].value : englishStrings[ simTitleKey ].value; + // finalXML = finalXML.concat( `\n` + + // `<![CDATA[${localizedSimTitle}]]>\n` ); + // if ( screenNames && screenNames[ stringFile.locale ] ) { + // finalXML = finalXML.concat( '\n' ); + // screenNames[ stringFile.locale ].forEach( ( screenName, index ) => { + // finalXML = finalXML.concat( `\n` ); + // } ); + // finalXML = finalXML.concat( '\n' ); + // } + // finalXML = finalXML.concat( '\n' ); + // } + // } + // + // finalXML = finalXML.concat( '\n' ); + // console.log( finalXML ); + // + // const xmlFilepath = `${constants.HTML_SIMS_DIRECTORY + simName}/${version}/${simName}.xml`; + // try { + // await writeFile( xmlFilepath, finalXML ); + // } + // catch( err ) { + // console.error( 'Error writing xml file', err ); + // throw err; + // } winston.log( 'info', 'wrote XML file' ); }; \ No newline at end of file Index: js/grunt/Gruntfile.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/grunt/Gruntfile.js b/js/grunt/Gruntfile.js --- a/js/grunt/Gruntfile.js (revision a638fbb804560b4fc267a4cfca3a385c7d9c0d6e) +++ b/js/grunt/Gruntfile.js (date 1670374542910) @@ -18,6 +18,7 @@ const assert = require( 'assert' ); const _ = require( 'lodash' ); // eslint-disable-line require( './checkNodeVersion' ); +const createTranslationsXML = require( '../build-server/createTranslationsXML.js' ); /////////////////////////// module.exports = function( grunt ) { @@ -360,6 +361,17 @@ await wrapper( repo, noninteractive, grunt.option( 'message' ) ); } ) ); + + + grunt.registerTask( 'testMe', + 'Deploys a phet-io wrapper\n' + + '--repo : The name of the wrapper repository to deploy\n' + + '--noninteractive : If specified, prompts will be skipped. Some prompts that should not be automated will fail out\n' + + '--message : An optional message that will be appended on version-change commits.', + wrapTask( async () => { + // await require( '../build-server/createTranslationsXML.js' )( 'my-solar-system', '1.0.0' ); + await require( '../build-server/parseScreenNames.js' ).parseScreenNamesAllSimulations(); + } ) ); grunt.registerTask( 'dev', 'Deploys a dev version of the simulation\n' +
zepumph commented 1 year ago

I was able to get to a commit point converting this to use puppeteer. Here are a couple of changes that made it not just a exclusive refactor:

So this change isn't necessarily a fix, but puppeteerLoad does come with a lot more error handling. We may be able to see how things are happening. My guess is that this happens when there is a failure to deliver the webpage from phet.colorado.edu, but I'm not sure.

@mattpen I think the next step is a review and to put it into practice.

mattpen commented 1 year ago

@zepumph - As far as review I'm a bit suspicious of the way Promises are handled in puppeteerLoad.js. Why are we calling resolve( new Error ... instead of reject( new Error ...? I also had trouble following the logic flow with the promise closure. If we used reject instead we could use try/catch blocks instead of type checking the result to see if it is an error.

My suggestion: ```js Index: js/build-server/parseScreenNames.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/build-server/parseScreenNames.js b/js/build-server/parseScreenNames.js --- a/js/build-server/parseScreenNames.js (revision 162ba71d91d4146d0844ef56cb629d8cd62d9622) +++ b/js/build-server/parseScreenNames.js (date 1670427587567) @@ -31,9 +31,6 @@ } } ); - if ( result instanceof Error ) { - throw result; - } assert( result !== null, 'must be a list of screen names, not null' ); returnObject[ locale ] = result; Index: js/common/puppeteerLoad.js IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/js/common/puppeteerLoad.js b/js/common/puppeteerLoad.js --- a/js/common/puppeteerLoad.js (revision 162ba71d91d4146d0844ef56cb629d8cd62d9622) +++ b/js/common/puppeteerLoad.js (date 1670428131396) @@ -48,49 +48,50 @@ page = await browser.newPage(); await page.setDefaultNavigationTimeout( options.puppeteerTimeout ); - let resolve; - let loaded = false; - const promise = new Promise( res => { - resolve = res; - } ); - - page.on( 'load', async () => { - loaded = true; - await sleep( options.waitAfterLoad ); - if ( options.waitForFunction ) { - await page.waitForFunction( options.waitForFunction, { - polling: 100, // default is every animation frame - timeout: options.puppeteerTimeout - } ); - } - resolve( options.evaluate && !page.isClosed() ? await page.evaluate( options.evaluate ) : null ); - } ); - page.on( 'error', message => { - winston.info( `puppeteer error: ${message}` ); - resolve( new Error( message ) ); - } ); - page.on( 'pageerror', message => { - if ( options.resolvePageErrors ) { - winston.info( `puppeteer pageerror: ${message}` ); - resolve( new Error( message ) ); - } - } ); - ( async () => { - await sleep( options.allowedTimeToLoad ); - if ( !loaded ) { - winston.info( 'puppeteer not loaded' ); - resolve( new Error( `Did not load in ${options.allowedTimeToLoad}` ) ); - } - } )(); - - await page.goto( url, { - timeout: options.puppeteerTimeout - } ); - const result = await promise; - !page.isClosed() && await page.close(); + let result; + try { + result = await new Promise( ( resolve, reject ) => { + let loaded = false; + page.on( 'load', async () => { + loaded = true; + await sleep( options.waitAfterLoad ); + if ( options.waitForFunction ) { + await page.waitForFunction( options.waitForFunction, { + polling: 100, // default is every animation frame + timeout: options.puppeteerTimeout + } ); + } + resolve( options.evaluate && !page.isClosed() ? await page.evaluate( options.evaluate ) : null ); + } ); + page.on( 'error', message => { + winston.info( `puppeteer error: ${message}` ); + reject( new Error( message ) ); + } ); + page.on( 'pageerror', message => { + if ( options.resolvePageErrors ) { + winston.info( `puppeteer pageerror: ${message}` ); + reject( new Error( message ) ); + } + } ); + ( async () => { + await sleep( options.allowedTimeToLoad ); + if ( !loaded ) { + winston.info( 'puppeteer not loaded' ); + reject( new Error( `Did not load in ${options.allowedTimeToLoad}` ) ); + } + } )(); + } ); + } + finally { + await page.goto( url, { + timeout: options.puppeteerTimeout + } ); + !page.isClosed() && await page.close(); - // If we created a temporary browser, close it - ownsBrowser && await browser.close(); + // If we created a temporary browser, close it + ownsBrowser && await browser.close(); + } + return result; } ```

I think this is essentially equivalent to what you wrote, but IMHO it uses async and promise patterns more semantically. Let me know what you think. I don't know what other uses would be affected by the change I suggested in puppeteerLoad, some additional refactoring may be needed and if it is working as-is then we can probably defer this.

As far as putting it in practice, I think we can just coordinate a time to pull master for perennial and restart the build-server on phet-server2, then do a deploy of chains to make sure it isn't broken. Then we'll just have to wait and see if the errors disappear or reduce in frequency.

mattpen commented 1 year ago

~Wait - I think I see an error I introduced. If reject is called then we won't close the page or browser. I'll update the patch momentarily.~

Patch updated above.

zepumph commented 1 year ago

I hear that your patch is more idiomatic. The file was written by @jonathanolson with errors resolving, and there have been a couple other cases across the project where we have resolved errors so it didn't really both me. I think I would most prefer to keep it the way it is if that is alright.

Are you available this afternoon for an integration pair?

zepumph commented 1 year ago

@mattpen convinced me that we should refactor puppeteerLoad to reject errors, see https://github.com/phetsims/perennial/issues/291.

We updated the build server with the new version which uses puppeteerLoad. We both agree that this is not a fix, but that it should show the error much more specifically if it happens again.

We either think that it is a race condition from when we copy the all HTML to the NFS and then immediately try to access it through phet.colorado.edu, or a problem with resource limitations.

We'll come back the next time the error occurs.