phetsims / aqua

Automatic QUality Assurance

CTQ error opening puppeteer #144

Closed zepumph closed 2 years ago

zepumph commented 2 years ago

This began at 2am this morning. I cannot reproduce it locally with grunt quick-server, and restarting the CTQ didn't fix it.


Error: Failed to launch the browser process!
[0523/020815.497156:FATAL:zygote_host_impl_linux.cc(117)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
#0 0x56103170f469 base::debug::CollectStackTrace()
#1 0x561031675f33 base::debug::StackTrace::StackTrace()
#2 0x561031688d50 logging::LogMessage::~LogMessage()
#3 0x56102f7495db content::ZygoteHostImpl::Init()
#4 0x561031222df2 content::ContentMainRunnerImpl::Initialize()
#5 0x561031220eb9 content::RunContentProcess()
#6 0x56103122100e content::ContentMain()
#7 0x56103127bfca headless::(anonymous namespace)::RunContentMain()
#8 0x56103127bcd5 headless::HeadlessShellMain()
#9 0x56102de6d668 ChromeMain
#10 0x7f250f4b1555 __libc_start_main
#11 0x56102de6d4aa _start

Received signal 6
#0 0x56103170f469 base::debug::CollectStackTrace()
#1 0x561031675f33 base::debug::StackTrace::StackTrace()
#2 0x56103170ef71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f2513c2e630 (/usr/lib64/libpthread-2.17.so+0xf62f)
#4 0x7f250f4c5387 __GI_raise
#5 0x7f250f4c6a78 __GI_abort
#6 0x56103170e1f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x561031689221 logging::LogMessage::~LogMessage()
#8 0x56102f7495db content::ZygoteHostImpl::Init()
#9 0x561031222df2 content::ContentMainRunnerImpl::Initialize()
#10 0x561031220eb9 content::RunContentProcess()
#11 0x56103122100e content::ContentMain()
#12 0x56103127bfca headless::(anonymous namespace)::RunContentMain()
#13 0x56103127bcd5 headless::HeadlessShellMain()
#14 0x56102de6d668 ChromeMain
#15 0x7f250f4b1555 __libc_start_main
#16 0x56102de6d4aa _start
  r8: 0000000000000000  r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000202
 r12: 00007fff247eb270 r13: 00007fff247eb288 r14: 00007fff247eb280 r15: 00002c3e00221400
  di: 000000000001f0d6  si: 000000000001f0d6  bp: 00007fff247ea1f0  bx: 00007fff247eaa30
  dx: 0000000000000006  ax: 0000000000000000  cx: ffffffffffffffff  sp: 00007fff247ea0b8
  ip: 00007f250f4c5387 efl: 0000000000000202 cgf: aaaa000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

Error: Failed to launch the browser process!
[0523/093806.639360:FATAL:zygote_host_impl_linux.cc(174)] Check failed: process.IsValid(). Failed to launch zygote process
#0 0x55d3e02f7469 base::debug::CollectStackTrace()
#1 0x55d3e025df33 base::debug::StackTrace::StackTrace()
#2 0x55d3e0270d50 logging::LogMessage::~LogMessage()
#3 0x55d3e027190e logging::LogMessage::~LogMessage()
#4 0x55d3de331b08 content::ZygoteHostImpl::LaunchZygote()
#5 0x55d3dfe0ba30 content::(anonymous namespace)::LaunchZygoteHelper()
#6 0x55d3dd9906cb content::ZygoteCommunication::Init()
#7 0x55d3dd990cd4 content::CreateGenericZygote()
#8 0x55d3dfe0ae95 content::ContentMainRunnerImpl::Initialize()
#9 0x55d3dfe08eb9 content::RunContentProcess()
#10 0x55d3dfe0900e content::ContentMain()
#11 0x55d3dfe63fca headless::(anonymous namespace)::RunContentMain()
#12 0x55d3dfe63cd5 headless::HeadlessShellMain()
#13 0x55d3dca55668 ChromeMain
#14 0x7f4ea9bc8555 __libc_start_main
#15 0x55d3dca554aa _start
Received signal 6
#0 0x55d3e02f7469 base::debug::CollectStackTrace()
#1 0x55d3e025df33 base::debug::StackTrace::StackTrace()
#2 0x55d3e02f6f71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f4eae345630 (/usr/lib64/libpthread-2.17.so+0xf62f)

#4 0x7f4ea9bdc387 __GI_raise
#5 0x7f4ea9bdda78 __GI_abort
#6 0x55d3e02f61f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x55d3e0271221 logging::LogMessage::~LogMessage()
#8 0x55d3e027190e logging::LogMessage::~LogMessage()
#9 0x55d3de331b08 content::ZygoteHostImpl::LaunchZygote()
#10 0x55d3dfe0ba30 content::(anonymous namespace)::LaunchZygoteHelper()
#11 0x55d3dd9906cb content::ZygoteCommunication::Init()
#12 0x55d3dd990cd4 content::CreateGenericZygote()
#13 0x55d3dfe0ae95 content::ContentMainRunnerImpl::Initialize()
#14 0x55d3dfe08eb9 content::RunContentProcess()
#15 0x55d3dfe0900e content::ContentMain()
#16 0x55d3dfe63fca headless::(anonymous namespace)::RunContentMain()
#17 0x55d3dfe63cd5 headless::HeadlessShellMain()
#18 0x55d3dca55668 ChromeMain
#19 0x7f4ea9bc8555 __libc_start_main
#20 0x55d3dca554aa _start
  r8: 0000000000000000  r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000206
 r12: 000038ae00314140 r13: 000038ae00314158 r14: 000038ae00314150 r15: 000038ae00221400
  di: 000000000002cbd7  si: 000000000002cbd7  bp: 00007ffd11497200  bx: 00007ffd11497a40
  dx: 0000000000000006  ax: 0000000000000000  cx: ffffffffffffffff  sp: 00007ffd114970c8
  ip: 00007f4ea9bdc387 efl: 0000000000000206 cgf: aaaa000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
zepumph commented 2 years ago

I can reproduce this on bayes with another script that uses perennial's puppeteerLoad:

```
( async () => {
  const puppeteerLoad = require( process.cwd() + '/../perennial/js/common/puppeteerLoad.js' );
  const error = await puppeteerLoad( 'https://www.google.com', {
    waitAfterLoad: 0
  } );
  console.log( error );
} )();
```
```
[phet-admin@bayes aqua]$ node ~/puppeteerTemp.js
(node:52841) UnhandledPromiseRejectionWarning: Error: Failed to launch the browser process!
[0523/101722.377058:FATAL:zygote_host_impl_linux.cc(174)] Check failed: process.IsValid(). Failed to launch zygote process
#0 0x560f2a46f469 base::debug::CollectStackTrace()
#1 0x560f2a3d5f33 base::debug::StackTrace::StackTrace()
#2 0x560f2a3e8d50 logging::LogMessage::~LogMessage()
#3 0x560f2a3e990e logging::LogMessage::~LogMessage()
#4 0x560f284a9b08 content::ZygoteHostImpl::LaunchZygote()
#5 0x560f29f83a30 content::(anonymous namespace)::LaunchZygoteHelper()
#6 0x560f27b086cb content::ZygoteCommunication::Init()
#7 0x560f27b08cd4 content::CreateGenericZygote()
#8 0x560f29f82e95 content::ContentMainRunnerImpl::Initialize()
#9 0x560f29f80eb9 content::RunContentProcess()
#10 0x560f29f8100e content::ContentMain()
#11 0x560f29fdbfca headless::(anonymous namespace)::RunContentMain()
#12 0x560f29fdbcd5 headless::HeadlessShellMain()
#13 0x560f26bcd668 ChromeMain
#14 0x7fa02e5da555 __libc_start_main
#15 0x560f26bcd4aa _start
Received signal 6
#0 0x560f2a46f469 base::debug::CollectStackTrace()
#1 0x560f2a3d5f33 base::debug::StackTrace::StackTrace()
#2 0x560f2a46ef71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fa032d57630 (/usr/lib64/libpthread-2.17.so+0xf62f)
#4 0x7fa02e5ee387 __GI_raise
#5 0x7fa02e5efa78 __GI_abort
#6 0x560f2a46e1f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x560f2a3e9221 logging::LogMessage::~LogMessage()
#8 0x560f2a3e990e logging::LogMessage::~LogMessage()
#9 0x560f284a9b08 content::ZygoteHostImpl::LaunchZygote()
#10 0x560f29f83a30 content::(anonymous namespace)::LaunchZygoteHelper()
#11 0x560f27b086cb content::ZygoteCommunication::Init()
#12 0x560f27b08cd4 content::CreateGenericZygote()
#13 0x560f29f82e95 content::ContentMainRunnerImpl::Initialize()
#14 0x560f29f80eb9 content::RunContentProcess()
#15 0x560f29f8100e content::ContentMain()
#16 0x560f29fdbfca headless::(anonymous namespace)::RunContentMain()
#17 0x560f29fdbcd5 headless::HeadlessShellMain()
#18 0x560f26bcd668 ChromeMain
#19 0x7fa02e5da555 __libc_start_main
#20 0x560f26bcd4aa _start
  r8: 0000000000000000  r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000202
 r12: 00001b2600314140 r13: 00001b2600314158 r14: 00001b2600314150 r15: 00001b2600221400
  di: 000000000000ce76  si: 000000000000ce76  bp: 00007ffec79adcd0  bx: 00007ffec79ae510
  dx: 0000000000000006  ax: 0000000000000000  cx: ffffffffffffffff  sp: 00007ffec79adb98
  ip: 00007fa02e5ee387 efl: 0000000000000202 cgf: aaaa000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

    at onClose (/data/share/phet/continuous-quick-server/perennial/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:241:20)
    at ChildProcess.<anonymous> (/data/share/phet/continuous-quick-server/perennial/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:232:79)
    at ChildProcess.emit (events.js:327:22)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:277:12)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:52841) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:52841) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
[phet-admin@bayes aqua]$
```
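
The repro script's rejection goes unhandled, so Node appends its own warnings to the Chromium trace. A minimal sketch of a wrapper that catches the failure and reports only the first line of the message; `runAndReport` is a hypothetical helper, not part of perennial, and it assumes only that the loader returns a promise:

```javascript
// Hypothetical helper (not part of perennial): runs an async loader and
// reports a failure instead of leaving the promise rejection unhandled.
async function runAndReport( load, log = console.error ) {
  try {
    const result = await load();
    return { ok: true, result: result };
  }
  catch( e ) {
    // Keep only the first line; the Chromium stack trace that follows is long.
    const message = e instanceof Error ? e.message : String( e );
    const summary = message.split( '\n' )[ 0 ];
    log( `load failed: ${summary}` );
    return { ok: false, summary: summary };
  }
}

// Possible usage with the script above (left commented out, since it needs
// perennial checked out next to the working directory):
// const puppeteerLoad = require( process.cwd() + '/../perennial/js/common/puppeteerLoad.js' );
// runAndReport( () => puppeteerLoad( 'https://www.google.com', { waitAfterLoad: 0 } ) );
```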
zepumph commented 2 years ago

I was able to successfully launch Puppeteer on Bayes by following this guide to turn off the sandbox.

https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#setting-up-chrome-linux-sandbox

This is NOT recommended, though, for security reasons. It may be an acceptable workaround if we decide our content is safe enough, but we may still want to investigate the 'recommended' solution, which is to configure a sandbox for Chrome to run in on bayes. I think from here I need to talk to @jonathanolson about some options.
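
For reference, a minimal sketch of what the workaround could look like if it were kept behind an explicit opt-in. The function names and the `allowNoSandbox` switch are hypothetical; the two Chromium flags are the ones listed in the Puppeteer troubleshooting guide, and this is not necessarily how puppeteerLoad passes its launch options:

```javascript
// Builds the options object for puppeteer.launch(). The sandbox is only
// disabled when the caller explicitly opts in (hypothetical switch).
function buildLaunchOptions( allowNoSandbox ) {
  const args = [];
  if ( allowNoSandbox ) {
    // Flags from the Puppeteer troubleshooting guide. They remove Chrome's
    // process sandbox, so they should only be used with trusted content.
    args.push( '--no-sandbox', '--disable-setuid-sandbox' );
  }
  return { args: args };
}

// Launches a browser with those options. puppeteer is required lazily so that
// buildLaunchOptions can be used without puppeteer installed.
async function launchBrowser( allowNoSandbox ) {
  const puppeteer = require( 'puppeteer' );
  return puppeteer.launch( buildLaunchOptions( allowNoSandbox ) );
}
```

Keeping the flags in one place like this would make it easy to turn the sandbox back on once it is configured on bayes.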

It also still isn't clear to me why this came up only last night. I don't see that a new version of Chrome came out, and the version of puppeteer I updated to has stayed locked in for the last few weeks at least (confirmed with npm list in aqua/ on bayes).

zepumph commented 2 years ago

Working on this with @mattpen and @jonathanolson. We are leaning towards thinking this is a docker issue. Here is an error that most of our headless chrome instances for CT have:

```
[0100/000000.249433:ERROR:broker_posix.cc(46)] Received unexpected number of handles
[0523/153535.292792:ERROR:process_memory_range.cc(86)] read out of range
[0523/153535.292901:ERROR:elf_image_reader.cc(606)] missing nul-terminator
[0523/154002.015112:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/a11e5c99-186c-4d89-a289-a8ce9a2a51f0.lock: File exists (17)
[0523/154002.015203:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/a11e5c99-186c-4d89-a289-a8ce9a2a51f0.lock: File exists (17)
[0523/154002.036121:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/9530dea5-25fa-48b6-b72c-aebed52a2d1e.lock: File exists (17)
[0523/154002.039321:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/623ada45-3174-41ea-b4a2-583e83e8d9ef.lock: File exists (17)
[0523/154002.051103:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/f7416602-e168-4660-84a9-fd65a4b96ecb.lock: File exists (17)
[0523/154002.051200:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/f7416602-e168-4660-84a9-fd65a4b96ecb.lock: File exists (17)
[0523/154002.051262:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/16840dbc-848a-446f-8148-52637b4d07d5.lock: File exists (17)
[0523/154002.071198:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/7597dcec-e299-424c-ac1a-d1b40d89f84c.lock: File exists (17)
[0523/154002.072345:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/85df6892-07a9-4107-adff-0c647e71b107.lock: File exists (17)
[0523/154002.072611:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/85df6892-07a9-4107-adff-0c647e71b107.lock: File exists (17)
[0523/154002.078129:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/f7ddb4f2-02af-4663-9991-e048e0ea50bb.lock: File exists (17)
[0523/154002.087450:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/f935e0c7-50bf-4b63-b601-4c35e7e17645.lock: File exists (17)
[0523/154002.099497:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/a1525ae3-b87b-414f-adc5-63aac2484cb9.lock: File exists (17)
[0523/154002.102084:ERROR:filesystem_posix.cc(32)] lstat /tmp/Crashpad/completed/1ed8cf80-b097-4b46-ae02-76a2f234927a.lock: No such file or directory (2)
[0523/154002.102109:ERROR:file_io_posix.cc(207)] open /tmp/Crashpad/completed/1ed8cf80-b097-4b46-ae02-76a2f234927a.lock: No such file or directory (2)
[0523/154002.103803:ERROR:filesystem_posix.cc(32)] lstat /tmp/Crashpad/completed/958c7fa0-bb9b-4f3c-af82-163e62a546a3.lock: No such file or directory (2)
[0523/154002.103826:ERROR:file_io_posix.cc(207)] open /tmp/Crashpad/completed/958c7fa0-bb9b-4f3c-af82-163e62a546a3.lock: No such file or directory (2)
[0523/154002.103844:ERROR:filesystem_posix.cc(32)] lstat /tmp/Crashpad/completed/2be4edd2-321b-4254-a27e-2b9da2f2b4d4.lock: No such file or directory (2)
[0523/154002.103857:ERROR:file_io_posix.cc(207)] open /tmp/Crashpad/completed/2be4edd2-321b-4254-a27e-2b9da2f2b4d4.lock: No such file or directory (2)
[0523/154002.108835:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/f6ca70af-f3ef-4538-93c7-1ec8a77221d3.lock: File exists (17)
[0523/154002.135982:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/0c3ed270-677f-47d4-8006-c95e74365de6.lock: File exists (17)
[0523/154002.144403:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/fd3a7d2c-2b55-4e62-9a04-00930f733131.lock: File exists (17)
[0523/154002.175353:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/9ceb3683-0da6-48d0-950d-0a212489ca52.lock: File exists (17)
[0523/154002.176986:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/54a0f9e4-27c3-4d22-aeb5-38d57a0781c8.lock: File exists (17)
[0523/154002.178357:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/ac90a634-d199-4647-9cbe-d773d37dfa35.lock: File exists (17)
[0523/154002.178600:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/598c9957-f74d-4270-9aeb-dc1a0b52cba4.lock: File exists (17)
[0523/154002.180352:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/7597dcec-e299-424c-ac1a-d1b40d89f84c.lock: File exists (17)
[0523/154002.183684:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/3b3dbbb4-76fb-4022-9394-12eb59a77ede.lock: File exists (17)
[0523/154002.184098:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/f7ddb4f2-02af-4663-9991-e048e0ea50bb.lock: File exists (17)
[0523/154002.184419:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/71409b02-6948-4411-aadb-c13ad1ef2cb0.lock: File exists (17)
[0523/154002.184673:ERROR:file_io_posix.cc(152)] open /tmp/Crashpad/completed/64419f26-2ba4-4971-9ad0-7b5e947d67c6.lock: File exists (17)
```
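
Since most of that output is the same few messages repeated, a small tally can make it easier to see what dominates. A rough sketch, assuming one log entry per line in the `[MMDD/HHMMSS.micros:LEVEL:file.cc(line)] message` shape seen above; `summarizeChromiumLog` is a hypothetical helper written for this comment, not an existing tool:

```javascript
// Matches the prefix "[MMDD/HHMMSS.micros:LEVEL:file.cc(line)] message".
const LOG_LINE = /^\[(\d{4})\/(\d{6}\.\d+):([A-Z]+):([^()\]]+)\((\d+)\)\]\s*(.*)$/;

// Counts log entries per "LEVEL file(line)" origin. Lines that do not match
// the prefix (stack frames, blank lines) are skipped.
function summarizeChromiumLog( text ) {
  const counts = new Map();
  for ( const line of text.split( '\n' ) ) {
    const match = LOG_LINE.exec( line.trim() );
    if ( !match ) {
      continue;
    }
    const key = `${match[ 3 ]} ${match[ 4 ]}(${match[ 5 ]})`;
    counts.set( key, ( counts.get( key ) || 0 ) + 1 );
  }
  return counts;
}
```

Calling `summarizeChromiumLog( logText )` returns a Map keyed by strings like `ERROR file_io_posix.cc(152)`.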
zepumph commented 2 years ago

We decided to try rebooting Bayes (it hadn't been rebooted for 1500 days), and it hasn't come back up yet. @mattpen is taking the lead on reaching out to OIT about this. We will continue our investigation once the reboot is complete.

mattpen commented 2 years ago

@zepumph bayes came back online after reseating a failing memory module. We should be able to continue investigation today. I'm optimistic that the fix will also resolve the puppeteer issues.

zepumph commented 2 years ago

We have not seen this issue for the last week. I think the reboot fixed this. Closing.