puppeteer / puppeteer

JavaScript API for Chrome and Firefox
https://pptr.dev
Apache License 2.0

Puppeteer on docker crashes when loading pages in parallel #1345

Closed nojvek closed 6 years ago

nojvek commented 6 years ago
Error: Failed to launch chrome!
/home/pptr/node_modules/puppeteer/.local-chromium/linux-513435/chrome-linux/chrome: 
error while loading shared libraries: libgconf-2.so.4: 
cannot open shared object file: No such file or directory

Would love a base docker image certified by Google that works every time.

kunalrjain commented 6 years ago

I am facing the same issue as well.

Logs:

2017-11-10T15:26:18.18+0530 [App/0] ERR Potentially unhandled rejection [1] Error: Failed to launch chrome!
2017-11-10T15:26:18.18+0530 [App/0] ERR /home/vcap/app/node_modules/puppeteer/.local-chromium/linux-508693/chrome-linux/chrome: error while loading shared libraries: libX11-xcb.so.1: cannot open shared object file: No such file or directory

I am able to run this app locally, since all the Puppeteer dependencies and libraries are available. However, in the cloud environment these libraries are not available. I understand these are stack-level libraries and hence not included as part of the package.

Can you provide a solution, since we are not able to add these libraries manually?

Garbee commented 6 years ago

Have you looked at the troubleshooting document? The error clearly shows missing libraries, which means you are missing dependencies. The document has a list for Debian and Red Hat distributions to get going with.

nojvek commented 6 years ago

I also tried using this as my Dockerfile:

# node:carbon is 8.9 LTS release
FROM node:carbon

# From: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
# Install the latest Chrome dev package, which installs the necessary libs
# to make the bundled version of Chromium that Puppeteer installs work.
RUN apt-get update && apt-get install -y wget --no-install-recommends \
  && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install -y google-chrome-unstable --no-install-recommends \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /src/*.deb

# Add pptr user
RUN useradd --create-home --user-group --shell /bin/bash pptr
USER pptr
WORKDIR /home/pptr

# Install npm deps first since we want to keep modules cached for super fast builds
COPY package.json .
RUN yarn install

COPY puppeteer_tests.js .

# run the tests by default
CMD [ "node", "puppeteer_tests.js"]

Inside the test I open a number of files in parallel, but it seems to crash the Chromium version that Puppeteer installs. I get a feeling one of the libs installed by google-chrome-unstable doesn't play nicely with the yarn-installed Puppeteer Chromium version.

The pages crash when opening multiple tabs, even though the docker container has 8 GB of memory and all of my host's CPU resources.
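
For reference, the failure mode above usually comes from an unbounded Promise.all over browser.newPage(). A bounded-concurrency helper (a sketch; the Puppeteer usage in the trailing comment is illustrative, not from this thread) keeps the number of simultaneously open tabs, and therefore Chromium's memory pressure, under control:

```javascript
// Run async task factories with at most `limit` in flight at once.
// Bounding parallelism keeps the number of simultaneously open tabs
// (and Chromium's memory use) under control.
async function withConcurrencyLimit(tasks, limit) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // safe: single-threaded, no await between read and bump
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}

// Hypothetical Puppeteer usage (urls and screenshot paths are illustrative):
// const tasks = urls.map((url, i) => async () => {
//   const page = await browser.newPage();
//   await page.goto(url);
//   await page.screenshot({ path: `shot-${i}.png` });
//   await page.close();
// });
// await withConcurrencyLimit(tasks, 4);
```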

H91011 commented 6 years ago

You can skip the Chromium install by setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable at install time, e.g. in package.json: "scripts": { "build-dev": "PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 yarn install" }
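
For context, skipping the download only helps if launch() is then pointed at a system-installed Chrome. A minimal sketch (the default executable path is an assumption for the Debian-based image discussed above):

```javascript
// With the bundled download skipped, launch() needs an explicit Chrome binary.
// The default path below is an assumption for the Debian-based image in this
// thread; adjust to wherever your system Chrome actually lives.
function launchOptionsForSystemChrome(executablePath = '/usr/bin/google-chrome-unstable') {
  return {
    headless: true,
    executablePath,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  };
}

// Hypothetical usage:
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(launchOptionsForSystemChrome());
```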

kunalrjain commented 6 years ago

Hi, I had to write my own custom build pack to resolve this issue. You can refer to Heroku build pack and write your own on similar lines.

https://github.com/heroku/heroku-buildpack-google-chrome

zhuyingda commented 6 years ago

@Garbee The CentOS dependency list is also in the troubleshooting document; it was added from my issue.

joelgriffith commented 6 years ago

If you want to use the version bundled with Puppeteer (which I recommend, after having tried downloading it via various package managers), then you'll need a list like this:

# Dependencies needed for packages downstream
RUN apt-get update && apt-get install -y \
  wget \
  unzip \
  fontconfig \
  locales \
  gconf-service \
  libasound2 \
  libatk1.0-0 \
  libc6 \
  libcairo2 \
  libcups2 \
  libdbus-1-3 \
  libexpat1 \
  libfontconfig1 \
  libgcc1 \
  libgconf-2-4 \
  libgdk-pixbuf2.0-0 \
  libglib2.0-0 \
  libgtk-3-0 \
  libnspr4 \
  libpango-1.0-0 \
  libpangocairo-1.0-0 \
  libstdc++6 \
  libx11-6 \
  libx11-xcb1 \
  libxcb1 \
  libxcomposite1 \
  libxcursor1 \
  libxdamage1 \
  libxext6 \
  libxfixes3 \
  libxi6 \
  libxrandr2 \
  libxrender1 \
  libxss1 \
  libxtst6 \
  ca-certificates \
  fonts-liberation \
  libappindicator1 \
  libnss3 \
  lsb-release \
  xdg-utils

nojvek commented 6 years ago

Thanks for this. Just curious: how are you getting this list?

I'm also concerned: when the Puppeteer-bundled Chromium updates, how do I get the dependency list for newer versions into the Dockerfile?

joelgriffith commented 6 years ago

I believe it was originally curated at https://github.com/ebidel/try-puppeteer, though I'm having issues locating it now. (https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md is also helpful).

@nojvek I handle this by keeping older tags of puppeteer-specific builds around. You'll likely have to slog through updates until it works and then launch the new version of the docker image + any dependent code. Again, this is the strategy I've adopted for my project.

nojvek commented 6 years ago

@joelgriffith even with the list you've given me, it seems the bundled Puppeteer Chromium crashes when opening tabs in parallel:

10:54:35 Error: Protocol error (Target.activateTarget): Session closed. Most likely the page has been closed.
    at Session.send (/home/pptr/node_modules/puppeteer/lib/Connection.js:167:29)
    at Page._screenshotTask (/home/pptr/node_modules/puppeteer/lib/Page.js:651:24)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

nojvek commented 6 years ago

It seems puppeteer on docker is very unstable.

Here's my Dockerfile.

# node:carbon is 8.9 LTS release
FROM node:carbon

# From: https://github.com/GoogleChrome/puppeteer/issues/1345#issuecomment-343554457
# Install the apt packages necessary for Puppeteer's bundled Chromium to work
RUN apt-get update && apt-get install --no-install-recommends -y \
  ca-certificates \
  fontconfig \
  fonts-liberation \
  gconf-service \
  libappindicator1 \
  libasound2 \
  libatk1.0-0 \
  libc6 \
  libcairo2 \
  libcups2 \
  libdbus-1-3 \
  libexpat1 \
  libfontconfig1 \
  libgcc1 \
  libgconf-2-4 \
  libgdk-pixbuf2.0-0 \
  libglib2.0-0 \
  libgtk-3-0 \
  libnspr4 \
  libnss3 \
  libpango-1.0-0 \
  libpangocairo-1.0-0 \
  libstdc++6 \
  libx11-6 \
  libx11-xcb1 \
  libxcb1 \
  libxcomposite1 \
  libxcursor1 \
  libxdamage1 \
  libxext6 \
  libxfixes3 \
  libxi6 \
  libxrandr2 \
  libxrender1 \
  libxss1 \
  libxtst6 \
  locales \
  lsb-release \
  unzip \
  wget \
  xdg-utils \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /src/*.deb

# Install dumb-init
RUN wget https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64.deb \
  && dpkg -i dumb-init_*.deb \
  && rm dumb-init_*.deb

# Add pptr user
RUN useradd --create-home --user-group --shell /bin/bash pptr
USER pptr
WORKDIR /home/pptr

# Install npm deps first since we want to keep modules cached for super fast builds
COPY package.json .
RUN yarn install

COPY tests.js .

# run the tests by default
CMD [ "dumb-init", "node", "tests.js" ]

Opening puppeteer like this

    browser = await puppeteer.launch({
      headless: true,
      args: [`--no-sandbox`, `--disable-setuid-sandbox`],
    });

Here's the error I get. It happens every single time.

docker stats shows that it's only consuming about 20 MB of memory. The machine has 16 GB of memory. NET I/O and BLOCK I/O are both negligible.

Unhandled promise rejection (rejection id: 1): Error: Page crashed!
(node:7) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Error: Protocol error (Page.captureScreenshot): Target closed.
    at Session._onClosed (/home/pptr/node_modules/puppeteer/lib/Connection.js:210:23)
    at Connection._onClose (/home/pptr/node_modules/puppeteer/lib/Connection.js:116:15)
    at emitTwo (events.js:126:13)
    at WebSocket.emit (events.js:214:7)
    at WebSocket.emitClose (/home/pptr/node_modules/ws/lib/WebSocket.js:213:10)
    at _receiver.cleanup (/home/pptr/node_modules/ws/lib/WebSocket.js:195:41)
    at Receiver.cleanup (/home/pptr/node_modules/ws/lib/Receiver.js:520:15)
    at WebSocket.finalize (/home/pptr/node_modules/ws/lib/WebSocket.js:195:22)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)
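
Errors like the trace above surface as unhandled rejections. A defensive sketch, assuming Puppeteer's documented page 'error' (page crash) and 'pageerror' (in-page exception) events, reports a dying page instead of letting it take down the whole run:

```javascript
// Attach crash handlers so a dying page is logged (and can be retried)
// instead of surfacing later as an unhandled promise rejection.
function watchPage(page, label) {
  page.on('error', (err) => {          // emitted when the page crashes
    console.error(`[${label}] page crashed:`, err.message);
  });
  page.on('pageerror', (err) => {      // uncaught exceptions inside the page
    console.error(`[${label}] page error:`, err.message);
  });
  return page;
}

// Hypothetical usage:
// const page = watchPage(await browser.newPage(), 'tab-1');
```
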

joelgriffith commented 6 years ago

In the images used by https://browserless.io, we don't base off of node:carbon, as Chrome often requires more packages to be available; and if you want certain fonts to work, you'll need a richer base.

You might give FROM ubuntu:16.04 a try, though it will result in a larger image. I've definitely gotten multiple Targets working; I'll dig through my commit logs and see if anything else obvious comes up.

nojvek commented 6 years ago

That makes sense. I also hypothesize that it could be due to sandbox issues. Multiple pages not being able to isolate themselves properly.

I'll deffo give Ubuntu base a shot.

joelgriffith commented 6 years ago

You might also try increasing the /dev/shm size with --shm-size 1gb, or whatever is sensible.
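
As a sketch of the two workarounds for Docker's small /dev/shm (note that --disable-dev-shm-usage only exists in later Chromium releases, so whether the bundled build supports it is an assumption):

```javascript
// Two ways to relieve Docker's tiny 64 MB /dev/shm default:
//   1. start the container with `docker run --shm-size=1gb ...`
//   2. pass --disable-dev-shm-usage so Chromium writes its shared-memory
//      files under /tmp instead (only available in later Chromium builds).
function launchArgsForDocker({ shmWorkaround = true } = {}) {
  const args = ['--no-sandbox', '--disable-setuid-sandbox'];
  if (shmWorkaround) args.push('--disable-dev-shm-usage');
  return args;
}

// Hypothetical usage:
// const browser = await puppeteer.launch({ args: launchArgsForDocker() });
```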

nojvek commented 6 years ago

/dev/shm was key.

Dockerfile

# ubuntu:xenial is the 16.04 LTS release
FROM ubuntu:xenial

# From: https://github.com/GoogleChrome/puppeteer/issues/1345#issuecomment-343554457
# Install the apt packages necessary for Puppeteer's bundled Chromium to work
RUN apt-get update && apt-get install --no-install-recommends -y \
  ca-certificates \
  curl \
  fontconfig \
  fonts-liberation \
  gconf-service \
  git \
  libappindicator1 \
  libasound2 \
  libatk1.0-0 \
  libc6 \
  libcairo2 \
  libcups2 \
  libdbus-1-3 \
  libexpat1 \
  libfontconfig1 \
  libgcc1 \
  libgconf-2-4 \
  libgdk-pixbuf2.0-0 \
  libglib2.0-0 \
  libgtk-3-0 \
  libnspr4 \
  libnss3 \
  libpango-1.0-0 \
  libpangocairo-1.0-0 \
  libstdc++6 \
  libx11-6 \
  libx11-xcb1 \
  libxcb1 \
  libxcomposite1 \
  libxcursor1 \
  libxdamage1 \
  libxext6 \
  libxfixes3 \
  libxi6 \
  libxrandr2 \
  libxrender1 \
  libxss1 \
  libxtst6 \
  locales \
  lsb-release \
  unzip \
  wget \
  xdg-utils

# Install dumb-init. Node has issues being pid 1
RUN wget https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64.deb \
  && dpkg -i dumb-init_*.deb \
  && rm dumb-init_*.deb

# Install nodejs 8
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash - \
  && apt-get install -y nodejs

# Add pptr user
RUN useradd --create-home --user-group --shell /bin/bash pptr
USER pptr
WORKDIR /home/pptr

# Install npm deps first since we want to keep modules cached for super fast builds
COPY package.json .
RUN npm install

COPY tests.js .

# run the tests by default
CMD [ "dumb-init", "node", "tests.js"]

Running as

docker run -it --name tests --shm-size 1gb tests

Works wonders.

I wonder if /dev/shm = 1gb can be baked into the Dockerfile rather than being passed as a param. It would look cleaner.

Also, @joelgriffith / Chrome Puppeteer core contributors: would you mind if I send a Dockerfile as a PR? And would you consider publishing it to the official docker registry as a GoogleChrome/puppeteer base image?

Having a base image that always works without crazy setup would be super helpful to the rest of the community.

ebidel commented 6 years ago

I think this docker file example is enough: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md#running-puppeteer-in-docker


nojvek commented 6 years ago

Lol! That's the reason why I filed the issue. The Dockerfile linked in the troubleshooting guide was crashing on multiple tabs.

It does make sense to maintain an official Docker image that we can FROM and build upon, wouldn't you agree? It took me almost a week to get a working Docker image that doesn't crash.

joelgriffith commented 6 years ago

I've been debating whether or not to open-source the images I run at browserless, as it does take days/weeks to get right. Browserless goes a step further and treats the browser more like an appliance (think database) than just a binary to execute arbitrarily. There are also other fixes included, like emojis and other fonts. Finally, I maintain builds for specific Puppeteer versions, plus a little live debugger that makes it really painless to test scripts against your headless farm. Would that be of interest?

ebidel commented 6 years ago

@nojvek so perhaps dumb-init is also a key piece. Others have had success with it. Other than that, I don't see a huge difference with your setup...other than a different base image.

I think one of the problems with blessing + hosting an official docker image is that folks want many different things: base images, ways to run the container, etc. That's just based on observation of all the flavors of Docker issues we see. But I could be wrong.

nojvek commented 6 years ago

@joelgriffith browserless.io wouldn't make sense for us, since we test a number of hosts behind a firewall. But I really, really appreciate your help and guidance here. I'd have almost given up without it.

@ebidel, I absolutely love what you've done with Puppeteer, but I hope you realize how much of a pain it is to get it consistently working across different machines without it crashing every now and then. Docker solves exactly that problem: bundling all dependencies into an image so it's as easy as docker run.

I believe the puppeteer project managers should definitely engage with the community regarding this.

I’m tempted to add my own image to Docker registry but I believe having a GoogleChrome/puppeteer base image would add less fragmentation when someone searches for an image to use.

Also when google starts supporting using Docker images as cloud functions, this becomes really really powerful and easy to use.

joelgriffith commented 6 years ago

Also when google starts supporting using Docker images as cloud functions, this becomes really really powerful and easy to use.

This might happen, but I wouldn't bank on it. Cloud functions generally have a slow startup/warming phase (in chromeless this is ~2 seconds). Since the web browser isn't entirely stateless, it's also tough to make it work well in a lambda context, especially since you interact with it over web sockets regardless of the library. There are also a ton of issues in other repos solely regarding this.

I really feel that it's more like an external database/appliance where it should be available remotely, and you just connect to it and drive it. That's the primary methodology behind browserless and why I'm considering open-sourcing it + the docker image. It doesn't make any prescriptions about what library you use or force you to bundle up any external code with it, it just exposes the remote protocol and adds some other features around it (like concurrency limitations, queuing and so forth).

I’m tempted to add my own image to Docker registry but I believe having a GoogleChrome/puppeteer base image would add less fragmentation when someone searches for an image to use.

I think someone is going to have to do it, but it's not really the motivation behind this repo, which is to expose a nice high-level interface for remotely driving Chrome. Docker isn't the only packaging methodology out there either, so having an official repo kind of opens the door to supporting others (totally a hypothesis on my end).

Definitely open to thoughts on the above.

ebidel commented 6 years ago

@nojvek I feel the pain! :) I personally learned a lot from getting try-puppeteer.appspot.com up and running. In fact, Page.crash is the number 1 error:

[screenshot, 2017-11-16: try-puppeteer error breakdown, Page.crash on top]

IMO, there are a lot of thorns and limitations to running headless Chrome in the cloud. Another example is that using puppeteer.connect() really doesn't work that well: you end up hitting out-of-memory issues in v8/node by having a long-running, heavyweight node process (the browser). This was my failed attempt on try-puppeteer:

[screenshot, 2017-11-13: failed puppeteer.connect() attempt on try-puppeteer]

^ If anyone has ideas on that, I would be most grateful :) Running the server with an increased v8 memory size (--max_old_space_size=4096) did more harm than good: the containers run out of memory even quicker and get restarted more often.


I can't speak for other platforms, but I'm working with the Google Cloud team to figure out ways we can make these things smoother from our end. I think it's an important space, and Chrome/Puppeteer should work really well in the cloud.

I do agree with @joelgriffith that it gets pretty tricky for this repo to provide production ready, out of the box solutions for every cloud environment. There are just too many container platforms + N ways to set things up. That's something the community should run with. What the Puppeteer team can/should do is provide a north star for developers wanting to get started with puppeteer in Docker. Also help the headless chrome team discover and address bugs when they come up. The /dev/shm issue being one example. We've tried to do that through examples, docs, and the troubleshooting guide.

@nojvek, out of curiosity, do you have an example script that shows the failure on our troubleshooting docker file? I was able to run this script on that image without issue:

test.js:

const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch({dumpio: true});
  const NUM_TABS = 100;
  for (let i = 0; i < NUM_TABS; ++i) {
    console.log(`Opening tab ${i}`);
    const page = await browser.newPage();
    await page.goto('https://www.google.com/');
  }
  await browser.close();
})();

Ran with:

docker run --rm -p 8080:8080 \
    --shm-size=1g --cap-add=SYS_ADMIN \
    --name puppeteer-chrome puppeteer-chrome-linux \
    node -e "`cat test.js`"

joelgriffith commented 6 years ago

Out of curiosity, @ebidel, do you keep the browser around after requests/sessions are fulfilled? I can see that being a slow leak over time, considering that Puppeteer launches Chrome as a child process, IIRC.

ebidel commented 6 years ago

That's right. I was experimenting with express middleware that launches Chrome once and reuses it for subsequent requests, the idea being (hopefully) that it's cheaper than launching a browser per request.
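
A minimal sketch of that reuse pattern (the express wiring in the trailing comment is hypothetical): a lazy singleton that can be reset after a crash so the next request relaunches rather than reusing a dead connection.

```javascript
// Lazily launch one shared browser; reset() forgets a crashed instance
// so the next caller relaunches instead of reusing a dead connection.
function makeBrowserProvider(launch) {
  let browserPromise = null;
  return {
    get() {
      if (!browserPromise) browserPromise = launch();
      return browserPromise;
    },
    reset() {
      browserPromise = null;
    },
  };
}

// Hypothetical express wiring:
// const provider = makeBrowserProvider(() =>
//   puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] }));
// app.use(async (req, res, next) => {
//   req.browser = await provider.get();
//   next();
// });
```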

paambaati commented 6 years ago

@ebidel That's a bad idea.

I've been using chrome-remote-interface to build something very similar to Puppeteer (there's a public talk about this at https://paambaati.github.io/rendering-at-scale/), and I use it at scale. I ran into each of these problems (memory leaks, segfaults, runaway Chrome instances, slowdowns, etc.) and in the end decided to run them without containers, in their own dedicated AWS ASG, with Mesos/Marathon running them. Here's what I've learnt:

  1. Set /dev/shm to 1 GB or more.
  2. Never use a long-running Chrome; it slowly eats up memory. Always launch Chrome per request, SIGKILL it at the end, and clean up the data directory later, outside of your Puppeteer or chrome-remote-interface based app.
  3. Use a zombie-reaper like dumb-init to start your main CMD or ENTRYPOINT process.
  4. Increase --max-old-space-size.

While these things helped a lot with the stability issues, ultimately Chrome inside Docker was still unstable: it would either slow down (accept socket connections on the remote debugging port but then not respond to some commands) or segfault after a few hours.
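
Point 2 above can be sketched as a wrapper that hard-kills the browser after every task. Browser.process() is Puppeteer's handle on the Chrome child process; the runTask callback and the out-of-band data-directory cleanup are assumptions matching the advice above:

```javascript
// One short-lived browser per request, hard-killed afterwards.
// `launch` and `runTask` are caller-supplied; per the advice above, the
// user-data-dir cleanup is deliberately left to an out-of-band process.
async function withFreshBrowser(launch, runTask) {
  const browser = await launch();
  try {
    return await runTask(browser);
  } finally {
    try {
      browser.process().kill('SIGKILL'); // don't wait for a graceful close
    } catch (_) {
      // process already gone; nothing to do
    }
  }
}

// Hypothetical usage:
// const png = await withFreshBrowser(
//   () => puppeteer.launch({ args: ['--no-sandbox'] }),
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://example.com');
//     return page.screenshot();
//   });
```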

joelgriffith commented 6 years ago

Here we go: https://github.com/joelgriffith/browserless

Docker: https://hub.docker.com/r/browserless/chrome/

nojvek commented 6 years ago

Love it.

paambaati commented 6 years ago

Recently, a slightly relevant bug was fixed in 64.0.3281.0 - see https://bugs.chromium.org/p/chromium/issues/detail?id=736452

Would someone be willing to retest with the new version and see if things improve?

aslushnikov commented 6 years ago

Closing this since @joelgriffith provided a docker image that everyone likes.