xvrh / puppeteer-dart

A Dart library to automate the Chrome browser over the DevTools Protocol. This is a port of the Puppeteer API
BSD 3-Clause "New" or "Revised" License

How to use puppeteer.connect on an instance, that runs on a google cloud function? #205

Closed Wizzel1 closed 1 year ago

Wizzel1 commented 1 year ago

I am trying to connect to a puppeteer instance that runs on a Google Cloud Function (v2).

This is the setup inside the function:

import getPort from 'get-port';
import puppeteer from "puppeteer-extra";

const port = await getPort();
const browser = await puppeteer.launch({
    args: [
        `--remote-debugging-port=${port}`,
    ],
});

console.log(browser.wsEndpoint());

This logged an endpoint. I copied that endpoint and tried to connect to it with:

await puppeteer.connect(browserWsEndpoint: myEndpoint);

but it threw an error:

SocketException: OS Error: The remote computer refused the network connection.
, errno = 1225, address = 127.0.0.1, port = 59250

How can I fix that?

xvrh commented 1 year ago

This is a very interesting idea and I would love to know if it's possible.

The difficulty is not really about puppeteer but rather about how Cloud Functions work: how do you expose an external port, and how do you keep a long-lived function running?

From the error message it seems Chromium bound the port to the loopback address 127.0.0.1, which is only reachable from the machine Chromium runs on, so you cannot connect to it from your client.
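
To illustrate the point about 127.0.0.1: the endpoint printed by browser.wsEndpoint() names the loopback interface of the machine running Chrome, so a client elsewhere ends up dialing its own machine. The sketch below is in JavaScript, like the Cloud Function code above. The two helpers (isLoopbackEndpoint, rewriteEndpoint) are hypothetical and not part of puppeteer, and the endpoint path and example.com host are made up. Rewriting the host only helps if the platform really forwards that host and port to Chrome's debugging port, which this thread does not establish for Cloud Functions.

```javascript
// Hypothetical helpers for illustration; not part of puppeteer or puppeteer-dart.

// Returns true when a DevTools WebSocket endpoint points at the loopback
// interface, which only processes on the same machine can reach.
function isLoopbackEndpoint(wsEndpoint) {
  const { hostname } = new URL(wsEndpoint);
  return (
    hostname === 'localhost' ||
    hostname === '[::1]' ||
    hostname.startsWith('127.')
  );
}

// Swaps host and port while keeping the /devtools/browser/<id> path.
// Assumption: something (a proxy or port forward) exposes Chrome's
// debugging port at publicHost:publicPort.
function rewriteEndpoint(wsEndpoint, publicHost, publicPort) {
  const url = new URL(wsEndpoint);
  url.hostname = publicHost;
  url.port = String(publicPort);
  return url.toString();
}

// Made-up endpoint in the shape Chrome reports.
const endpoint = 'ws://127.0.0.1:59250/devtools/browser/abc';
console.log(isLoopbackEndpoint(endpoint)); // true
console.log(rewriteEndpoint(endpoint, 'example.com', 9222));
// ws://example.com:9222/devtools/browser/abc
```

A check like this makes the failure obvious before attempting to connect: a loopback endpoint copied from a remote log will be refused locally, as in the SocketException above.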

Let us know if you make any progress on this project :-)

Wizzel1 commented 1 year ago

⚠️ EDIT: Solved this step, keeping it for reference

@xvrh thanks for taking the time to look at my issue. In the meantime I have decided to create my own server with dart_frog (mainly because I am more experienced in Dart).

I am currently stuck on creating an image for the server.

I initially ran into this error message:

ProcessException: No such file or directory
Command: unzip .local-chromium/901912_chrome-linux.zip -d .local-chromium/901912

but I found #161 where you posted some code to download chromium to the container.

Unfortunately, I cannot get this working because I have never worked with Docker before.

Since dart_frog requires a custom Dockerfile in this case, I copied this Dockerfile as a starting point.

I tried to add your recommendation from #161, but apparently I am doing something wrong, because I get this error: ERROR [build 10/10] RUN dart bin/download_chromium.dart

This is my current dockerfile:

FROM debian:buster

# All the dependencies recommended to install Chromium
RUN  apt-get update \
     && apt-get install -y ...
# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline
RUN dart compile exe build/bin/server.dart -o build/bin/server
RUN dart bin/download_chromium.dart

# Build minimal serving image from AOT-compiled `/server` and required system
# libraries and configuration files stored in `/runtime/` from the build stage.
FROM scratch
COPY --from=build /runtime/ /
COPY --from=build /app/build/bin/server /app/bin/
# Uncomment the following line if you are serving static files.
# COPY --from=build /app/build/public /public/

# Start the server.
CMD ["/app/bin/server"]

Wizzel1 commented 1 year ago

@xvrh my bad, I forgot to create the download_chromium.dart file. I created it in the root directory of my project. (Please tell me if you would advise otherwise)

This is my dockerfile now:

FROM debian:buster
# All the dependencies recommended to install Chromium
RUN  apt-get update \
     && apt-get install -y ... 

# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline && dart download_chromium.dart
RUN dart compile exe build/bin/server.dart -o build/bin/server

# Build minimal serving image from AOT-compiled `/server` and required system
# libraries and configuration files stored in `/runtime/` from the build stage.
FROM scratch
COPY --from=build /runtime/ /
COPY --from=build /app/build/bin/server /app/bin/
# Uncomment the following line if you are serving static files.
# COPY --from=build /app/build/public /public/

# Start the server.
CMD ["/app/bin/server"]

However, when I run this image and try to use puppeteer in it, I get this error: FileSystemException: Creation of temporary directory failed, path = '/tmp' (OS Error: No such file or directory, errno = 2)

Do you know if that is related to puppeteer?

xvrh commented 1 year ago

I think you need to rework the Dockerfile a bit. You can't use a FROM scratch image because Chrome requires more dependencies.

Here is my experiment:

FROM dart:stable

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates procps libxss1 \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable curl unzip sed git bash xz-utils libglvnd0 ssh xauth x11-xserver-utils libpulse0 libxcomposite1 libgl1-mesa-glx sudo \
     && rm -rf /var/lib/{apt,dpkg,cache,log} \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

ENV CHROME_FORCE_NO_SANDBOX=true

COPY . .
RUN dart pub get
RUN dart bin/download_chromium.dart

ENTRYPOINT ["dart", "bin/server.dart"]

EXPOSE 8080

You can make a few changes if you wish:

Wizzel1 commented 1 year ago

@xvrh I would prefer not to mess with the default folder structure because I do not have the experience to fix potential bugs. Or are you saying that

You can pre-compile the server with RUN dart compile exe -o /server bin/server.dart and have ENTRYPOINT ["/server"]

is a requirement?

I have built the image with this file:

# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates procps libxss1 \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable curl unzip sed git bash xz-utils libglvnd0 ssh xauth x11-xserver-utils libpulse0 libxcomposite1 libgl1-mesa-glx sudo \
     && rm -rf /var/lib/{apt,dpkg,cache,log} \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

ENV CHROME_FORCE_NO_SANDBOX=true

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline
RUN dart compile exe build/bin/server.dart -o build/bin/server

# # Build minimal serving image from AOT-compiled `/server` and required system
# # libraries and configuration files stored in `/runtime/` from the build stage.
# FROM scratch
# COPY --from=build /runtime/ /
# COPY --from=build /app/build/bin/server /app/bin/
# # Uncomment the following line if you are serving static files.
# # COPY --from=build /app/build/public /public/

# # Start the server.
# CMD ["/app/bin/server"]

ENTRYPOINT ["dart", "build/bin/server.dart"]

EXPOSE 8080

But when I now call the puppeteer entry point, I get this error: Exception: Websocket url not found

In #128 you recommended running puppeteer.launch with noSandboxFlag: true, so this is how I launch puppeteer:

    browser = await puppeteer.puppeteer.launch(
      headless: false,
      noSandboxFlag: true,
      slowMo: const Duration(milliseconds: 20),
      defaultViewport:
          const puppeteer.DeviceViewport(width: 1920, height: 1080),
    );

xvrh commented 1 year ago

@Wizzel1 Can you share your whole project so I can try it myself?

Wizzel1 commented 1 year ago

@xvrh sure, there you go: https://github.com/Wizzel1/dart_frog_puppeteer_example

Wizzel1 commented 1 year ago

@xvrh sorry, a bit more context: This is the documentation on how to run a default dart_frog server, and this is the documentation for the custom Dockerfile.

xvrh commented 1 year ago

@Wizzel1 I just quickly tried and it works for me if you remove the headless: false in "routes/index.dart".

The first request is a bit slow because it has to download Chromium (since you removed the RUN step for download_chromium.dart from the Dockerfile).

Wizzel1 commented 1 year ago

@xvrh you are right, removing headless: false works for me too. This brings up 2 new questions:

  1. Is it not possible at all to run puppeteer in headful mode on a server?
  2. Is it possible to connect a headful chromium instance to a headless one?

xvrh commented 1 year ago

  1. I think you can. But what is your use-case? Is it to run Chrome extensions? Here is an adapted Dockerfile:

Dockerfile

```Dockerfile
# An example of using a custom Dockerfile with Dart Frog
# Official Dart image: https://hub.docker.com/_/dart
# Specify the Dart SDK base image version using dart:<version> (ex: dart:2.17)
FROM dart:stable AS build

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates procps libxss1 xvfb \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable curl unzip sed git bash xz-utils libglvnd0 ssh xauth x11-xserver-utils libpulse0 libxcomposite1 libgl1-mesa-glx sudo \
     && rm -rf /var/lib/{apt,dpkg,cache,log} \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

WORKDIR /app

# Resolve app dependencies.
COPY pubspec.* ./
RUN dart pub get

ENV CHROME_FORCE_NO_SANDBOX=true

# Copy app source code and AOT compile it.
COPY . .

# Generate a production build.
RUN dart pub global activate dart_frog_cli
RUN dart pub global run dart_frog_cli:dart_frog build

# Ensure packages are still up-to-date if anything has changed.
RUN dart pub get --offline
RUN dart compile exe build/bin/server.dart -o build/bin/server

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

EXPOSE 8080
```

entrypoint.sh

```sh
#!/bin/sh

export DISPLAY=:99.0
Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
dart build/bin/server.dart
```

  2. To launch a Chromium process you use puppeteer.launch() (with or without headless). Internally it will:
    • Start a process (with dart:io Process.start).
    • Wait until Chrome's internal DevTools server has started.
    • Connect to this server with a WebSocket.
    • Create a client for this server and return it (the Browser class).

You can also connect to an existing DevTools server with puppeteer.connect. It only performs the last two steps. You can use this function in a mobile app or on a web page because it doesn't start a process; it just opens a WebSocket connection.
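
The "wait until the server is started" step can be sketched as follows. Chrome launched with a remote debugging port prints a line of the form "DevTools listening on ws://..." to stderr once its DevTools server is ready, and a launcher can scan the process output for it. This is only an illustration in JavaScript: findWsEndpoint is a hypothetical helper, not puppeteer-dart's actual code, and the sample output is made up.

```javascript
// Hypothetical helper for illustration; not the library's implementation.
// Scans text captured from Chrome's stderr for the line announcing the
// DevTools server and returns the WebSocket URL, or null if it is absent.
function findWsEndpoint(stderrText) {
  const match = /^DevTools listening on (ws:\/\/\S+)/m.exec(stderrText);
  return match ? match[1] : null;
}

// Made-up sample of what the process might print while starting.
const sampleOutput = [
  'Some unrelated startup warning',
  'DevTools listening on ws://127.0.0.1:9222/devtools/browser/abc-123',
].join('\n');

console.log(findWsEndpoint(sampleOutput));
// ws://127.0.0.1:9222/devtools/browser/abc-123
console.log(findWsEndpoint('Chrome exited before starting')); // null
```

If that line never appears, for example because Chrome exits early, a launcher has no URL to connect to. That may be what the "Websocket url not found" error earlier in the thread reflects. With puppeteer.connect the URL is supplied by the caller, so this scanning step is skipped.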

Wizzel1 commented 1 year ago

@xvrh My use case is the following:

I want to automate some tasks with puppeteer on a server but sometimes the user has to enter an OTP or solve a recaptcha.

In this case I want to notify the user, connect a local puppeteer to the one that is running on the server, solve the recaptcha and close the window, so that the instance on the server can continue its job.

xvrh commented 1 year ago

It's not possible to do puppeteer.connect(headful: true) and open a window painting a remote browser.

I can think of a few alternatives but they all have risks and drawbacks:

Wizzel1 commented 1 year ago

@xvrh That's a bummer. Thanks for clarifying. I think I would prefer option 1 or 2 in this case, because I can't be sure that the recaptcha triggers again when I close the server-instance and open the page again locally, right?

Do you know if step 2 is possible with dart / flutter out of the box by any chance?

xvrh commented 1 year ago

I can't be sure that the recaptcha triggers again when I close the server-instance and open the page again locally

You can't be sure of anything with a captcha; its purpose is to prevent automation :-) On some websites, it can be presented at any time if a behaviour is considered suspect.

Do you know if step 2 is possible with dart / flutter out of the box by any chance?

I think this is beyond the scope of Flutter. You have to look at technology like https://en.wikipedia.org/wiki/Virtual_Network_Computing and software that uses it: https://en.wikipedia.org/wiki/Comparison_of_remote_desktop_software

Wizzel1 commented 1 year ago

You can't be sure of anything with a captcha; its purpose is to prevent automation :-)

Good point! :D

I think this is beyond the scope of Flutter. You have to look at technology like https://en.wikipedia.org/wiki/Virtual_Network_Computing and software that uses it: https://en.wikipedia.org/wiki/Comparison_of_remote_desktop_software

I think I will try capturing the cookies when the user first sets up his account. Or maybe I should do the whole automation locally so the user can take control of the browser if needed, though this isn't the user experience I would have liked to provide.

One thing I don't fully understand is this:

In the documentation of this package you say

You can still use puppeteer-dart on Flutter either with: Flutter on Mobile BUT with the actual Chrome instance running on a server and accessed from the mobile app using puppeteer.connect

which made me think that what I want to do should be possible. Can you explain how my use case is different from your example here?

xvrh commented 1 year ago

Flutter on Mobile BUT with the actual Chrome instance running on a server and accessed from the mobile app using puppeteer.connect

What I describe is: the Chrome process runs on the server (headless or headful) and the script commanding it runs in a mobile app. There is no browser running in the mobile app, just a WebSocket connection.

In the mobile app, you use the puppeteer client to send commands like:

Wizzel1 commented 1 year ago

@xvrh I see. Thanks for answering all my questions. No matter how this project turns out, being able to use puppeteer in dart is invaluable for me, so thanks for maintaining this repository!

sukhcha-in commented 1 year ago

@Wizzel1 I am also looking to host a Chromium instance on the server, from your project:

try {
  browser = await puppeteer.puppeteer.launch(
    noSandboxFlag: true,
  );
  return Response();
}

You're returning an empty response and it works? How can an empty response connect to a Flutter app via WebSocket?

Wizzel1 commented 1 year ago

@sukhcha-in No, since the package author mentioned that what I am trying to do is not possible, I didn't look further into connecting chrome via websockets. Sorry

sukhcha-in commented 1 year ago

@Wizzel1 Alright, thank you for the info. So the only way is to create a server and perform specific actions using routes.

a-wallen commented 6 months ago

@Wizzel1 Even though this didn't work out for you, this Dockerfile is exactly what I needed for my Dart Frog project. Thanks!