ugo-studio / cloudflare-puppeteer-extra


Note:

This package is currently not usable. I tried to rewrite the original puppeteer-extra package to work on Cloudflare (an edge environment), but it depends on standard Node.js modules (fs, path, etc.), which aren't available in the edge environment.

puppeteer-extra

A lightweight wrapper around @cloudflare/puppeteer and friends to enable cool plugins through a clean interface.

Installation

yarn add @cloudflare/puppeteer cloudflare-puppeteer-extra
# - or -
npm install @cloudflare/puppeteer cloudflare-puppeteer-extra

Quickstart

// cloudflare-puppeteer-extra is a drop-in replacement for @cloudflare/puppeteer,
// it augments the installed puppeteer with plugin functionality.
// Any number of plugins can be added through `puppeteer.use()`
const puppeteer = require("cloudflare-puppeteer-extra");

// Add stealth plugin and use defaults (all tricks to hide puppeteer usage)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());

// Add adblocker plugin to block all ads and trackers (saves bandwidth)
const AdblockerPlugin = require("puppeteer-extra-plugin-adblocker");
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));

// That's it, the rest is puppeteer usage as normal 😊
puppeteer.launch({ headless: true }).then(async (browser) => {
  const page = await browser.newPage();
  await page.setViewport({ width: 800, height: 600 });

  console.log(`Testing adblocker plugin..`);
  await page.goto("https://www.vanityfair.com");
  await page.waitForTimeout(1000);
  await page.screenshot({ path: "adblocker.png", fullPage: true });

  console.log(`Testing the stealth plugin..`);
  await page.goto("https://bot.sannysoft.com");
  await page.waitForTimeout(5000);
  await page.screenshot({ path: "stealth.png", fullPage: true });

  console.log(`All done, check the screenshots. ✨`);
  await browser.close();
});

The above example uses the stealth and adblocker plugins, which need to be installed as well:

yarn add puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker
# - or -
npm install puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker

If you'd like to see debug output just run your script like so:

DEBUG=puppeteer-extra,puppeteer-extra-plugin:* node myscript.js

More examples

TypeScript usage
> `puppeteer-extra` and most plugins are written in TS,
> so you get perfect type support out of the box. :)

```ts
import puppeteer from "puppeteer-extra";

import AdblockerPlugin from "puppeteer-extra-plugin-adblocker";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(AdblockerPlugin()).use(StealthPlugin());

puppeteer
  .launch({ headless: false, defaultViewport: null })
  .then(async (browser) => {
    const page = await browser.newPage();
    await page.goto("https://bot.sannysoft.com");
    await page.waitForTimeout(5000);
    await page.screenshot({ path: "stealth.png", fullPage: true });
    await browser.close();
  });
```

> Please check this [wiki](https://github.com/berstend/puppeteer-extra/wiki/TypeScript-usage) entry in case you have TypeScript related import issues.

![typings](https://i.imgur.com/bNtuTOt.png "Typings")

Playwright usage
[`playwright-extra`](/packages/playwright-extra) with plugin support is available as well.

Multiple puppeteers with different plugins
```js
const vanillaPuppeteer = require("puppeteer");

const { addExtra } = require("puppeteer-extra");
const AnonymizeUA = require("puppeteer-extra-plugin-anonymize-ua");

async function main() {
  const pptr1 = addExtra(vanillaPuppeteer);
  pptr1.use(
    AnonymizeUA({
      customFn: (ua) => "Hello1/" + ua.replace("Chrome", "Beer"),
    })
  );

  const pptr2 = addExtra(vanillaPuppeteer);
  pptr2.use(
    AnonymizeUA({
      customFn: (ua) => "Hello2/" + ua.replace("Chrome", "Beer"),
    })
  );

  await checkUserAgent(pptr1);
  await checkUserAgent(pptr2);
}

main();

async function checkUserAgent(pptr) {
  const browser = await pptr.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto("https://httpbin.org/headers", {
    waitUntil: "domcontentloaded",
  });
  const content = await page.content();
  console.log(content);
  await browser.close();
}
```

Using with puppeteer-cluster
> [puppeteer-cluster](https://github.com/thomasdondorf/puppeteer-cluster) allows you to create a cluster of puppeteer workers and plays well together with `puppeteer-extra`.

```js
const { Cluster } = require("puppeteer-cluster");
const vanillaPuppeteer = require("puppeteer");

const { addExtra } = require("puppeteer-extra");
const Stealth = require("puppeteer-extra-plugin-stealth");
const Recaptcha = require("puppeteer-extra-plugin-recaptcha");

async function main() {
  // Create a custom puppeteer-extra instance using `addExtra`,
  // so we could create additional ones with different plugin config.
  const puppeteer = addExtra(vanillaPuppeteer);
  puppeteer.use(Stealth());
  puppeteer.use(Recaptcha());

  // Launch cluster with puppeteer-extra
  const cluster = await Cluster.launch({
    puppeteer,
    maxConcurrency: 2,
    concurrency: Cluster.CONCURRENCY_CONTEXT,
  });

  // Define task handler
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);

    const { hostname } = new URL(url);
    const { captchas } = await page.findRecaptchas();
    console.log(`Found ${captchas.length} captcha on ${hostname}`);

    await page.screenshot({ path: `${hostname}.png`, fullPage: true });
  });

  // Queue any number of tasks
  cluster.queue("https://bot.sannysoft.com");
  cluster.queue("https://www.google.com/recaptcha/api2/demo");
  cluster.queue("http://www.wikipedia.org/");

  await cluster.idle();
  await cluster.close();
  console.log(`All done, check the screenshots. ✨`);
}

// Let's go
main().catch(console.warn);
```

To use this with TypeScript, just change your imports to:

```ts
import { Cluster } from "puppeteer-cluster";
import vanillaPuppeteer from "puppeteer";

import { addExtra } from "puppeteer-extra";
import Stealth from "puppeteer-extra-plugin-stealth";
import Recaptcha from "puppeteer-extra-plugin-recaptcha";
```

Using with chrome-aws-lambda
> If you plan to use [chrome-aws-lambda](https://github.com/alixaxel/chrome-aws-lambda) with the [`stealth`](/packages/puppeteer-extra-plugin-stealth) plugin, you'll need to modify the default args to remove the `--disable-notifications` flag to pass all the tests.

```js
const chromium = require("chrome-aws-lambda");
const { addExtra } = require("puppeteer-extra");
const puppeteerExtra = addExtra(chromium.puppeteer);

const launch = async () => {
  puppeteerExtra
    .launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
    })
    .then(async (browser) => {
      const page = await browser.newPage();
      await page.goto(
        "https://www.spacejam.com/archive/spacejam/movie/jam.htm"
      );
      await page.waitForTimeout(10 * 1000);
      await browser.close();
    });
};

launch(); // Launch Browser
```

Using with Kikobeats/browserless
> [Kikobeats/browserless](https://github.com/Kikobeats/browserless) is a puppeteer-like Node.js library for interacting with Headless production scenarios.

```js
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());

const browserless = require("browserless")({ puppeteer });

const saveBufferToFile = (buffer, fileName) => {
  const wstream = require("fs").createWriteStream(fileName);
  wstream.write(buffer);
  wstream.end();
};

browserless
  .screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
  .then((buffer) => {
    const fileName = "screenshot.png";
    saveBufferToFile(buffer, fileName);
    console.log(`your screenshot is here: `, fileName);
  });
```

Plugins

🔥 puppeteer-extra-plugin-stealth

🏴 puppeteer-extra-plugin-recaptcha

puppeteer-extra-plugin-adblocker

puppeteer-extra-plugin-devtools

puppeteer-extra-plugin-repl

puppeteer-extra-plugin-block-resources

puppeteer-extra-plugin-flash

puppeteer-extra-plugin-anonymize-ua

puppeteer-extra-plugin-user-preferences

Check out the packages folder for more plugins.

Community Plugins

These plugins have been generously contributed by members of the community. Please note that they're hosted outside the main project and not under our control or supervision.

puppeteer-extra-plugin-minmax

puppeteer-extra-plugin-portal

Please check the Contributing section below if you're interested in creating a plugin as well.


Contributors

Further info

Contributing
PRs and new plugins are welcome! 🎉 The plugin API for `puppeteer-extra` is clean and fun to use. Have a look at the [PuppeteerExtraPlugin](/packages/puppeteer-extra-plugin) base class documentation to get going and check out the [existing plugins](./packages/) (a minimal example is the [anonymize-ua](/packages/puppeteer-extra-plugin-anonymize-ua/index.js) plugin) for reference.

We use a [monorepo](/) powered by [Lerna](https://github.com/lerna/lerna#--use-workspaces) (and yarn workspaces), [ava](https://github.com/avajs/ava) for testing, TypeScript for the core, the [standard](https://standardjs.com/) style for linting and [JSDoc](http://usejsdoc.org/about-getting-started.html) heavily to auto-generate markdown [documentation](https://github.com/documentationjs/documentation) based on code. :-)

Kudos
- Thanks to [skyiea](https://github.com/skyiea) for [this PR](https://github.com/GoogleChrome/puppeteer/pull/1806) that started the project idea.
- Thanks to [transitive-bullshit](https://github.com/transitive-bullshit) for [suggesting](https://github.com/berstend/puppeteer-extra/issues/2) a modular plugin design, which was fun to implement.

Compatibility
`puppeteer-extra` and all plugins are [tested continuously](https://github.com/berstend/puppeteer-extra/actions) in a matrix of current (stable & LTS) Node.js and puppeteer versions. We never broke compatibility and still support puppeteer down to very early versions from 2018.

A few plugins won't work in headless mode (it's noted if that's the case) due to Chrome limitations (e.g. the [`user-preferences`](/packages/puppeteer-extra-plugin-user-preferences) plugin). Look into `xvfb-run` if you still require a headless experience in these circumstances.

Changelog

### `2.1.6` ➠ `3.1.1`

Big refactor, the core is now **written in TypeScript** 🎉 That means out of the box type safety for fellow TS users and nice auto-completion in VSCode for JS users. Also:

- A new [`addExtra`](#addextrapuppeteer) export, to **patch any puppeteer compatible library with plugin functionality** (`chrome-aws-lambda`, etc). This also allows for multiple puppeteer instances with different plugins.

The API is backwards compatible, I bumped the major version just in case I missed something. Please report any issues you might find with the new release. :)

API


class: PuppeteerExtra

Modular plugin framework to teach puppeteer new tricks.

This module acts as a drop-in replacement for puppeteer.

Allows PuppeteerExtraPlugin's to register themselves and to extend puppeteer with additional functionality.

Example:

const puppeteer = require("puppeteer-extra");
puppeteer.use(require("puppeteer-extra-plugin-anonymize-ua")());
puppeteer.use(
  require("puppeteer-extra-plugin-font-size")({ defaultFontSize: 18 })
);
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("http://example.com", { waitUntil: "domcontentloaded" });
  await browser.close();
})();

.use(plugin)

Returns: this The same PuppeteerExtra instance (for optional chaining)

The main interface to register puppeteer-extra plugins.

Example:

puppeteer.use(plugin1).use(plugin2);
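
Chaining works because `.use()` returns the instance it was called on. The following is a minimal sketch of that pattern only; `PluginRegistry` and the `{ name }` plugin shape are hypothetical stand-ins for illustration, not the actual PuppeteerExtra class.

```javascript
// Hypothetical stand-in for illustration only; not the real PuppeteerExtra class.
class PluginRegistry {
  constructor() {
    this.plugins = [];
  }

  // Register a plugin and return `this` so that calls can be chained.
  use(plugin) {
    if (typeof plugin !== "object" || plugin === null || !plugin.name) {
      throw new Error("A plugin must be an object with a `name` property");
    }
    this.plugins.push(plugin);
    return this;
  }
}

const registry = new PluginRegistry();
registry.use({ name: "plugin1" }).use({ name: "plugin2" });

console.log(registry.plugins.map((p) => p.name)); // [ 'plugin1', 'plugin2' ]
```

Returning `this` keeps registration order explicit: plugins end up in the list in the order the `.use()` calls were written.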

.launch(options?)

Returns: Promise<Puppeteer.Browser>

The method launches a browser instance with given arguments. The browser will be closed when the parent node.js process is closed.

Augments the original puppeteer.launch method with plugin lifecycle methods.

All registered plugins that have a beforeLaunch method will be called in sequence to potentially update the options Object before launching the browser.

Example:

const browser = await puppeteer.launch({
  headless: false,
  defaultViewport: null,
});
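
To illustrate what "called in sequence" means for `beforeLaunch`, here is a simplified sketch. `applyBeforeLaunch` and the example plugins are hypothetical names made up for this illustration, and the rule that a hook returning nothing leaves the options unchanged is an assumption, not confirmed library behavior.

```javascript
// Simplified illustration; `applyBeforeLaunch` is a hypothetical helper,
// not part of the puppeteer-extra API.
async function applyBeforeLaunch(plugins, options) {
  let current = options;
  for (const plugin of plugins) {
    // Plugins without a beforeLaunch hook are skipped.
    if (typeof plugin.beforeLaunch !== "function") continue;
    // Await each hook so plugins run strictly one after another.
    const updated = await plugin.beforeLaunch(current);
    // Assumption: a hook that returns nothing leaves the options unchanged.
    if (updated) current = updated;
  }
  return current;
}

// Hypothetical plugins for demonstration purposes.
const examplePlugins = [
  {
    name: "add-arg",
    beforeLaunch: async (opts) => ({
      ...opts,
      args: [...(opts.args || []), "--no-first-run"],
    }),
  },
  { name: "no-hook" },
  {
    name: "force-headless",
    beforeLaunch: (opts) => ({ ...opts, headless: true }),
  },
];

applyBeforeLaunch(examplePlugins, { headless: false }).then((finalOptions) => {
  console.log(finalOptions);
});
```

Because each hook receives the result of the previous one, later plugins can see and override what earlier plugins changed.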

.connect(options?)

Returns: Promise<Puppeteer.Browser>

Attach Puppeteer to an existing Chromium instance.

Augments the original puppeteer.connect method with plugin lifecycle methods.

All registered plugins that have a beforeConnect method will be called in sequence to potentially update the options Object before connecting to the browser.


.defaultArgs(options?)

Returns: Array<string>

The default flags that Chromium will be launched with.


.executablePath()

Returns: string

Path where Puppeteer expects to find bundled Chromium.


.createBrowserFetcher(options?)

Returns: Puppeteer.BrowserFetcher

This method creates a BrowserFetcher, which can be used to download and manage different versions of Chromium.


.plugins

Type: Array<PuppeteerExtraPlugin>

Get a list of all registered plugins.


.getPluginData(name?)

Collects the exposed data property of all registered plugins. Will be reduced/flattened to a single array.

Can be accessed by plugins that listed the dataFromPlugins requirement.

Implemented mainly for plugins that need data from other plugins (e.g. user-preferences).
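
As a rough sketch of the collect-and-flatten behavior described above: `collectPluginData` is a hypothetical helper, and the plugin and data shapes (a `data` property holding either one `{ name, value }` entry or an array of them) are assumptions made for illustration.

```javascript
// Hypothetical helper for illustration; the plugin shapes below are assumptions.
function collectPluginData(plugins, name) {
  const all = plugins
    // Ignore plugins that expose no data at all.
    .filter((plugin) => plugin.data !== undefined)
    // Accept a single entry or an array of entries and flatten to one array.
    .flatMap((plugin) =>
      Array.isArray(plugin.data) ? plugin.data : [plugin.data]
    );
  // With a name, return only the matching entries; otherwise return everything.
  return name ? all.filter((entry) => entry.name === name) : all;
}

// Made-up plugins for demonstration purposes.
const pluginsWithData = [
  { name: "a", data: { name: "userPreferences", value: { zoom: 1 } } },
  { name: "b" },
  {
    name: "c",
    data: [
      { name: "userPreferences", value: { theme: "dark" } },
      { name: "other", value: 42 },
    ],
  },
];

console.log(collectPluginData(pluginsWithData).length); // 3
console.log(collectPluginData(pluginsWithData, "userPreferences").length); // 2
```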


defaultExport()

Type: PuppeteerExtra

The default export will behave exactly the same as the regular puppeteer (just with extra plugin functionality) and can be used as a drop-in replacement.

Behind the scenes it will try to require either puppeteer or puppeteer-core from the installed dependencies.

Example:

// javascript import
const puppeteer = require('puppeteer-extra')

// typescript/es6 module import
import puppeteer from 'puppeteer-extra'

// Add plugins
puppeteer.use(...)

addExtra(puppeteer)

Returns: PuppeteerExtra A fresh PuppeteerExtra instance using the provided puppeteer

An alternative way to use puppeteer-extra: Augments the provided puppeteer with extra plugin functionality.

This is useful in case you need multiple puppeteer instances with different plugins or to add plugins to a non-standard puppeteer package.

Example:

// js import
const puppeteerVanilla = require('puppeteer')
const { addExtra } = require('puppeteer-extra')

// ts/es6 import
import puppeteerVanilla from 'puppeteer'
import { addExtra } from 'puppeteer-extra'

// Patch provided puppeteer and add plugins
const puppeteer = addExtra(puppeteerVanilla)
puppeteer.use(...)

License

Copyright © 2018 - 2023, berstend̡̲̫̹̠̖͚͓̔̄̓̐̄͛̀͘. Released under the MIT License.