wabarc / wayback

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, IPFS, Telegraph, and file systems.
https://docs.wabarc.eu.org
GNU General Public License v3.0

Bypass Paywall and CAPTCHA #144

Open waybackarchiver opened 2 years ago

waybackarchiver commented 2 years ago

Launch a headful browser under Xvfb and load extensions.

Relates to #92

Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
export DISPLAY=:99.0
chromium --headless=false --load-extension=path/to/extension
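
For driving the same steps from Go, here is a minimal sketch using only os/exec. It assumes Xvfb and a chromium binary are on PATH. The helper names (xvfbArgs, chromiumArgs, run) and the -launch flag are made up for illustration and are not part of wayback. It leaves out --headless=false on the assumption that Chromium starts headful by default, and it relies on --load-extension accepting a comma-separated list of unpacked extension directories.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// xvfbArgs mirrors the Xvfb invocation above for the given display.
func xvfbArgs(display string) []string {
	return []string{"-ac", display, "-screen", "0", "1280x1024x16"}
}

// chromiumArgs joins one or more unpacked extension directories into a
// single --load-extension flag.
func chromiumArgs(extensions []string) []string {
	return []string{"--load-extension=" + strings.Join(extensions, ",")}
}

// run starts Xvfb, then runs Chromium against that virtual display.
func run(display string, extensions []string) error {
	xvfb := exec.Command("Xvfb", xvfbArgs(display)...)
	if err := xvfb.Start(); err != nil {
		return fmt.Errorf("start Xvfb: %w", err)
	}
	defer xvfb.Process.Kill() // stop the virtual display when Chromium exits

	// Crude wait for the display to come up; a real tool would poll instead.
	time.Sleep(time.Second)

	chromium := exec.Command("chromium", chromiumArgs(extensions)...)
	chromium.Env = append(os.Environ(), "DISPLAY="+display+".0")
	return chromium.Run()
}

func main() {
	launch := flag.Bool("launch", false, "actually start Xvfb and Chromium")
	flag.Parse()

	display := ":99"
	extensions := []string{"path/to/extension"}

	if !*launch {
		// Dry run: only show what would be executed.
		fmt.Println("Xvfb", strings.Join(xvfbArgs(display), " "))
		fmt.Println("chromium", strings.Join(chromiumArgs(extensions), " "))
		return
	}
	if err := run(display, extensions); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Without -launch the program only prints the two command lines, so the argument handling can be checked on a machine that has neither Xvfb nor Chromium installed.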

Extensions:

Misc:

hellodword commented 2 years ago

Okay, moving to this issue :)

I have two ideas.

  1. Run puppeteer-extra-* with puppeteer, dump the CDP messages (I did this before; it's easy but not graceful), and generate Go code from them. The hard part is dealing with randomized scripts, such as https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
  2. Provide a Node runtime on the Go side or the browser side, such as https://github.com/browserify/browserify

What do you think?
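
As a rough illustration of the first idea, the sketch below reads a dump of CDP messages and pulls out the scripts that were injected through Page.addScriptToEvaluateOnNewDocument. It assumes the dump is one JSON message per line, which is just a convenient format for this example. The injectedScripts function and the sample dump are hypothetical. Replaying the extracted sources from Go is left out, and so is the harder problem of scripts that change on every run.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cdpMessage covers only the parts of a CDP request needed here. Params is
// kept raw because its shape differs from method to method.
type cdpMessage struct {
	Method string          `json:"method"`
	Params json.RawMessage `json:"params"`
}

// injectedScripts scans a dump with one JSON-encoded CDP message per line
// and returns the sources passed to Page.addScriptToEvaluateOnNewDocument.
func injectedScripts(dump string) ([]string, error) {
	var scripts []string
	for _, line := range strings.Split(dump, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // skip blank lines
		}
		var msg cdpMessage
		if err := json.Unmarshal([]byte(line), &msg); err != nil {
			return nil, fmt.Errorf("decode CDP message: %w", err)
		}
		if msg.Method != "Page.addScriptToEvaluateOnNewDocument" {
			continue // responses, events, and other requests are ignored
		}
		var params struct {
			Source string `json:"source"`
		}
		if err := json.Unmarshal(msg.Params, &params); err != nil {
			return nil, fmt.Errorf("decode params: %w", err)
		}
		scripts = append(scripts, params.Source)
	}
	return scripts, nil
}

func main() {
	// Hypothetical two-message dump, not captured from a real session.
	dump := `{"id":1,"method":"Page.enable","params":{}}
{"id":2,"method":"Page.addScriptToEvaluateOnNewDocument","params":{"source":"/* evasion script */"}}`

	scripts, err := injectedScripts(dump)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, s := range scripts {
		fmt.Printf("script %d: %d bytes\n", i, len(s))
	}
}
```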

waybackarchiver commented 2 years ago

dump the CDP messages and generate Go code

This approach may complicate and add uncertainty to the situation.

Provide a Node runtime on the Go side or the browser side

As expected, this approach provides a new extension to call the methods exposed by puppeteer-extra-*.

hellodword commented 2 years ago

dump the CDP messages and generate Go code

This approach may complicate and add uncertainty to the situation.

Right, such as https://github.com/kkoooqq/fakebrowser/blob/586e85c0ed872513d2e0703d8c516250a8a4365b/src/core/DeviceDescriptor.ts#L463-L479

hellodword commented 2 years ago

Provide a Node runtime on the Go side or the browser side

As expected, this approach provides a new extension to call the methods exposed by puppeteer-extra-*.

But this is complicated too: ESM and CJS, JS and TS, dependencies, and so on. It sounds like running webpack in the browser.

waybackarchiver commented 2 years ago

If neither calling puppeteer-extra at the call site nor extracting its logic works, we can look for alternative extensions or create one.

I'm working on launching Chrome and loading extensions. The next step is to make it customizable so that it can load more extensions.

waybackarchiver commented 2 years ago

Related project: wabarc/starter. For more details, see its runs.


waybackarchiver commented 2 years ago

Relates to wabarc/screenshot#11

waybackarchiver commented 2 years ago

The bypass-paywall extension is now supported in conjunction with the on-heroku project. The next step is to make the starter more approachable and to add more extensions to it.

Unfortunately, it turns out that PDFs and screenshots cannot be saved in the X11 environment, which means the core idea may not be fully workable.