ulixee / unblocked

A suite of tools for protecting the web's open knowledge.
MIT License

Migrated into ulixee/hero repo

Future work on this project will be done in the Hero repo.


Old readme:

Unblocked Web

This project maintains a suite of tools for protecting the web's open knowledge. Its primary function is to create a web-scraping engine that mimics a human interacting with a website, both from a user-behavior perspective and from a "browser" perspective.

Using this Repository

This is a monorepo for working on the Browser Detect + Evade workflow of building an automated engine. It requires Yarn workspaces.

To work with the project:

  1. Clone the repository and initialize its git submodules (you can add --recursive to your initial clone command).
  2. Run yarn build. NOTE: you must run this command to build the TypeScript files.
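If you are unsure whether step 1 actually pulled in the submodules, the output of git submodule status tells you: git prefixes each uninitialized submodule with a "-". The helper below is a hypothetical sketch, not part of this repository, that parses that output; the function name uninitializedSubmodules is illustrative, and it assumes submodule paths contain no spaces.

```typescript
// Hypothetical helper (not shipped with this repo): lists submodules that
// `git submodule status` reports as uninitialized.
//
// Each output line looks like one of:
//   "-<sha> path/to/module"          not initialized
//   " <sha> path/to/module (desc)"   initialized at the recorded commit
//   "+<sha> path/to/module (desc)"   initialized, different commit checked out
// Assumption: submodule paths contain no whitespace.
export function uninitializedSubmodules(statusOutput: string): string[] {
  return statusOutput
    .split(/\r?\n/)
    // Only lines starting with "-" describe uninitialized submodules.
    .filter((line) => line.startsWith("-"))
    // Drop the "-" marker, then take the path that follows the commit SHA.
    .map((line) => line.slice(1).trim().split(/\s+/)[1]);
}
```

For example, pass it the stdout of git submodule status run from the repository root; a non-empty result means git submodule update --init --recursive still needs to run before yarn build.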

Browser Profiles

If you want to work with profiles (i.e., update Emulator Data, generate Double Agent probes, etc.), you'll need to download the BrowserProfiles data:

  $ yarn workspace @ulixee/unblocked-browser-profiler downloadData

This will clone the data into a folder called browser-profile-data, adjacent to the unblocked folder.
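"Adjacent" here means a sibling directory of your unblocked checkout. As a minimal sketch, assuming your own tooling needs to locate that folder, the path can be resolved from the repository root as shown here. Only the folder name browser-profile-data comes from this README; the functions browserProfileDataPath and hasBrowserProfileData are hypothetical and not part of the repo.

```typescript
import * as fs from "fs";
import * as path from "path";

// Folder name as documented; the helpers below are hypothetical.
const PROFILE_DATA_DIR = "browser-profile-data";

// Layout assumed from "adjacent to the unblocked folder":
//   <parent>/unblocked             (this repository)
//   <parent>/browser-profile-data  (created by downloadData)
export function browserProfileDataPath(unblockedRoot: string): string {
  return path.resolve(unblockedRoot, "..", PROFILE_DATA_DIR);
}

// True once the sibling folder exists, e.g. after downloadData has run.
export function hasBrowserProfileData(unblockedRoot: string): boolean {
  const dir = browserProfileDataPath(unblockedRoot);
  return fs.existsSync(dir) && fs.statSync(dir).isDirectory();
}
```

A script started from the repository root could, for instance, call hasBrowserProfileData(process.cwd()) and, if it returns false, remind the user to run the downloadData command above before working with profiles.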

Questions

Join us on the Discord of Ulixee, a sister project, for any questions or comments.

Projects

This repository is home to several of the projects needed to create an "unblocked" automated browser engine. We imagine a world where many participants share evasions and emulations for all web features in a single repository. These will live right next to an advanced bot-blocking detection engine that can analyze every facet of a web-scraping session (TCP, TLS, HTTP, DOM, user interactions, etc.); a profiler that can run all detections on real browser/operating-system combinations to generate profiles of true browser signatures; and an agent implementation that can run all the evasions and browse unblocked.

Contributing

We'd love your help improving the Unblocked tools. Please don't hesitate to send a pull request. The best place to start is adding an evasion to the Unblocked Plugins or adding detections to DoubleAgent.

All Unblocked projects use ESLint to enforce code standards, and lint and tests must pass before any push is allowed.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT