mdamien / chrome-extensions-archive

:pager: Archive all the chrome extensions (until Feb 4. 2019)
https://crx.dam.io
MIT License

Chrome Extensions Archive: No updates since Feb 4, 2019

Under maintenance: the disk is full (2 TB).

The goal is to provide a complete archive of the Chrome Web Store, including the version history of each extension.

You can see the current status of what's archived and download the files here: dam.io/chrome-extensions-archive/

Installing the extensions

To install an extension, go to chrome://extensions/ and drag and drop the file onto the page.

To avoid auto-updates, extract the archive and load it as an unpacked extension instead.

Files are named .zip, but they are the exact same .crx files that the store serves.
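
Because the downloads are really .crx files, they start with a CRX header placed in front of the ZIP payload. The following is a minimal sketch, not part of this repository, of how such a file could be unpacked for loading as an unpacked extension. It assumes the standard CRX2 and CRX3 header layouts (the "Cr24" magic, a little-endian version number, then length fields); the function names crx_zip_offset and extract_crx are hypothetical.

```python
import io
import struct
import zipfile

CRX_MAGIC = b"Cr24"


def crx_zip_offset(data):
    """Return the byte offset where the ZIP payload starts.

    Returns 0 when there is no CRX magic, i.e. the data is assumed
    to be a plain ZIP already.
    """
    if data[:4] != CRX_MAGIC:
        return 0
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 2:
        # CRX2: magic, version, public key length, signature length
        pubkey_len, sig_len = struct.unpack_from("<II", data, 8)
        return 16 + pubkey_len + sig_len
    if version == 3:
        # CRX3: magic, version, header length, then the header itself
        (header_len,) = struct.unpack_from("<I", data, 8)
        return 12 + header_len
    raise ValueError("unsupported CRX version: %d" % version)


def extract_crx(path, dest):
    """Extract the ZIP payload of a .crx (or .zip-named .crx) into dest."""
    with open(path, "rb") as f:
        data = f.read()
    payload = data[crx_zip_offset(data):]
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dest)
```

For example, extract_crx("some-extension.zip", "some-extension/") would produce a folder that can then be selected with "Load unpacked" on chrome://extensions/ (the file and folder names here are placeholders).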

Running the scripts

The scripts require Python 3.5 or later.

Install dependencies: pip3 install -r req.txt

Create some folders and initialize some files:

mkdir data
mkdir crawled
mkdir crawled/sitemap
mkdir crawled/pages
mkdir crawled/crx
mkdir crawled/tmp
mkdir ../site
mkdir ../site/chrome-extensions-archive
mkdir ../site/chrome-extensions-archive/ext
echo "{}" > data/not_in_sitemap.json
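
The same layout can also be created from Python, which may be handy when the setup needs to be repeatable. This is only a sketch of the commands above, not a script shipped with the repository; the name init_layout and its root parameter are hypothetical. Unlike the echo command, it does not overwrite an existing data/not_in_sitemap.json.

```python
from pathlib import Path

# Directories from the setup commands above, relative to the scripts folder.
LAYOUT_DIRS = [
    "data",
    "crawled/sitemap",
    "crawled/pages",
    "crawled/crx",
    "crawled/tmp",
    "../site/chrome-extensions-archive/ext",
]


def init_layout(root="."):
    """Create the expected folders and the initial JSON state file."""
    root = Path(root)
    for rel in LAYOUT_DIRS:
        # parents=True also creates intermediate folders such as crawled/
        (root / rel).mkdir(parents=True, exist_ok=True)
    state = root / "data" / "not_in_sitemap.json"
    if not state.exists():
        state.write_text("{}\n")


if __name__ == "__main__":
    init_layout()
```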

Crawling:

Site & stats:

Then I serve the files directly with nginx (see the nginx.conf file for an example).

Helping out

I have a few things in mind for the future:

Don't hesitate to reach out (here in the issues, at damien@dam.io, or @dam_io on Twitter).

To propose changes, just open a pull request.