justinlittman / fb-ad-archive-scraper

Scraper for Facebook's Archive of Ads with Political Content
MIT License


Scraper for Facebook's Archive of Ads with Political Content ... until Facebook provides an API.

fb-ad-archive-scraper will produce:

Like any scraper, fb-ad-archive-scraper is fragile: it will break if Facebook changes the structure or code of the Archive. If fb-ad-archive-scraper breaks, please let me know.

Tickets / PRs are welcome.


  1. Clone the repo:

    git clone https://github.com/justinlittman/fb-ad-archive-scraper.git
  2. Change to the directory:

    cd fb-ad-archive-scraper
  3. Optionally, create a virtual environment:

    virtualenv -p python3 ENV
    source ENV/bin/activate
  4. Install requirements:

    pip install -r requirements.txt
  5. Install Chromedriver. On a Mac, this is:

    brew install --cask chromedriver

    If already installed, upgrade Chromedriver with:

    brew upgrade --cask chromedriver
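Before running the scraper, it can help to confirm that Chromedriver is installed and on your `PATH`. The snippet below is a small standalone sketch, not part of fb-ad-archive-scraper. It assumes only that the `chromedriver` binary accepts `--version`.

```python
import shutil
import subprocess


def chromedriver_version():
    """Return chromedriver's version string, or None if it is not on PATH."""
    path = shutil.which("chromedriver")
    if path is None:
        return None
    # `chromedriver --version` prints a line such as "ChromeDriver <version> (...)".
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = chromedriver_version()
    print(version if version else "chromedriver not found on PATH")
```

ChromeDriver's major version generally needs to match the installed version of Chrome. If the scraper fails to start a browser, a mismatch here is worth checking first.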


    usage: scraper.py [-h] [--limit LIMIT] [--headed] [--wait WAIT]
                      [--country {ALL,US,BR}]
                      [--type {news_ads,political_and_issue_ads}]
                      [--status {all,active,inactive}]
                      email password query [query ...]

    Scrape Facebook's Archive of Ads with Political Content

    positional arguments:
      email                 Email address for FB account
      password              Password for FB account
      query                 Query

    optional arguments:
      -h, --help            show this help message and exit
      --limit LIMIT         Limit on number of ads to scrape
      --headed              Use a headed chrome browser
      --wait WAIT           Seconds to sleep between requests
      --country {ALL,US,BR}
                            Limit ads by country. Choices: ALL, US, BR. Default is
      --type {news_ads,political_and_issue_ads}
                            Limit ads by type. Choices: news_ads,
                            political_and_issue_ads. Default is
      --status {all,active,inactive}
                            Limit ads by status. Choices: all, active, inactive.
                            Default is all.
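
The help text above corresponds to a standard `argparse` command-line interface. The sketch below is a hypothetical reconstruction for illustration, not the code in `scraper.py`. The `int` and `float` types for `--limit` and `--wait` are assumptions. The help text does not show the defaults for `--country` and `--type`, so none are set here.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction from the help text; not the project's actual code.
    parser = argparse.ArgumentParser(
        description="Scrape Facebook's Archive of Ads with Political Content")
    parser.add_argument("email", help="Email address for FB account")
    parser.add_argument("password", help="Password for FB account")
    # "query [query ...]" means one or more query terms.
    parser.add_argument("query", nargs="+", help="Query")
    parser.add_argument("--limit", type=int, help="Limit on number of ads to scrape")
    parser.add_argument("--headed", action="store_true", help="Use a headed chrome browser")
    parser.add_argument("--wait", type=float, help="Seconds to sleep between requests")
    # Defaults for --country and --type are not stated in the help text, so none are set.
    parser.add_argument("--country", choices=["ALL", "US", "BR"])
    parser.add_argument("--type", choices=["news_ads", "political_and_issue_ads"])
    parser.add_argument("--status", choices=["all", "active", "inactive"], default="all")
    return parser
```

For example, `build_parser().parse_args(["--limit", "10", "fbuser@gmail.com", "password", "pelosi", "schumer"])` yields `query == ["pelosi", "schumer"]`, `limit == 10`, and `status == "all"`. Any value outside the listed choices, such as `--status pending`, is rejected with a usage error.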

For example:

    python scraper.py fbuser@gmail.com password pelosi
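
If you run the scraper from another script, you can build the command from Python so the password is not typed at the shell. The helper below is hypothetical and is not part of the project. The environment variable names `FB_EMAIL` and `FB_PASSWORD` are assumptions. The password still appears in the child process's argument list, so this only keeps it out of shell history and source files.

```python
import os
import subprocess
import sys


def scraper_command(queries, limit=None, status=None, headed=False):
    """Build an argv list for scraper.py (hypothetical helper).

    Credentials come from the assumed FB_EMAIL / FB_PASSWORD environment variables.
    Options are placed before the positional arguments, matching the usage line.
    """
    cmd = [sys.executable, "scraper.py"]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if status is not None:
        cmd += ["--status", status]
    if headed:
        cmd.append("--headed")
    cmd += [os.environ["FB_EMAIL"], os.environ["FB_PASSWORD"], *queries]
    return cmd
```

With both variables set, `subprocess.run(scraper_command(["pelosi"], limit=50), check=True)` runs the equivalent of the example above, limited to 50 ads.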


The approach of extracting data from XHRs came from Ranjit Hatnagar.
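
Extracting data from XHRs means reading the JSON responses that the Archive page fetches in the background instead of parsing the rendered HTML. One common way to do this with Selenium and Chrome involves three steps:

1. Enable Chrome performance logging with the `goog:loggingPrefs` capability set to `{"performance": "ALL"}`.
2. Read the log entries with `driver.get_log("performance")`.
3. Fetch each response body with `driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": ...})`.

This is shown as an assumption, not as this project's implementation. The helpers below cover only the parsing side, so they run without a browser. The `for (;;);` handling reflects Facebook's historical practice of prefixing JSON responses. It may not apply to the Archive.

```python
import json


def xhr_responses(performance_log):
    """Yield (request_id, url) for each XHR response in a Chrome performance log.

    Assumes each entry is a dict whose "message" value is a JSON string, as returned
    by Selenium's driver.get_log("performance") with performance logging enabled.
    """
    for entry in performance_log:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        if params.get("type") == "XHR":
            yield params["requestId"], params["response"]["url"]


def parse_fb_json(body):
    """Parse a response body, stripping a leading "for (;;);" guard if present (assumption)."""
    prefix = "for (;;);"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)
```

In a live session, you would pass each `request_id` to `Network.getResponseBody` and hand the returned `"body"` string to `parse_fb_json`. Chrome reports `fetch()` requests with type `"Fetch"` rather than `"XHR"`, so you may need to widen the filter depending on how the page loads its data.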