upenndigitalscholarship / regulations-gov-comment-scraper


Regulations.gov Scraper

This tool scrapes comment information from Regulations.gov, including the submitter name, organization, and any attachments, by document ID.

This is a very barebones tool! We haven't even provided a CLI. To run it, make sure you have the Python requests library installed. (It comes with Anaconda; if you're new to Python, we recommend installing Anaconda and letting it do the heavy lifting for you.)
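If you're not sure whether requests is already installed, a quick check along these lines may help. This is a minimal sketch and not part of the scraper itself; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if is_installed("requests"):
    print("requests is available")
else:
    print("requests is missing; try `pip install requests` or install Anaconda")
```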

You'll also need to apply for an API key as described here.
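Once you have a key, one alternative to pasting it directly into the script is to read it from an environment variable, which keeps the key out of any copy of the file you share. The sketch below is only a suggestion: the variable name `REGULATIONS_GOV_API_KEY` and the helper `load_api_key` are assumptions for illustration, and the scraper does not look for them on its own.

```python
import os


def load_api_key(env_var="REGULATIONS_GOV_API_KEY"):
    """Return the API key stored in the environment, or '' if it is unset.

    The variable name is hypothetical; use whatever name you prefer.
    """
    return os.environ.get(env_var, "").strip()


api_key = load_api_key()
if not api_key:
    print("No API key found in the environment; paste it into the script instead.")
```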

Once you have Python and the requests library installed, and have received an API key, make the following changes:

1) API Key:

In the line that reads

   api_key = '' # insert your api key between quotes

copy your API key and paste it between the single quotation marks:

   api_key = '(THE API KEY THAT YOU COPIED)'

2) Docket ID:

In the line that reads

   docket_id = '' # insert the docket id between quotes (e.g. VA-2016-VHA-0011)

paste the docket ID between the single quotation marks:

   docket_id = 'ED-2018-OCR-0064' 

The docket ID appears on the page for the set of comments you're scraping. [Screenshot: the docket ID]

3) Total number of documents:

In the line that reads

   total_docs = 217568  # total number of documents, as indicated by the page for the given docket id

paste the number of documents in place of the current value:

   total_docs = 14835

The number of documents also appears on the page for the set of comments you're scraping. [Screenshot: the number of documents]
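Before running the scraper, it can be worth confirming that all three values are filled in. The sketch below is a hypothetical pre-flight check, not code from the script: `check_settings` and `pages_needed` are illustrative names, and the page size of 1000 is an assumed value used only to show how the document count translates into a number of requests.

```python
import math


def check_settings(api_key, docket_id, total_docs):
    """Return a list of problems; an empty list means the values look filled in."""
    problems = []
    if not api_key.strip():
        problems.append("api_key is empty")
    if not docket_id.strip():
        problems.append("docket_id is empty")
    if not isinstance(total_docs, int) or total_docs <= 0:
        problems.append("total_docs must be a positive integer")
    return problems


def pages_needed(total_docs, page_size):
    """Number of requests needed to cover total_docs at page_size results each."""
    return math.ceil(total_docs / page_size)


# Example values from the steps above; the API key is a placeholder.
issues = check_settings("(THE API KEY THAT YOU COPIED)", "ED-2018-OCR-0064", 14835)
print(issues or "settings look filled in")
print(pages_needed(14835, 1000))  # page size of 1000 is an assumption
```

A check like this only catches blanks and obvious typos; it cannot tell whether the key is valid or whether the docket ID exists.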