mccgr / edgar

Code to manage data related to SEC EDGAR

EDGAR

The Electronic Data Gathering, Analysis, and Retrieval system (EDGAR) is an online repository, first constructed in 1993 and maintained by the US Securities and Exchange Commission (SEC). It is designed to automate the collection, validation, and acceptance of the submissions and announcements that companies are required by law to file. The edgar schema is a collection of information scraped from this repository.

The schema consists of a number of main tables, which contain the most fundamental information in EDGAR, and a number of dependent tables, which are generally constructed by scraping and cleaning the information in the filings and documents linked from the main tables. One example is the set of tables listing the information contained in Form 3, Form 4, and Form 5 filings. This readme file discusses the main tables.

By far the most important tables are filings and filing_docs, which contain, respectively, the basic information on each filing and on the documents linked to it. These tables can be used to deduce the URLs of the filings and their documents, or to find the location of the documents on local storage if they have been downloaded.
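As an illustration of deducing URLs, here is a minimal sketch. It assumes that a filing is identified by the path listed in EDGAR's quarterly full-index files (for example `edgar/data/320193/0000320193-20-000096.txt`, used here purely as an illustrative path). The `file_name` naming and the helper functions are hypothetical, not the schema's actual definitions; the sketch relies on the EDGAR archive layout, in which each filing's documents sit in a folder named after the CIK and the accession number with its dashes removed.

```python
# Minimal sketch: derive SEC EDGAR URLs from an index-style filing path.
# ASSUMPTION (hypothetical): `file_name` holds the path from EDGAR's full-index
# files, e.g. "edgar/data/<cik>/<accession>.txt"; the edgar schema's real
# column names may differ.
import re

ARCHIVES_ROOT = "https://www.sec.gov/Archives/"
PATH_PATTERN = re.compile(r"edgar/data/(\d+)/(\d{10}-\d{2}-\d{6})\.txt")


def filing_txt_url(file_name: str) -> str:
    """URL of the complete submission text file for a filing."""
    return ARCHIVES_ROOT + file_name.lstrip("/")


def filing_folder_url(file_name: str) -> str:
    """URL of the folder holding the filing's individual documents."""
    match = PATH_PATTERN.fullmatch(file_name.lstrip("/"))
    if match is None:
        raise ValueError(f"unexpected file_name format: {file_name!r}")
    cik, accession = match.groups()
    # Folder names use the accession number with the dashes removed.
    return f"{ARCHIVES_ROOT}edgar/data/{cik}/{accession.replace('-', '')}/"


def filing_index_url(file_name: str) -> str:
    """URL of the index page listing the filing's documents."""
    accession = file_name.rsplit("/", 1)[-1].removesuffix(".txt")  # Python 3.9+
    return filing_folder_url(file_name) + f"{accession}-index.htm"
```

For the illustrative path above, `filing_folder_url` gives `https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/`, and `filing_index_url` appends `0000320193-20-000096-index.htm` to it.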

filings and filing_docs

As mentioned above, these are the main tables, indexing, respectively, all filings in the SEC EDGAR database since 1993 and their associated documents. Here, for each table, we give a list of the associated fields.
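The relationship between the two tables can be pictured with the following minimal sketch. It assumes (hypothetically; these are not the schema's documented fields) that each filing_docs row carries its parent filing's full-index path as `file_name`, a sequence number `seq`, and the document's own file name `document`, and that downloaded documents are mirrored locally under the same relative path as on EDGAR. The document name `example-10k.htm` and the mirror root are illustrative only.

```python
# Minimal sketch of how a filing_docs row could point back to its filing.
# ASSUMPTIONS (hypothetical): rows carry the parent filing's `file_name`,
# a sequence number `seq`, and the document file name `document`; local
# copies mirror EDGAR's relative folder layout under some root directory.
from dataclasses import dataclass
from pathlib import PurePosixPath

ARCHIVES_ROOT = "https://www.sec.gov/Archives/"


@dataclass(frozen=True)
class FilingDoc:
    file_name: str  # parent filing, e.g. "edgar/data/<cik>/<accession>.txt"
    seq: int        # position of the document within the filing
    document: str   # document file name, e.g. "example-10k.htm"

    def relative_dir(self) -> PurePosixPath:
        """Folder path edgar/data/<cik>/<accession without dashes>."""
        path = PurePosixPath(self.file_name)
        return path.parent / path.stem.replace("-", "")

    def url(self) -> str:
        """Public URL of this document on the SEC archive."""
        return ARCHIVES_ROOT + str(self.relative_dir() / self.document)

    def local_path(self, root: str) -> PurePosixPath:
        """Where a downloaded copy would live under a local mirror root."""
        return PurePosixPath(root) / self.relative_dir() / self.document
```

Keying documents by the parent filing's path means the same relative folder serves both the URL and the local file location, which is one simple way to keep the two in sync.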

Other central tables

Tables used in updating tables above

Code

A script to update the Edgar database is found here.
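As a rough illustration of the kind of step such an update involves, the sketch below builds the URL of an EDGAR quarterly `master.idx` file and parses its pipe-delimited body (`CIK|Company Name|Form Type|Date Filed|Filename`) into rows. This is an assumed, simplified approach and not the repository's update script; the `IndexRow` layout is hypothetical. Actual downloads should follow the SEC's fair-access guidance, which asks automated clients to send a descriptive User-Agent header.

```python
# Minimal sketch of one plausible update step: parsing an EDGAR quarterly
# master index into filing rows. ASSUMPTION: illustrative only, not the
# repository's update script; the IndexRow layout is hypothetical.
from typing import Iterator, NamedTuple


class IndexRow(NamedTuple):
    cik: int
    company_name: str
    form_type: str
    date_filed: str  # kept as the string given in the index, e.g. "2020-02-14"
    file_name: str   # e.g. "edgar/data/<cik>/<accession>.txt"


def master_index_url(year: int, quarter: int) -> str:
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    return f"https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/master.idx"


def parse_master_index(text: str) -> Iterator[IndexRow]:
    """Yield one row per filing; the header block ends with a line of dashes."""
    in_body = False
    for line in text.splitlines():
        if not in_body:
            in_body = line.startswith("----")
            continue
        if not line.strip():
            continue
        parts = line.split("|")
        if len(parts) < 5:
            continue  # skip malformed lines rather than guess their layout
        yield IndexRow(
            cik=int(parts[0]),
            # Rejoin the middle fields in case a company name contains "|".
            company_name="|".join(parts[1:-3]),
            form_type=parts[-3],
            date_filed=parts[-2],
            file_name=parts[-1],
        )
```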