nlpaueb / edgar-crawler

The only open-source toolkit that can download EDGAR financial reports and extract textual data from specific item sections into nice and clean JSON files.
GNU General Public License v3.0

EDGAR-CRAWLER: Extract Key Financial Data from SEC Filings Effortlessly 🚀

EDGAR-CRAWLER simplifies access to financial text data by downloading SEC EDGAR filings and transforming these complex, unstructured documents into structured, standardized JSON files, making it easier to use them for downstream NLP tasks and financial analysis.


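EDGAR-CRAWLER has 2 core functionalities: downloading EDGAR filings, and extracting the text of specific item sections from them into structured JSON files.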

Example Outputs

EDGAR-CRAWLER produces structured JSON outputs for easy handling of unstructured/complex SEC/EDGAR filings. Below are examples of these clean, extracted outputs for each supported filing type:

10-K filing (Annual Report)

Original report: Apple 10-K from 2022

  {
    "cik": "320193",
    "company": "Apple Inc.",
    "filing_type": "10-K",
    "filing_date": "2022-10-28",
    "period_of_report": "2022-09-24",
    "sic": "3571",
    "state_of_inc": "CA",
    "state_location": "CA",
    "fiscal_year_end": "0924",
    "filing_html_index": "https://www.sec.gov/Archives/edgar/data/320193/0000320193-22-000108-index.html",
    "htm_filing_link": "https://www.sec.gov/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htm",
    "complete_text_filing_link": "https://www.sec.gov/Archives/edgar/data/320193/0000320193-22-000108.txt",
    "filename": "320193_10K_2022_0000320193-22-000108.htm",
    "item_1": "Item 1. Business\nCompany Background\nThe Company designs, manufactures ...",
    "item_1A": "Item 1A. Risk Factors\nThe Company’s business, reputation, results of ...",
    "item_1B": "Item 1B. Unresolved Staff Comments\nNone.",
    "item_1C": "",
    "item_2": "Item 2. Properties\nThe Company’s headquarters are located in Cupertino, California. ...",
    "item_3": "Item 3. Legal Proceedings\nEpic Games\nEpic Games, Inc. (“Epic”) filed a lawsuit ...",
    "item_4": "Item 4. Mine Safety Disclosures\nNot applicable. ...",
    "item_5": "Item 5. Market for Registrant’s Common Equity, Related Stockholder ...",
    "item_6": "Item 6. [Reserved]\nApple Inc. | 2022 Form 10-K | 19",
    "item_7": "Item 7. Management’s Discussion and Analysis of Financial Condition ...",
    "item_8": "Item 8. Financial Statements and Supplementary Data\nAll financial ...",
    "item_9": "Item 9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure\nNone.",
    "item_9A": "Item 9A. Controls and Procedures\nEvaluation of Disclosure Controls and ...",
    "item_9B": "Item 9B. Other Information\nRule 10b5-1 Trading Plans\nDuring the three months ...",
    "item_9C": "Item 9C. Disclosure Regarding Foreign Jurisdictions that Prevent Inspections\nNot applicable. ...",
    "item_10": "Item 10. Directors, Executive Officers and Corporate Governance\nThe information required ...",
    "item_11": "Item 11. Executive Compensation\nThe information required by this Item will be included ...",
    "item_12": "Item 12. Security Ownership of Certain Beneficial Owners and Management and ...",
    "item_13": "Item 13. Certain Relationships and Related Transactions, and Director Independence ...",
    "item_14": "Item 14. Principal Accountant Fees and Services\nThe information required ...",
    "item_15": "Item 15. Exhibit and Financial Statement Schedules\n(a)Documents filed as part ...",
    "item_16": "Item 16. Form 10-K Summary\nNone.\nApple Inc. | 2022 Form 10-K | 57"
  }
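As a minimal sketch of how such an output might be consumed downstream, the snippet below separates the metadata fields from the `item_*` sections and keeps only the sections that contain text. The helper names `split_filing` and `non_empty_items` are hypothetical (they are not part of EDGAR-CRAWLER), and the record is a trimmed copy of the example above; in practice it would be read with `json.load()` from wherever your extracted files are stored.

```python
import json

# Trimmed stand-in for one extracted 10-K record (values copied from the
# example above). A real record would be loaded from an output file.
raw = '''
{
  "cik": "320193",
  "company": "Apple Inc.",
  "filing_type": "10-K",
  "filing_date": "2022-10-28",
  "item_1B": "Item 1B. Unresolved Staff Comments\\nNone.",
  "item_1C": "",
  "item_9": "Item 9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure\\nNone."
}
'''


def split_filing(record):
    """Hypothetical helper: split a record into (metadata, item sections)."""
    metadata = {k: v for k, v in record.items() if not k.startswith("item_")}
    items = {k: v for k, v in record.items() if k.startswith("item_")}
    return metadata, items


def non_empty_items(items):
    """Hypothetical helper: drop sections whose extracted text is empty."""
    return {k: v for k, v in items.items() if v.strip()}


record = json.loads(raw)
metadata, items = split_filing(record)
present = non_empty_items(items)

print(metadata["company"], metadata["filing_type"], metadata["filing_date"])
for key, text in present.items():
    # The first line of each section is its heading, e.g. "Item 1B. ...".
    heading = text.split("\n", 1)[0]
    print(key, "->", heading)
```

Empty strings, such as `item_1C` in the example above, mark sections for which no text was extracted, so filtering on non-empty text is a simple way to see what a filing actually contains.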

10-Q (Quarterly Report)

Original report: [Apple 10-Q from Q1 2024](https://www.sec.gov/Archives/edgar/data/320193/000032019324000069/aapl-20240330.htm)

```json
{
  "cik": "320193",
  "company": "Apple Inc.",
  "filing_type": "10-Q",
  "filing_date": "2024-05-03",
  "period_of_report": "2024-03-30",
  "sic": "3571",
  "state_of_inc": "CA",
  "state_location": "CA",
  "fiscal_year_end": "0928",
  "filing_html_index": "https://www.sec.gov/Archives/edgar/data/320193/0000320193-24-000069-index.html",
  "htm_filing_link": "https://www.sec.gov/Archives/edgar/data/320193/000032019324000069/aapl-20240330.htm",
  "complete_text_filing_link": "https://www.sec.gov/Archives/edgar/data/320193/0000320193-24-000069.txt",
  "filename": "320193_10Q_2024_0000320193-24-000069.htm",
  "part_1": "PART I - FINANCIAL INFORMATION\nItem 1. Financial Statements\nApple Inc.\nCONDENSED CONSOLIDATED STATEMENTS ...",
  "part_1_item_1": "Item 1. Financial Statements\nApple Inc.\nCONDENSED CONSOLIDATED STATEMENTS ...",
  "part_1_item_2": "Item 2. Management’s Discussion and Analysis of Financial Condition and ...",
  "part_1_item_3": "Item 3. Quantitative and Qualitative Disclosures About Market Risk\nThere have ...",
  "part_1_item_4": "Item 4. Controls and Procedures\nEvaluation of Disclosure Controls and ...",
  "part_2": "PART II - OTHER INFORMATION\nItem 1. Legal Proceedings\nDigital Markets Act Investigations\nOn ...",
  "part_2_item_1": "Item 1. Legal Proceedings\nDigital Markets Act Investigations\nOn March 25, 2024, ...",
  "part_2_item_1A": "Item 1A. Risk Factors\nThe Company’s business, reputation, ...",
  "part_2_item_2": "Item 2. Unregistered Sales of Equity Securities and Use of ...",
  "part_2_item_3": "Item 3. Defaults Upon Senior Securities\nNone.",
  "part_2_item_4": "Item 4. Mine Safety Disclosures\nNot applicable.",
  "part_2_item_5": "Item 5. Other Information\nInsider Trading Arrangements\nNone.",
  "part_2_item_6": "Item 6. Exhibits\nIncorporated by Reference\nExhibit\nNumber\nExhibit Description ..."
}
```

**Note:** `part_1` and `part_2` contain the full detected text of each Part. They are provided because, in some older 10-Q filings, the text cannot be extracted at the item level.
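One way to use the Part-level fields is as a fallback when an item-level field is empty. The sketch below assumes only the key naming shown in the example (`part_<n>` and `part_<n>_item_<id>`); the function `section_text` and the `old_filing` record are hypothetical illustrations, not part of EDGAR-CRAWLER or real extracted data.

```python
def section_text(record, part, item):
    """Hypothetical helper: prefer item-level text, else fall back to the Part.

    `part` is 1 or 2 and `item` is a string such as "1" or "1A".
    Returns (text, level), where level is "item", "part", or "missing".
    """
    item_text = record.get(f"part_{part}_item_{item}", "")
    if item_text.strip():
        return item_text, "item"
    part_text = record.get(f"part_{part}", "")
    if part_text.strip():
        return part_text, "part"
    return "", "missing"


# Record where item-level extraction succeeded (trimmed from the example).
recent_filing = {
    "part_2": "PART II - OTHER INFORMATION\nItem 1. Legal Proceedings\n...",
    "part_2_item_1": "Item 1. Legal Proceedings\n...",
}

# Made-up record imitating an older filing with only Part-level text.
old_filing = {
    "part_2": "PART II - OTHER INFORMATION\n...",
    "part_2_item_1": "",
}

for name, rec in [("recent", recent_filing), ("old", old_filing)]:
    text, level = section_text(rec, 2, "1")
    print(name, "->", level, "-", text.split("\n", 1)[0])
```

Returning the level alongside the text lets downstream code tell apart a precise item-level section from a coarser Part-level one.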

8-K (Important Current Report)

Original report: [Apple 8-K from 2022-08-19](https://www.sec.gov/Archives/edgar/data/320193/000119312522225365/d366128d8k.htm)

```json
{
  "cik": "320193",
  "company": "Apple Inc.",
  "filing_type": "8-K",
  "filing_date": "2022-08-19",
  "period_of_report": "2022-08-17",
  "sic": "3571",
  "state_of_inc": "CA",
  "state_location": "CA",
  "fiscal_year_end": "0924",
  "filing_html_index": "https://www.sec.gov/Archives/edgar/data/320193/0001193125-22-225365-index.html",
  "htm_filing_link": "https://www.sec.gov/Archives/edgar/data/320193/000119312522225365/d366128d8k.htm",
  "complete_text_filing_link": "https://www.sec.gov/Archives/edgar/data/320193/0001193125-22-225365.txt",
  "filename": "320193_8K_2022_0001193125-22-225365.htm",
  "item_1.01": "",
  "item_1.02": "",
  "item_1.03": "",
  "item_1.04": "",
  "item_1.05": "",
  "item_2.01": "",
  "item_2.02": "",
  "item_2.03": "",
  "item_2.04": "",
  "item_2.05": "",
  "item_2.06": "",
  "item_3.01": "",
  "item_3.02": "",
  "item_3.03": "",
  "item_4.01": "",
  "item_4.02": "",
  "item_5.01": "",
  "item_5.02": "Item 5.02 Departure of Directors or Certain Officers; Election of Directors; Appointment ...",
  "item_5.03": "Item 5.03 Amendments to Articles of Incorporation or Bylaws; Change in Fiscal Year.\nOn August 17, 2022, Apple’s Board approved and adopted amended and restated bylaws ...",
  "item_5.04": "",
  "item_5.05": "",
  "item_5.06": "",
  "item_5.07": "",
  "item_5.08": "",
  "item_6.01": "",
  "item_6.02": "",
  "item_6.03": "",
  "item_6.04": "",
  "item_6.05": "",
  "item_7.01": "",
  "item_8.01": "",
  "item_9.01": "Item 9.01 Financial Statements and Exhibits.\n(d) Exhibits.\nExhibit\nNumber\nExhibit ..."
}
```
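Because an 8-K typically reports only a few items, most `item_*` fields in its output are empty strings. A small sketch, assuming the `item_<section>.<number>` key format shown above, that lists which items a filing actually reports; `reported_items` is a hypothetical helper and the record is a trimmed copy of the example.

```python
def reported_items(record):
    """Hypothetical helper: item numbers (e.g. "5.02") with non-empty text,
    sorted numerically by section and then by item within the section."""
    prefix = "item_"
    numbers = [
        key[len(prefix):]
        for key, text in record.items()
        if key.startswith(prefix) and text.strip()
    ]
    return sorted(numbers, key=lambda n: tuple(int(p) for p in n.split(".")))


# Trimmed stand-in for the 8-K record above; key order is shuffled on
# purpose to show that the result does not depend on it.
filing = {
    "filing_type": "8-K",
    "item_9.01": "Item 9.01 Financial Statements and Exhibits.\n(d) Exhibits. ...",
    "item_1.01": "",
    "item_5.03": "Item 5.03 Amendments to Articles of Incorporation or Bylaws; ...",
    "item_5.02": "Item 5.02 Departure of Directors or Certain Officers; ...",
    "item_8.01": "",
}

print(reported_items(filing))
```

For the trimmed record, this keeps items 5.02, 5.03, and 9.01, matching the non-empty fields in the full example.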

Install

- Clone the repository, using either SSH or HTTPS:
```bash
# Method 1: SSH
git clone git@github.com:nlpaueb/edgar-crawler.git
# Method 2: HTTPS
git clone https://github.com/nlpaueb/edgar-crawler.git
```

- Then, it's recommended to create a new virtual environment using Python 3.8 by [installing and using Anaconda](https://docs.anaconda.com/anaconda/install/index.html).
```bash
conda create -n edgar-crawler-venv python=3.8 # After installing Anaconda, create a venv with Python 3.8
conda activate edgar-crawler-venv # Activate the environment
```

Usage

Citation

An EDGAR-CRAWLER paper is on its way. Until then, please cite the relevant EDGAR-CORPUS paper published at the 3rd Economics and Natural Language Processing (ECONLP) workshop at EMNLP 2021 (Punta Cana, Dominican Republic):

@inproceedings{loukas-etal-2021-edgar-corpus-and-edgar-crawler,
    title = "{EDGAR}-{CORPUS}: {B}illions of {T}okens {M}ake {T}he {W}orld {G}o {R}ound",
    author = "Loukas, Lefteris  and
      Fergadiotis, Manos  and
      Androutsopoulos, Ion  and
      Malakasiotis, Prodromos",
    booktitle = "Proceedings of the Third Workshop on Economics and Natural Language Processing (ECONLP)",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.econlp-1.2",
    pages = "13--18",
}

Read the EDGAR-CORPUS paper here: https://aclanthology.org/2021.econlp-1.2/

Accompanying Resources

Here are some additional resources related to EDGAR-CRAWLER:

Contributing

PRs and contributions are accepted.

Please use the Feature Branch Workflow.

Issues

Please create an issue on GitHub instead of emailing us directly, so that all users can benefit from the troubleshooting.

License

This software is licensed under the GNU General Public License v3.0, a license approved by the Open Source Initiative (OSI).