jadchaar / sec-edgar-downloader

📈 Download filings from the SEC EDGAR database using Python
https://sec-edgar-downloader.readthedocs.io
MIT License
492 stars, 137 forks

Add ability to download the structured files directly in XLSX / CSV format #43

Open pyturn opened 4 years ago

pyturn commented 4 years ago

Hi,

Can you add the ability to download the structured file formats (XLSX / CSV) directly? For example:

[Screenshot: Capture_ability]

jadchaar commented 4 years ago

Hey @pyturn, thanks for suggesting this feature. Can you provide me a link to the page shown in the screenshot?

Also, what value does the Excel document provide? I'm curious to understand the use cases :).

jadchaar commented 2 years ago

Examples of XLSX:

Seems that the URL hierarchy is standard: https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION_NUM}/Financial_Report.xlsx

This should be quite easy to add if the filename is always Financial_Report.xlsx; otherwise, adding this functionality would require web scraping. Without web scraping, I could attempt the download at https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION_NUM}/Financial_Report.xlsx and, if I get an error, skip that download and move on. If every filing uses Financial_Report.xlsx, all of these downloads should succeed, since the resource would exist at the URL.
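The "try the URL and skip on error" approach described above could look roughly like the sketch below. It is not part of sec-edgar-downloader: the function names, the output filename, and the `user_agent` parameter are all hypothetical, and it assumes the URL takes the CIK without leading zeros and the accession number without dashes, as in the example URLs in this thread. SEC asks automated clients to identify themselves with a User-Agent header, so the caller has to supply one.

```python
import urllib.error
import urllib.request
from pathlib import Path
from typing import Optional

ARCHIVE_ROOT = "https://www.sec.gov/Archives/edgar/data"


def financial_report_url(cik: str, accession_number: str) -> str:
    """Build the candidate Financial_Report.xlsx URL for one filing.

    Assumption: leading zeros are dropped from the CIK and dashes are
    dropped from the accession number.
    """
    clean_cik = str(int(cik))
    clean_accession = accession_number.replace("-", "")
    return f"{ARCHIVE_ROOT}/{clean_cik}/{clean_accession}/Financial_Report.xlsx"


def try_download_financial_report(
    cik: str, accession_number: str, dest_dir: str, user_agent: str
) -> Optional[Path]:
    """Attempt the download; return the saved path, or None if it failed."""
    url = financial_report_url(cik, accession_number)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            data = response.read()
    except urllib.error.URLError:
        # Covers HTTP errors such as 404 (no XLSX for this filing) as well
        # as connection problems; the filing is skipped in either case.
        return None

    # Hypothetical naming scheme: one file per accession number.
    dest = Path(dest_dir) / f"{accession_number.replace('-', '')}_Financial_Report.xlsx"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

A caller would loop over the filings it already knows about and ignore any `None` results, for example `try_download_financial_report("1934348", "0001669191-22-000687", "downloads", "Your Name your.email@example.com")`, where the User-Agent string is a placeholder.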

lkl2050 commented 2 years ago

I hope you can also add a function to download other types of attachment documents, such as https://www.sec.gov/Archives/edgar/data/1934348/000166919122000687/offeringstatement.pdf

Their names are not always offeringstatement.pdf, but they are usually PDF and JPG files, so it's possible to use a regex to download every URL of the form https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION_NUM}/.*\.pdf
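As a rough illustration of the regex idea, the sketch below pulls PDF and JPG attachment URLs out of a block of text, such as the HTML of a filing index page. The pattern and the `find_attachment_urls` helper are hypothetical, and it assumes both the CIK and the accession number appear as plain digits in the URL, as in the example above.

```python
import re
from typing import List

# Matches .../edgar/data/<CIK>/<ACCESSION_NUM>/<filename>.pdf or .jpg.
# The filename part excludes slashes, whitespace, quotes and angle
# brackets so a match does not run past the end of an HTML attribute.
ATTACHMENT_URL = re.compile(
    r"https://www\.sec\.gov/Archives/edgar/data/\d+/\d+/[^/\s\"'<>]+\.(?:pdf|jpg)\b",
    re.IGNORECASE,
)


def find_attachment_urls(text: str) -> List[str]:
    """Return every PDF/JPG attachment URL found in the given text."""
    return ATTACHMENT_URL.findall(text)


if __name__ == "__main__":
    sample = (
        "See https://www.sec.gov/Archives/edgar/data/1934348/"
        "000166919122000687/offeringstatement.pdf for details."
    )
    for url in find_attachment_urls(sample):
        print(url)
```

Note that a regex can only filter URLs that are already known, so the list of candidate URLs for a filing would still have to come from somewhere, for example a page that links to the filing's documents.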