Add ability to download the structured files directly in XLSX / CSV format

pyturn commented 4 years ago

Hi,

Can you add the ability to download the structured files (XLSX / CSV) directly? For example:

[Screenshot: Capture_ability]

jadchaar commented 4 years ago

Hey @pyturn thanks for suggesting this feature. Can you provide me a link to the page shown in the screenshot?

Also, I am curious: what value does the Excel document provide? I'd like to understand the use cases :).

jadchaar commented 2 years ago

Examples of XLSX:

Berkshire Hathaway: https://www.sec.gov/Archives/edgar/data/1067983/000156459021055032/Financial_Report.xlsx
Apple 10-K: https://www.sec.gov/Archives/edgar/data/320193/000032019321000105/Financial_Report.xlsx
- Landing page which contains "View excel document" link: https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-21-000105&xbrl_type=v
Apple 8-K: https://www.sec.gov/Archives/edgar/data/320193/000119312521328151/Financial_Report.xlsx
- Landing page which contains "View excel document" link: https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0001193125-21-328151&xbrl_type=v

Seems that the URL hierarchy is standard: https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION_NUM}/Financial_Report.xlsx

This should be quite easy to add if the filename is always Financial_Report.xlsx; otherwise, adding this functionality would require web scraping. Without web scraping, I could attempt the download at https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION_NUM}/Financial_Report.xlsx and, if I get an error, skip that download and move on. If every filing uses Financial_Report.xlsx, all of these downloads should succeed, since the resource would exist at the URL.
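
A minimal sketch of the "try the URL, skip on error" idea, using only the standard library. This is not the library's actual implementation: the function names, the destination filename, and the `user_agent` parameter are hypothetical. It assumes, based on the example URLs above, that `{CIK}` has no leading zeros and `{ACCESSION_NUM}` is the accession number with dashes removed.

```python
import urllib.error
import urllib.request
from pathlib import Path
from typing import Optional

ARCHIVES_BASE = "https://www.sec.gov/Archives/edgar/data"


def build_financial_report_url(cik: str, accession_number: str) -> str:
    """Build the Financial_Report.xlsx URL for one filing."""
    cik_part = str(int(cik))  # assumption: leading zeros are dropped in the path
    accession_part = accession_number.replace("-", "")  # assumption: no dashes
    return f"{ARCHIVES_BASE}/{cik_part}/{accession_part}/Financial_Report.xlsx"


def try_download_financial_report(
    cik: str, accession_number: str, dest_dir: str, user_agent: str
) -> Optional[Path]:
    """Attempt the download; return the saved path, or None if it failed."""
    url = build_financial_report_url(cik, accession_number)
    # user_agent is a caller-supplied placeholder identifying the client.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            data = response.read()
    except urllib.error.URLError:
        # Covers HTTP errors (e.g. 404 when the filing has no XLSX) and
        # network failures: skip this filing and move on.
        return None
    dest = Path(dest_dir) / f"{accession_number}_Financial_Report.xlsx"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

A caller could loop over filings and simply ignore any `None` result, which matches the "ignore the download and move on" behavior described above.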

lkl2050 commented 2 years ago

I hope a function can be added to download other types of attachment documents as well, such as https://www.sec.gov/Archives/edgar/data/1934348/000166919122000687/offeringstatement.pdf

Their names are not always like offeringstatement.pdf, but they will usually be pdf and jpg files, so it's possible to use a regex to allow downloading all URLs that have the format https://www.sec.gov/Archives/edgar/data/CIK/ACCESSION_NUM/.*\.pdf
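
A rough sketch of the regex filter suggested above, in Python. The function name is hypothetical, and the pattern is one possible reading of the proposal: it assumes numeric CIK and accession path segments (as in the example URL) and accepts both pdf and jpg extensions. Where the candidate URLs come from (for example, a filing's document listing) is left out.

```python
import re
from typing import Iterable, List

# Assumed shape: .../edgar/data/<CIK>/<ACCESSION_NUM>/<any name>.pdf or .jpg
ATTACHMENT_URL_PATTERN = re.compile(
    r"https://www\.sec\.gov/Archives/edgar/data/\d+/\d+/[^/]+\.(?:pdf|jpg)",
    re.IGNORECASE,
)


def filter_attachment_urls(urls: Iterable[str]) -> List[str]:
    """Keep only URLs that look like PDF or JPG attachments of a filing."""
    return [url for url in urls if ATTACHMENT_URL_PATTERN.fullmatch(url)]


candidates = [
    "https://www.sec.gov/Archives/edgar/data/1934348/000166919122000687/offeringstatement.pdf",
    "https://www.sec.gov/Archives/edgar/data/320193/000032019321000105/Financial_Report.xlsx",
]
matching = filter_attachment_urls(candidates)  # keeps only the .pdf URL
```

Escaping the dot before the extension (`\.`) avoids matching names that merely end in "pdf" without being PDF files.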

jadchaar / sec-edgar-downloader

Add ability to download the structured files directly in XLSX / CSV format #43