mccgr / edgar

Code to manage data related to SEC EDGAR

New location of the edgar filing documents? #59

Closed: bdcallen closed this issue 4 years ago

bdcallen commented 5 years ago

@iangow @jamespkav Previously, the documents downloaded from EDGAR were stored on the 2TB drive under data/edgar. After a lot of data was transferred to the 6TB drive, the data folder no longer appears on the 2TB drive, and there is no data/edgar folder on the 6TB drive either (I had previously set up a data folder there to hold the asxlisting documents). This issue is for finding out where the EDGAR documents are now stored.

bdcallen commented 4 years ago

@iangow I was going to raise this in the meeting tomorrow, but having at least a substantial subset of the Schedule 13D and 13G filings downloaded somewhere would be very handy for issue #62, as it would substantially speed up my analysis of these forms. Do you still have the filings we previously downloaded sitting somewhere? Or should we just drop edgar.filing_docs_processed and start over?

iangow commented 4 years ago

I don't think we still have them.

I think edgar.filing_docs_processed has nothing to do with what's been downloaded (I may be wrong here), but instead represents the filings that have been looked at in producing edgar.filing_docs (some filings have issues and therefore won't produce output for that table, so keeping track in edgar.filing_docs_processed means that we don't keep trying to process these).
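If that reading is right, the bookkeeping could look roughly like the sketch below. It is not the repository's code: `process_new_filings` and `extract` are hypothetical names, a Python set stands in for edgar.filing_docs_processed, and a dict stands in for edgar.filing_docs.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def process_new_filings(
    file_names: Iterable[str],
    processed: Set[str],
    extract: Callable[[str], Optional[List[str]]],
) -> Dict[str, List[str]]:
    """Process filings not seen before and record every attempt.

    `processed` stands in for edgar.filing_docs_processed (assumption);
    the returned dict stands in for new rows of edgar.filing_docs.
    """
    results: Dict[str, List[str]] = {}
    for file_name in file_names:
        if file_name in processed:
            continue  # already looked at, whether or not it produced output
        try:
            rows = extract(file_name)
        except Exception:
            rows = None  # a problem filing simply yields no output rows
        # Record the attempt even on failure so it is not retried forever.
        processed.add(file_name)
        if rows:
            results[file_name] = rows
    return results
```

The point of recording failures too is that a second run skips the problem filings instead of trying them again.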

iangow commented 4 years ago

The process for "keeping track" of what's been downloaded was simple. If it's not there, download it. So no table needed. Checking what's there is very fast (orders of magnitude faster than downloading), so quite acceptable.
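A minimal sketch of that "if it's not there, download it" check, for illustration only. The function name `ensure_downloaded`, the `base_dir` layout, and the `fetch` callable are assumptions, not the repository's actual code; `fetch` is whatever function retrieves a file's bytes from EDGAR.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_downloaded(
    file_name: str,
    base_dir: Path,
    fetch: Callable[[str], bytes],
) -> bool:
    """Download file_name into base_dir only if it is not already on disk.

    Returns True if a download happened, False if the file was already there.
    """
    local_path = base_dir / file_name
    if local_path.exists():
        return False  # the file system itself is the record; no table needed

    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first, then rename, so an interrupted
    # download is never mistaken for a complete file on the next run.
    tmp_path = local_path.with_name(local_path.name + ".part")
    tmp_path.write_bytes(fetch(file_name))
    os.replace(tmp_path, local_path)
    return True
```

Because the existence check is a local file-system lookup, rerunning this over the full list of filings costs little compared with the downloads it avoids.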

iangow commented 4 years ago

@bdcallen Can this be closed?

bdcallen commented 4 years ago

@iangow Yes, I think we can close this. We may need another issue to fix filing_docs_processed accordingly, as it contains rows for documents that were in the old directory.