bdcallen closed this issue 4 years ago.
@iangow I was going to raise this in the meeting tomorrow, but having at least a substantial subset of the Schedule 13D and 13G filings downloaded somewhere would be very handy for issue #62, as it would let me substantially speed up the analysis of these forms. Do we still have the filings we had sitting somewhere? Or should we just drop `edgar.filing_docs_processed` and start over?
I don't think we have it.

I think `edgar.filing_docs_processed` has nothing to do with what's been downloaded (I may be wrong here); instead, it represents the filings that have been looked at in producing `edgar.filing_docs`. Some filings have issues and therefore won't produce output for that table, so keeping track of them in `edgar.filing_docs_processed` means that we don't keep trying to process them.
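The role of a "processed" table like `edgar.filing_docs_processed` can be sketched as follows. This is a minimal illustration, not the project's code: it uses an in-memory SQLite database instead of the PostgreSQL `edgar` schema, and the table layouts, `process_new_filings`, and the `parse_filing` callback are all hypothetical.

```python
import sqlite3
from typing import Callable, Iterable, List


def process_new_filings(
    conn: sqlite3.Connection,
    file_names: Iterable[str],
    parse_filing: Callable[[str], List[str]],
) -> List[str]:
    """Parse each filing not yet recorded as processed (hypothetical sketch).

    Every attempted filing is recorded in filing_docs_processed, even when
    parsing yields no rows, so problem filings are not retried on later runs.
    Returns the file names attempted in this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filing_docs_processed (file_name TEXT PRIMARY KEY)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS filing_docs (file_name TEXT, doc TEXT)")
    done = {row[0] for row in conn.execute("SELECT file_name FROM filing_docs_processed")}

    attempted = []
    for file_name in file_names:
        if file_name in done:
            continue  # already looked at: skip, whether or not it produced output
        docs = parse_filing(file_name)  # may be [] for a problem filing
        conn.executemany(
            "INSERT INTO filing_docs VALUES (?, ?)",
            [(file_name, doc) for doc in docs],
        )
        conn.execute("INSERT INTO filing_docs_processed VALUES (?)", (file_name,))
        attempted.append(file_name)
    conn.commit()
    return attempted
```

On a second run, a filing that produced no rows the first time is skipped rather than parsed again, which is the behaviour described above.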
The process for "keeping track" of what's been downloaded was simple: if a file isn't there, download it. So no table is needed. Checking what's there is very fast (orders of magnitude faster than downloading), so this approach is quite acceptable.
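A minimal sketch of that "if it's not there, download it" check, assuming a hypothetical `fetch` callable that returns a file's bytes (for example, a wrapper around an HTTP request to EDGAR) and a local mirror rooted at a hypothetical `dest_dir`. As an extra precaution not mentioned above, the file is written under a temporary `.part` name and then renamed, so an interrupted download is not later mistaken for a complete file.

```python
from pathlib import Path
from typing import Callable


def ensure_downloaded(path: str, dest_dir: Path, fetch: Callable[[str], bytes]) -> bool:
    """Download `path` into `dest_dir` only if it is not already there.

    Returns True if a download happened, False if the file already existed.
    `fetch` is a hypothetical callable supplied by the caller.
    """
    local = dest_dir / path
    if local.exists():  # local filesystem check; no network request needed
        return False
    local.parent.mkdir(parents=True, exist_ok=True)
    tmp = local.with_name(local.name + ".part")
    tmp.write_bytes(fetch(path))  # write to a temporary name first
    tmp.replace(local)  # rename so only complete files appear under the final name
    return True
```

Because the existence check is the only bookkeeping, the directory itself serves as the record of what has been downloaded.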
@bdcallen Can this be closed?
@iangow Yes, I think we can close this. We may need another issue to fix `filing_docs_processed` accordingly, as it contains rows for documents that were in the old directory.
@iangow @jamespkav Previously, the documents downloaded from EDGAR were stored on the 2TB drive under `data/edgar`. But after a lot of data was transferred to the 6TB drive, the `data` folder no longer appears on the 2TB drive. Nor is there a `data/edgar` folder on the 6TB drive (I had previously set up a `data` folder to hold the asxlisting documents). So this issue is for finding where the EDGAR documents are now stored.