jamespkav closed this issue 5 years ago
> filing_docs <- tbl(pg, sql("SELECT * FROM edgar.filing_docs"))
> filing_docs_processed <- tbl(pg, sql("SELECT * FROM edgar.filing_docs_processed"))
> item8 <- tbl(pg, sql("SELECT DISTINCT file_name FROM edgar.item_no WHERE left(item_no, 1) = '8'"))
> filing_docs_to_get <- filing_docs %>% inner_join(item8, by = "file_name") %>% anti_join(filing_docs_processed, by = "file_name")
> filing_docs_to_get %>% filter(document %~*% "htm$") %>% count()
# Source: lazy query [?? x 1]
# Database: postgres 9.6.10 [bdcallen@/var/run/postgresql:5432/crsp]
      n
  <dbl>
1    0.
Done. I previously updated `filing_docs` from the list of item 8 filings not yet in it, then downloaded the HTML documents in the same way I did for item 5. The download has just finished.
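For reference, the selection logic in the query above can be sketched locally with dplyr. This is only an illustration under assumptions: the three small tibbles are made-up stand-ins for the `edgar` tables, not real filings, and `grepl(..., ignore.case = TRUE)` is used as a local substitute for the case-insensitive `%~*%` match that runs in PostgreSQL.

```r
library(dplyr)

# Hypothetical toy data standing in for the edgar tables (not real filings)
filing_docs <- tibble(
  file_name = c("a.txt", "a.txt", "b.txt", "c.txt"),
  document  = c("a1.htm", "a2.pdf", "b1.htm", "c1.HTM")
)
item8 <- tibble(file_name = c("a.txt", "c.txt"))      # filings with an item 8
filing_docs_processed <- tibble(file_name = "c.txt")  # filings already handled

filing_docs_to_get <- filing_docs %>%
  # keep only documents belonging to item 8 filings
  inner_join(item8, by = "file_name") %>%
  # drop documents from filings that were already processed
  anti_join(filing_docs_processed, by = "file_name") %>%
  # keep only documents ending in "htm", ignoring case
  filter(grepl("htm$", document, ignore.case = TRUE))

n_to_get <- nrow(filing_docs_to_get)
```

In this toy data one document is still outstanding; a count of zero, as in the output above, means no item 8 `.htm` documents are left to download.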
It might be useful to document the steps taken to make this happen, starting from updating `edgar.filings` through to downloading the `.htm` files. The easiest way to do this for future tasks is to relate the commits to the associated issue.
Thanks Ben.