download item 8

jamespkav commented 5 years ago

Thanks Ben.

bdcallen commented 5 years ago

> filing_docs <- tbl(pg, sql("SELECT * FROM edgar.filing_docs"))
> filing_docs_processed <- tbl(pg, sql("SELECT * FROM edgar.filing_docs_processed"))
> item8 <- tbl(pg, sql("SELECT DISTINCT file_name FROM edgar.item_no WHERE left(item_no, 1) = '8'"))
> filing_docs_to_get <- filing_docs %>% inner_join(item8, by = "file_name") %>% anti_join(filing_docs_processed, by = "file_name")
> filing_docs_to_get %>% filter(document %~*% "htm$") %>% count()
# Source:   lazy query [?? x 1]
# Database: postgres 9.6.10 [bdcallen@/var/run/postgresql:5432/crsp]
      n
  <dbl>
1    0.

Done. I first updated filing_docs with the item 8 filings that were not yet in it, then downloaded the HTML documents the same way I did for item 5. The download has just finished.
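
For anyone reading the check above without database access, here is a minimal base R sketch of the same logic on in-memory data frames. The rows are made-up toy data, not real filings, and the table contents are assumptions for illustration only. Because item8 holds distinct file names, the inner join acts as a filter, and `%~*%` is assumed to be passed through by dbplyr to PostgreSQL's case-insensitive regex operator `~*`, which `grepl(..., ignore.case = TRUE)` approximates here.

```r
# Hypothetical toy data standing in for the edgar tables (not real filings).
filing_docs <- data.frame(
  file_name = c("a.txt", "a.txt", "b.txt", "c.txt"),
  document  = c("ex1.htm", "ex2.pdf", "main.HTM", "main.htm"),
  stringsAsFactors = FALSE
)
item8 <- data.frame(file_name = c("a.txt", "b.txt"), stringsAsFactors = FALSE)
filing_docs_processed <- data.frame(file_name = "a.txt", stringsAsFactors = FALSE)

# inner_join equivalent: keep only documents that belong to item 8 filings.
in_item8 <- filing_docs[filing_docs$file_name %in% item8$file_name, ]

# anti_join equivalent: drop filings already recorded as processed.
to_get <- in_item8[!in_item8$file_name %in% filing_docs_processed$file_name, ]

# %~*% equivalent: case-insensitive match on document names ending in "htm".
htm_to_get <- to_get[grepl("htm$", to_get$document, ignore.case = TRUE), ]

# Number of HTML documents still to download.
n_remaining <- nrow(htm_to_get)
print(n_remaining)
```

In this toy data one document is still outstanding. A count of zero, as in the output above, means every item 8 HTML document has already been processed.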

iangow commented 5 years ago

It might be useful to document the steps taken to make this happen, starting from updating edgar.filings through to downloading the .htm files. The easiest way to do this for future tasks is to relate the commits to the associated issue.

mccgr / edgar

download item 8 #34