Code completed and uploaded. Running on my server now and all seems to be ok (still running).
Deleted duplicates thus:
SET work_mem='10GB';
WITH dupes AS (
SELECT file_name, item_no
FROM edgar.item_no
GROUP BY file_name, item_no
HAVING count(*) > 1)
DELETE FROM edgar.item_no
WHERE file_name IN (SELECT file_name FROM dupes);
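Note that the statement above removes every row for any file_name that has a duplicated (file_name, item_no) pair, not just the extra copies, which is fine if those filings get reprocessed afterwards. If the aim were instead to keep one copy of each pair, a minimal sketch might look like the following. It assumes PostgreSQL (it relies on the ctid system column) and that duplicated rows are interchangeable; it is an alternative for comparison, not what was run here.

```sql
-- Sketch only: keep one row per (file_name, item_no) and delete the rest.
-- Assumes PostgreSQL; ctid is the row's physical location and is used here
-- purely as a tie-breaker between otherwise identical rows.
-- Rows with a NULL file_name or item_no never match and are left alone.
DELETE FROM edgar.item_no AS a
USING edgar.item_no AS b
WHERE a.file_name = b.file_name
  AND a.item_no = b.item_no
  AND a.ctid > b.ctid;
```

Running it inside a transaction and checking the reported row count before committing would be a sensible precaution.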
@jamespkav
Which items were you focusing on?
library(dplyr, warn.conflicts = FALSE)
library(RPostgreSQL)
#> Loading required package: DBI
pg <- dbConnect(PostgreSQL())
rs <- dbExecute(pg, "SET search_path TO edgar")
rs <- dbExecute(pg, "SET work_mem = '10GB'")
item_no <- tbl(pg, "item_no")
item_no_desc <- tbl(pg, "item_no_desc")
item_no %>%
group_by(item_no) %>%
count() %>%
inner_join(item_no_desc) %>%
arrange(desc(n)) %>%
print(n = Inf)
#> Joining, by = "item_no"
#> # Source: lazy query [?? x 3]
#> # Database: postgres 9.6.8 [igow@10.101.13.99:5432/crsp]
#> # Groups: item_no
#> # Ordered by: desc(n)
#> item_no n item_desc
#> <chr> <dbl> <chr>
#> 1 9.01 813904. Financial Statements and Exhibits
#> 2 8.01 294336. Other Events
#> 3 2.02 257421. Results of Operations and Financial Condition
#> 4 1.01 198015. Entry into a Material Definitive Agreement
#> 5 5.02 191907. Departure of Directors or Certain Officers; Election o…
#> 6 7.01 181299. Regulation FD Disclosure
#> 7 2.03 61601. Creation of a Direct Financial Obligation or an Obliga…
#> 8 3.02 44980. Unregistered Sales of Equity Securities
#> 9 5.07 36373. Submission of Matters to a Vote of Security Holders
#> 10 5.03 36268. Amendments to Articles of Incorporation or Bylaws; Cha…
#> 11 2.01 28790. Completion of Acquisition or Disposition of Assets
#> 12 1.02 18327. Termination of a Material Definitive Agreement
#> 13 4.01 17328. Changes in Registrant's Certifying Accountant
#> 14 3.01 13935. Notice of Delisting or Failure to Satisfy a Continued …
#> 15 3.03 12169. Material Modification to Rights of Security Holders
#> 16 5.01 9274. Changes in Control of Registrant
#> 17 4.02 5952. Non-Reliance on Previously Issued Financial Statements…
#> 18 2.05 5495. Costs Associated with Exit or Disposal Activities
#> 19 2.04 3689. Triggering Events That Accelerate or Increase a Direct…
#> 20 2.06 3109. Material Impairments
#> 21 1.03 2690. Bankruptcy or Receivership
#> 22 5.05 1917. Amendment to Registrant's Code of Ethics, or Waiver of…
#> 23 5.06 1724. Change in Shell Company Status
#> 24 5.04 1113. Temporary Suspension of Trading Under Registrant's Emp…
#> 25 6.02 705. Change of Servicer or Trustee
#> 26 5.08 339. Shareholder Director Nominations
#> 27 6.01 201. ABS Informational and Computational Material
#> 28 1.04 173. Mine Safety - Reporting of Shutdowns and Patterns of V…
#> 29 6.05 108. Securities Act Updating Disclosure
#> 30 6.04 43. Failure to Make a Required Distribution
#> 31 6.03 40. Change in Credit Enhancement or Other External Support
Created on 2018-04-22 by the reprex package (v0.2.0).
@iangow I was working with Item 1.01, but it's all sorted. It was less effective than planned, but I have a variety of alternative options for capturing the required data.
OK. Some of the other items look interesting. I hope to get Ben close to having the "partial mirror of EDGAR" code running soon.
See here for an example.