mccgr / edgar

Code to manage data related to SEC EDGAR

Create code to extract item numbers from 8-Ks #10

Closed iangow closed 6 years ago

iangow commented 6 years ago

See here for an example.

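library(curl)
library(xml2)  # provides read_html()

# Download the SGML header of one 8-K filing to a temporary file
t <- tempfile()
download.file("https://www.sec.gov/Archives/edgar/data/19617/000001961718000061/0000019617-18-000061.hdr.sgml", t)

# Parse the header as HTML so that its tags can be queried
read_html(t)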
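
The item numbers could probably also be pulled straight from the header text with a regular expression. A minimal sketch, assuming each item sits on its own line as `<ITEMS>` followed by the number; `extract_items` and the sample lines are made up for illustration and are not taken from the filing above:

```r
# Hypothetical helper: pull item numbers out of SGML header lines.
# Assumes each item appears on its own line, e.g. "<ITEMS>2.02".
extract_items <- function(lines) {
  hits <- grep("^[[:space:]]*<ITEMS>", lines, value = TRUE)
  trimws(sub("^[[:space:]]*<ITEMS>", "", hits))
}

# Invented header fragment, for illustration only
sample_header <- c(
  "<TYPE>8-K",
  "<PERIOD>20180412",
  "<ITEMS>2.02",
  "<ITEMS>9.01"
)

items <- extract_items(sample_header)
print(items)
```

For a real filing, `readLines(t)` on the downloaded file could stand in for `sample_header`, provided the header really follows that layout.
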
iangow commented 6 years ago

See here for some code that might be helpful.

jamespkav commented 6 years ago

Code completed and uploaded. Running on my server now and all seems to be ok (still running).

iangow commented 6 years ago

Deleted duplicates thus:

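-- Give the session more memory for the grouping step
SET work_mem='10GB';

-- (file_name, item_no) pairs that appear more than once
WITH dupes AS (
    SELECT file_name, item_no
    FROM edgar.item_no
    GROUP BY file_name, item_no
    HAVING count(*) > 1)

-- Removes every row for any file that has a duplicated pair
DELETE FROM edgar.item_no
WHERE file_name IN (SELECT file_name FROM dupes);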
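
Because the filter is on `file_name` alone, the `DELETE` drops all rows for an affected file, not only the extra copies. A rough base-R sketch on invented data contrasts that with keeping one copy of each pair; the toy table and variable names are assumptions, not the real `edgar.item_no`:

```r
# Toy stand-in for edgar.item_no; file names and values are invented
item_no <- data.frame(
  file_name = c("a.txt", "a.txt", "a.txt", "b.txt", "b.txt"),
  item_no   = c("2.02", "2.02", "9.01", "1.01", "9.01"),
  stringsAsFactors = FALSE
)

# Rows that repeat an earlier (file_name, item_no) pair
is_dupe <- duplicated(item_no[, c("file_name", "item_no")])

# Files with at least one repeated pair
dupe_files <- unique(item_no$file_name[is_dupe])

# Same effect as the SQL: drop every row for those files
dropped_files <- item_no[!item_no$file_name %in% dupe_files, ]

# Alternative: keep the first copy of each pair
deduped <- item_no[!is_dupe, ]

print(nrow(dropped_files))
print(nrow(deduped))
```

Which variant is preferable presumably depends on whether the affected files get re-processed afterwards.
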
iangow commented 6 years ago

@jamespkav

Which items were you focusing on?

library(dplyr, warn.conflicts = FALSE)
library(RPostgreSQL)
#> Loading required package: DBI

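# Connect to PostgreSQL with the default connection settings
pg <- dbConnect(PostgreSQL())

# Use the edgar schema and give the session more working memory
rs <- dbExecute(pg, "SET search_path TO edgar")
rs <- dbExecute(pg, "SET work_mem = '10GB'")

# Lazy references to the database tables
item_no <- tbl(pg, "item_no")
item_no_desc <- tbl(pg, "item_no_desc")

# Count rows per item number, attach descriptions, and list by frequency
item_no %>%
    group_by(item_no) %>%
    count() %>%
    inner_join(item_no_desc) %>%
    arrange(desc(n)) %>%
    print(n = Inf)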
#> Joining, by = "item_no"
#> # Source:     lazy query [?? x 3]
#> # Database:   postgres 9.6.8 [igow@10.101.13.99:5432/crsp]
#> # Groups:     item_no
#> # Ordered by: desc(n)
#>    item_no       n item_desc                                              
#>    <chr>     <dbl> <chr>                                                  
#>  1 9.01    813904. Financial Statements and Exhibits                      
#>  2 8.01    294336. Other Events                                           
#>  3 2.02    257421. Results of Operations and Financial Condition          
#>  4 1.01    198015. Entry into a Material Definitive Agreement             
#>  5 5.02    191907. Departure of Directors or Certain Officers; Election o…
#>  6 7.01    181299. Regulation FD Disclosure                               
#>  7 2.03     61601. Creation of a Direct Financial Obligation or an Obliga…
#>  8 3.02     44980. Unregistered Sales of Equity Securities                
#>  9 5.07     36373. Submission of Matters to a Vote of Security Holders    
#> 10 5.03     36268. Amendments to Articles of Incorporation or Bylaws; Cha…
#> 11 2.01     28790. Completion of Acquisition or Disposition of Assets     
#> 12 1.02     18327. Termination of a Material Definitive Agreement         
#> 13 4.01     17328. Changes in Registrant's Certifying Accountant          
#> 14 3.01     13935. Notice of Delisting or Failure to Satisfy a Continued …
#> 15 3.03     12169. Material Modification to Rights of Security Holders    
#> 16 5.01      9274. Changes in Control of Registrant                       
#> 17 4.02      5952. Non-Reliance on Previously Issued Financial Statements…
#> 18 2.05      5495. Costs Associated with Exit or Disposal Activities      
#> 19 2.04      3689. Triggering Events That Accelerate or Increase a Direct…
#> 20 2.06      3109. Material Impairments                                   
#> 21 1.03      2690. Bankruptcy or Receivership                             
#> 22 5.05      1917. Amendment to Registrant's Code of Ethics, or Waiver of…
#> 23 5.06      1724. Change in Shell Company Status                         
#> 24 5.04      1113. Temporary Suspension of Trading Under Registrant's Emp…
#> 25 6.02       705. Change of Servicer or Trustee                          
#> 26 5.08       339. Shareholder Director Nominations                       
#> 27 6.01       201. ABS Informational and Computational Material           
#> 28 1.04       173. Mine Safety - Reporting of Shutdowns and Patterns of V…
#> 29 6.05       108. Securities Act Updating Disclosure                     
#> 30 6.04        43. Failure to Make a Required Distribution                
#> 31 6.03        40. Change in Credit Enhancement or Other External Support

Created on 2018-04-22 by the reprex package (v0.2.0).

jamespkav commented 6 years ago

@iangow I was working with Item 1.01, but it's all sorted. It was less effective than planned, but I have a variety of alternative options for capturing the required data.
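
For reference, narrowing the table to filings that report a given item is short once the data is local. A minimal base-R sketch on invented rows; the real table lives in PostgreSQL, and `files_with_item` is a hypothetical helper:

```r
# Toy stand-in for edgar.item_no; rows are invented
item_no <- data.frame(
  file_name = c("a.txt", "a.txt", "b.txt", "c.txt", "c.txt"),
  item_no   = c("1.01", "9.01", "8.01", "1.01", "2.03"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: file names of filings that report a given item
files_with_item <- function(df, item) {
  unique(df$file_name[df$item_no == item])
}

files_101 <- files_with_item(item_no, "1.01")
print(files_101)
```
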

iangow commented 6 years ago

OK. Some of the other items look interesting. I hope to get Ben close to having the "partial mirror of EDGAR" code running soon.