mccgr / edgar

Code to manage data related to SEC EDGAR

Create CUSIP-CIK test table #93

Closed: iangow closed this issue 3 years ago

iangow commented 4 years ago

The table should implement the fixes arising from issues such as #82, so that the incremental effect of each fix can be evaluated.

bdcallen commented 4 years ago

@iangow I've committed the change you mentioned at the end of #85 and then ran the program. The counts below are from before running the program, followed by after.

crsp=# SELECT COUNT(*) FROM edgar.cusip_cik_test;
  count
---------
 1353246
(1 row)

crsp=# SELECT COUNT(*) FROM edgar.cusip_cik_test;
  count
---------
 1335995
(1 row)

The difference between these counts is 17251 (1353246 - 1335995), which equals the number of rows in bad_cusips. Also,

crsp=# SELECT COUNT(*) FROM edgar.cusip_cik_test
WHERE LENGTH(cusip) = 9
AND RIGHT(cusip, 1) != CAST(check_digit AS CHARACTER);
 count
-------
     0
(1 row)

So the program has behaved as expected.
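For readers unfamiliar with the check above, the following is a minimal Python sketch of the standard CUSIP check-digit calculation (the "modulus 10 double-add-double" scheme), assuming the check_digit column in edgar.cusip_cik_test is computed the same way; the function names are hypothetical and not taken from the repository. `is_bad_nine_digit` mirrors the SQL filter: it only examines 9-character CUSIPs and flags those whose last character disagrees with the computed digit.

```python
def cusip_check_digit(cusip8: str) -> int:
    """Compute the CUSIP check digit from the first eight characters."""
    if len(cusip8) != 8:
        raise ValueError("expected exactly 8 characters")
    special = {"*": 36, "@": 37, "#": 38}
    total = 0
    for i, ch in enumerate(cusip8.upper()):
        if "0" <= ch <= "9":
            v = int(ch)
        elif "A" <= ch <= "Z":
            v = ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
        elif ch in special:
            v = special[ch]
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if i % 2 == 1:  # double every second character (positions 2, 4, 6, 8)
            v *= 2
        total += v // 10 + v % 10  # add the digits of each (possibly doubled) value
    return (10 - total % 10) % 10


def is_bad_nine_digit(cusip: str) -> bool:
    """Hypothetical Python analogue of the SQL test:
    LENGTH(cusip) = 9 AND RIGHT(cusip, 1) != check_digit.
    Unlike SQL (where a NULL check_digit drops the row), characters that
    cannot be scored are treated here as bad."""
    if len(cusip) != 9:
        return False  # the SQL query ignores CUSIPs of other lengths
    try:
        return cusip[-1] != str(cusip_check_digit(cusip[:8]))
    except ValueError:
        return True


print(cusip_check_digit("03783310"))     # Apple's CUSIP is 037833100
print(is_bad_nine_digit("037833101"))    # wrong final digit, so flagged
```

Running the query above against the test table corresponds to counting rows for which `is_bad_nine_digit` is true; a count of 0 means every remaining 9-character CUSIP carries a consistent check digit.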

bdcallen commented 3 years ago

@iangow Can we close this issue? I think the one remaining question is whether we want to amend the main program to incorporate some of the changes made to create cusip_cik_test (i.e., have the main program that builds edgar.cusip_cik delete bad 9-digit CUSIPs straight away), or whether we are happy to stick with what we have and filter out the bad apples later with the kind of code in create_cusip_cik_test.R.
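To make the first option concrete, the following is a minimal sketch of dropping bad CUSIPs at build time as an anti-join. It assumes rows are (cusip, cik) pairs and that bad_cusips is keyed on the CUSIP alone. Both the row shape and the function name `drop_bad_cusips` are hypothetical, not taken from extract_cusips.py or create_cusip_cik_test.R.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[str, int]  # hypothetical (cusip, cik) pair


def drop_bad_cusips(rows: Sequence[Row], bad_cusips: Iterable[str]) -> List[Row]:
    """Anti-join: keep only rows whose CUSIP does not appear in bad_cusips."""
    bad = set(bad_cusips)  # set membership keeps the filter linear in len(rows)
    return [row for row in rows if row[0] not in bad]


# Illustrative data only (CIK values are placeholders).
rows = [("037833100", 1), ("037833101", 1), ("594918104", 2)]
kept = drop_bad_cusips(rows, ["037833101"])
print(len(rows) - len(kept))  # one row removed for one bad CUSIP
```

Whether the filter runs when edgar.cusip_cik is built or afterwards, the check reported earlier in the thread carries over: the drop in row count equals the size of bad_cusips only when each bad CUSIP matches exactly one row, so a mismatch would signal duplicates or bad CUSIPs that never occur in the table.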

Let me know what you think. Provided you're happy with my answers to #80 and #83, I think this is the last question we need to answer before I run extract_cusips.py again and we bring the CUSIP-CIK project to a halt for now.