Closed bdcallen closed 4 years ago
> cusip_cik %>% filter(cusip_length == 8) %>% distinct(cik, cusip) %>% count()
# Source: lazy query [?? x 1]
# Database: postgres 11.0.7 [bdcallen@/var/run/postgresql:5432/crsp]
n
<dbl>
1 6156
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = c('file_name', 'cik')) %>%
> filter(cusip_length.y == 9) %>% distinct(cik, cusip.x) %>% count()
# Source: lazy query [?? x 1]
# Database: postgres 11.0.7 [bdcallen@/var/run/postgresql:5432/crsp]
n
<dbl>
1 2302
> cusip_cik %>% filter(cusip_length == 8) %>%
> inner_join(cusip_cik, by = c('file_name', 'cik', 'cusip6')) %>%
> filter(cusip_length.y == 9) %>% distinct(cik, cusip.x) %>% count()
# Source: lazy query [?? x 1]
# Database: postgres 11.0.7 [bdcallen@/var/run/postgresql:5432/crsp]
n
<dbl>
1 880
@bdcallen
I think what you want to do is create data frame from the 9-digit CUSIPs, then drop the last digit from those CUSIPs. Then ask
In looking at 7-digit CUSIPs, I think that (pretty much) the only way that they would be valid would be if they were 8-digit CUSIPs with a leading "0" dropped. So I think for those, you'd repeat the analysis above, but converting both 8- and 9-digit CUSIPs into seven-digit CUSIPs.
It seems here that you are looking at 8- and 9-digit CUSIPs in the same filing. I think we already have an issue looking at whether we should only take the 9-digit CUSIP from a filing when valid. Do that issue before this one.
@iangow Sorry, from one of your last comments, I assumed you had put comments on silent for yourself, leaving it to message to you in your inbox once I had used '@iangow'. Hence, the undeveloped nature of my comment above (I was committing stuff to edit later). But, yes, I'm working towards something of the nature you describe in your response.
It seems here that you are looking at 8- and 9-digit CUSIPs in the same filing. I think we already have an issue looking at whether we should only take the 9-digit CUSIP from a filing when valid. Do that issue before this one.
Yes, I'll redo the analysis joining only on cik. I agree that will probably give us a better idea of the value of the lower digit cusips
@iangow
> cusip_cik <- cusip_cik %>% mutate(cusip8 = substr(cusip, 1, 8))
> cusip_cik %>% filter(cusip_length == 8) %>%
> inner_join(cusip_cik, by = 'cik') %>%
> filter(cusip_length.y == 9) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 5896
> cusip_cik %>% filter(cusip_length == 8) %>% distinct(cik, cusip) %>% count()
# A tibble: 1 x 1
n
<int>
1 6156
> cusip_cik %>% filter(cusip_length == 8) %>%
> inner_join(cusip_cik, by = 'cik') %>%
> filter(cusip_length.y == 9) %>% filter(cusip6.x == cusip6.y) %>%
> distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 3540
> cusip_cik %>% filter(cusip_length == 8) %>%
> inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>%
> filter(cusip8.x == cusip8.y) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 3168
So this does give a better idea. It seems just over half of cases, the 8 digit cusip matches the cusip8 of a 9 digit cusip, at least in some filing.
@iangow I think the next step is to see how many match with other digits omitted by mistake (which I suspect is also a common problem)
No problem. I think I see messages here because I am "participating" in this issue. It's good if you're heading that way already.
@iangow
> cusip_cik <- cusip_cik %>% mutate(cusip7_1 = paste0(substr(cusip, 1, 7), substr(cusip, 9, 9)))
> cusip_cik <- cusip_cik %>% mutate(cusip6_2 = paste0(substr(cusip, 1, 6), substr(cusip, 8, 9)))
> cusip_cik <- cusip_cik %>% mutate(cusip5_3 = paste0(substr(cusip, 1, 5), substr(cusip, 7, 9)))
> cusip_cik <- cusip_cik %>% mutate(cusip4_4 = paste0(substr(cusip, 1, 4), substr(cusip, 6, 9)))
> cusip_cik <- cusip_cik %>% mutate(cusip3_5 = paste0(substr(cusip, 1, 3), substr(cusip, 5, 9)))
> cusip_cik <- cusip_cik %>% mutate(cusip2_6 = paste0(substr(cusip, 1, 2), substr(cusip, 4, 9)))
> cusip_cik <- cusip_cik %>% mutate(cusip1_7 = paste0(substr(cusip, 1, 1), substr(cusip, 3, 9)))
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip7_1.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 537
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip6_2.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 193
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip5_3.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 251
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip4_4.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 415
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip3_5.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 323
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip2_6.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 225
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip1_7.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 150
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip0_8.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 338
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip8.y == cusip.x | cusip7_1.y == cusip.x | cusip6_2.y == cusip.x | cusip5_3.y == cusip.x | cusip4_4.y == cusip.x | cusip3_5.y == cusip.x | cusip2_6.y == cusip.x | cusip1_7.y == cusip.x | cusip0_8.y == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 4662
> cusip_cik %>% filter(cusip_length == 8) %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 5896
So 6156 cases of 8 digit cusips, 5896 of them join to 9 digit cusips under joins just under cik. Then doing the other possible 8 digit subsets of the 9 digit cusips, 4662 have matches out of those 5896. So it seems around 80% of cases where an 8 digit cusip is associated with a 9 digit one under joins under cik, there is in fact some digit missing (mostly the last digit, as we thought, but a substantial amount of time it's other digits)
If 5,696 of the 8-digit CUSIP-CIK combinations can be found as (equivalent) 9-digit CUSIP-CIK combinations, then the only value of using 8-digit CUSIP-CIK combinations is the 260 incremental cases (6,156 - 5,896). So you should check some of those … if they look good, then I think we should keep them. (Note that if the 5,896 includes joins on digits 2 through 9 [the only other possibility for an eight-digit CUSIP], then it's not a "valid eight-digit CUSIP" and we don't want this. A "valid eight-digit CUSIP" is one that is substr(cusip9, 1, 8)
for some valid cusip9
.)
Then, consider 7-digit CUSIPs relative to the combined 8- and 9-digit CUSIP-CIK combinations in the same way.
> unique_8_df <- cusip_cik %>% filter(cusip_length == 8) %>%
> anti_join(cusip_cik %>% filter(cusip_length == 8) %>%
> inner_join(cusip_cik, by = 'cik') %>%
> filter(cusip_length.y == 9) %>% distinct(cik, cusip.x) %>%
> mutate(cusip = cusip.x), by = c('cik', 'cusip')) %>%
> distinct(cik, cusip) %>% mutate(cusip6 = substr(cusip, 1, 6))
> unique_8_df
# A tibble: 260 x 3
cik cusip cusip6
<int> <chr> <chr>
1 1038370 8716Y107 8716Y1
2 902528 03215109 032151
3 1068104 67588103 675881
4 18568 2049H109 2049H1
5 18568 15242510 152425
6 764763 89585105 895851
7 729213 74791410 747914
8 903954 64092410 640924
9 903954 74726X10 74726X
10 903954 74731K10 74731K
# … with 250 more rows
> unique_8_df %>% filter(cusip == 'SCHEDULE') %>% count()
# A tibble: 1 x 1
n
<int>
1 15
> unique_8_df <- unique_8_df %>% filter(cusip != 'SCHEDULE')
> unique_8_df
# A tibble: 245 x 3
cik cusip cusip6
<int> <chr> <chr>
1 1038370 8716Y107 8716Y1
2 902528 03215109 032151
3 1068104 67588103 675881
4 18568 2049H109 2049H1
5 18568 15242510 152425
6 764763 89585105 895851
7 729213 74791410 747914
8 903954 64092410 640924
9 903954 74726X10 74726X
10 903954 74731K10 74731K
# … with 235 more rows
> unique_8_df %>% group_by(cik) %>%
> summarise(num = n()) %>% filter(num > 1)
# A tibble: 9 x 2
cik num
<int> <int>
1 18568 2
2 50361 2
3 353718 3
4 764763 2
5 903954 72
6 904081 2
7 1002819 2
8 1006262 2
9 1093367 3
> unique_8_df %>% distinct(cik, cusip6) %>%
> group_by(cik) %>%
> summarise(num = n()) %>% filter(num > 1)
# A tibble: 8 x 2
cik num
<int> <int>
1 18568 2
2 50361 2
3 353718 3
4 764763 2
5 903954 72
6 904081 2
7 1006262 2
8 1093367 3
> unique_8_df %>% filter(cik == 903954)
# A tibble: 72 x 3
cik cusip cusip6
<int> <chr> <chr>
1 903954 64092410 640924
2 903954 74726X10 74726X
3 903954 74731K10 74731K
4 903954 85254C10 85254C
5 903954 89341610 893416
6 903954 89674L10 89674L
7 903954 91383710 913837
8 903954 75190730 751907
9 903954 61744634 617446
10 903954 00253910 002539
# … with 62 more rows
> unique_8_df %>% mutate(ncusip = cusip) %>%
> inner_join(stocknames, by = 'ncusip') %>%
> distinct(cik, ncusip) %>% count()
# A tibble: 1 x 1
n
<int>
1 158
> unique_8_df %>% mutate(ncusip = cusip) %>%
> inner_join(stocknames, by = 'ncusip') %>%
> distinct(cik, ncusip) %>%
> filter(cik != 903954) %>% count()
# A tibble: 1 x 1
n
<int>
1 93
> ciks %>% inner_join(unique_8_df, by = 'cik') %>%
> inner_join(issuers, by = 'cusip6') %>%
> filter(company_name == issuer_name_1) %>%
> distinct(cik, cusip) %>% count()
# A tibble: 1 x 1
n
<int>
1 46
> unique_8_df %>% distinct(cik, cusip) %>% count()
# A tibble: 1 x 1
n
<int>
1 245
> ciks %>% inner_join(unique_8_df, by = 'cik') %>%
> inner_join(issuers, by = 'cusip6') %>%
> distinct(cik, cusip) %>%
> filter(cik != 903954) %>% count()
# A tibble: 1 x 1
n
<int>
1 135
> probs <- ciks %>% inner_join(unique_8_df, by = 'cik') %>%
> inner_join(issuers, by = 'cusip6') %>% distinct(cik, cusip) %>%
> filter(cik != 903954) %>% arrange(cik) %>%
> group_by(cik) %>% summarise(num = n()) %>%
> filter(num > 1)
> probs
# A tibble: 5 x 2
cik num
<int> <int>
1 353718 3
2 904081 2
3 1002819 2
4 1006262 2
5 1093367 2
> ciks %>% inner_join(unique_8_df, by = 'cik') %>%
> inner_join(issuers, by = 'cusip6') %>%
> inner_join(probs, by = 'cik') %>%
> select(cik, cusip, company_name, issuer_name_1)
# A tibble: 16 x 4
cik cusip company_name issuer_name_1
<int> <chr> <chr> <chr>
1 1002819 00891110 AIR CANADA /QUEBEC/ AIR CDA
2 1002819 00891130 AIR CANADA /QUEBEC/ AIR CDA
3 1093367 02932810 AMERICAN RIVERS OIL CO /DE/ AMERICAN RIVS OIL CO
4 1093367 02932010 AMERICAN RIVERS OIL CO /DE/ AMERICAN RICE INC DEL
5 1093367 02932810 AROC INC AMERICAN RIVS OIL CO
6 1093367 02932010 AROC INC AMERICAN RICE INC DEL
7 353718 48563610 CARL KARCHER ENTERPRISES INC KARCHER CARL ENTERPRISES INC
8 353718 19121910 CARL KARCHER ENTERPRISES INC COCA COLA ENTERPRISES INC
9 353718 19104210 CARL KARCHER ENTERPRISES INC COBRA ELECTRS CORP
10 353718 48563610 KARCHER CARL ENTERPRISES INC KARCHER CARL ENTERPRISES INC
11 353718 19121910 KARCHER CARL ENTERPRISES INC COCA COLA ENTERPRISES INC
12 353718 19104210 KARCHER CARL ENTERPRISES INC COBRA ELECTRS CORP
13 1006262 48625010 PROMIS SYSTEMS CORP LTD KAUFMANN FD INC
14 1006262 74341V10 PROMIS SYSTEMS CORP LTD PROMIS SYS LTD
15 904081 63577110 STECK VAUGHN PUBLISHING CORP NATIONAL ED CORP
16 904081 85804510 STECK VAUGHN PUBLISHING CORP STECK-VAUGHN PUBG CORP
> unique_8_df %>% distinct(cik, cusip6) %>%
> group_by(cik) %>% summarise(num = n()) %>% filter(num == 1) %>%
> inner_join(unique_8_df) %>%
> inner_join(issuers, by = 'cusip6') %>% inner_join(ciks) %>%
> select(cik, cusip, cusip6, issuer_name_1, company_name) %>%
> arrange(cik)
Joining, by = "cik"
Joining, by = "cik"
# A tibble: 148 x 5
cik cusip cusip6 issuer_name_1 company_name
<int> <chr> <chr> <chr> <chr>
1 13006 17253C10 17253C CIRCA PHARMACEUTICALS INC CIRCA PHARMACEUTICALS INC
2 14177 10876310 108763 BRIDGFORD FOODS CORP BRIDGFORD FOODS CORP
3 22896 20482G10 20482G COMPUSERVE CORP DEL COMPUSERVE CORP
4 28209 24955510 249555 DEPOSIT GTY CORP DEPOSIT GUARANTY CORP
5 39838 36442410 364424 GALVESTON HOUSTON CO GALVESTON HOUSTON CO
6 40675 37017210 370172 GENERAL KINETICS INC GENERAL KINETICS INC
7 48544 44090300 440903 HOSPLEX SYS INC HOSOI GARDEN MORTUARY INC
8 50253 45583010 455830 INDUSTRIAL ACOUSTICS INC INDUSTRIAL ACOUSTICS CO INC
9 60549 54667680 546676 LOUISVILLE GAS & ELEC CO LOUISVILLE GAS & ELECTRIC CO /KY/
10 65297 05873J10 05873J BALLYS GRAND INC BALLYS GRAND INC /DE/
# … with 138 more rows
> View(unique_8_df %>% distinct(cik, cusip6) %>%
> group_by(cik) %>% summarise(num = n()) %>% filter(num == 1) %>%
> inner_join(unique_8_df) %>%
> inner_join(issuers, by = 'cusip6') %>% inner_join(ciks) %>%
> select(cik, cusip, cusip6, issuer_name_1, company_name) %>%
> arrange(cik))
cik | cusip | cusip6 | issuer_name_1 | company_name | |
---|---|---|---|---|---|
1 | 13006 | 17253C10 | 17253C | CIRCA PHARMACEUTICALS INC | CIRCA PHARMACEUTICALS INC |
2 | 14177 | 10876310 | 108763 | BRIDGFORD FOODS CORP | BRIDGFORD FOODS CORP |
3 | 22896 | 20482G10 | 20482G | COMPUSERVE CORP DEL | COMPUSERVE CORP |
4 | 28209 | 24955510 | 249555 | DEPOSIT GTY CORP | DEPOSIT GUARANTY CORP |
5 | 39838 | 36442410 | 364424 | GALVESTON HOUSTON CO | GALVESTON HOUSTON CO |
6 | 40675 | 37017210 | 370172 | GENERAL KINETICS INC | GENERAL KINETICS INC |
7 | 48544 | 44090300 | 440903 | HOSPLEX SYS INC | HOSOI GARDEN MORTUARY INC |
8 | 50253 | 45583010 | 455830 | INDUSTRIAL ACOUSTICS INC | INDUSTRIAL ACOUSTICS CO INC |
9 | 60549 | 54667680 | 546676 | LOUISVILLE GAS & ELEC CO | LOUISVILLE GAS & ELECTRIC CO /KY/ |
10 | 65297 | 05873J10 | 05873J | BALLYS GRAND INC | BALLYS GRAND INC /DE/ |
11 | 74154 | 68267810 | 682678 | ONEOK INC | ONEOK INC |
12 | 75748 | 69562410 | 695624 | PAIN SUPPRESSION LABS INC | PAIN SUPPRESSION LABS, INC. |
13 | 81301 | 74714210 | 747142 | PYRAMID COMMUNICATIONS INC NEW | PYRAMID COMMUNICATIONS INC |
14 | 97483 | 71891320 | 718913 | PHOENIX RESOURCE COS INC | PHOENIX RESOURCE COMPANIES INC |
15 | 109870 | 37722620 | 377226 | GLASSMASTER CO | GLASSMASTER CO |
16 | 225998 | 93881510 | 938815 | WASHINGTON ENERGY CO | WASHINGTON ENERGY CO |
17 | 276886 | 53191120 | 531911 | LIFETIME CORP | LIFETIME CORP |
18 | 316537 | 05384310 | 053843 | AVOCA INC | AVOCA INC |
19 | 350305 | 83438010 | 834380 | SOLV EX CORP | SOLV EX CORP |
20 | 350926 | 71365810 | 713658 | PEREGRINE ENTMT LTD | PEREGRINE ENTERTAINMENT LTD |
21 | 350926 | 71365810 | 713658 | PEREGRINE ENTMT LTD | PEREGRINE ENTERTAINMENT |
22 | 354827 | 23790310 | 237903 | DATA MEASUREMENT CORP | DATA MEASUREMENT CORP |
23 | 357262 | 65881000 | 658810 | NORTH CNTY BANCORP | NORTH COUNTY BANCORP |
24 | 712468 | 25059510 | 250595 | DESOTO INC | NARRAGANSETT FIRST FUND |
25 | 712744 | 25390210 | 253902 | DIGITAL COMMUNICATIONS ASSOC | DIGITAL COMMUNICATIONS ASSOCIATES INC |
26 | 717192 | 58977410 | 589774 | MERIDIAN NATL CORP | MERIDIAN NATIONAL CORP |
27 | 719264 | 31946210 | 319462 | FIRST CTZNS BANCSHARES INC | FIRST CITIZENS BANCSHARES INC /TN/ |
28 | 725014 | 75449810 | 754498 | RAWSON-KOENIG INC | RAWSON KOENIG INC |
29 | 727127 | 78403010 | 784030 | SCS COMPUTE INC | SCS COMPUTE INC |
30 | 729213 | 74791410 | 747914 | Q MED INC | Q MED INC |
31 | 729213 | 74791410 | 747914 | Q MED INC | QMED INC |
32 | 737033 | 74835710 | 748357 | QUESTECH INC | QUESTECH INC |
33 | 740074 | 55349510 | 553495 | MSA RLTY CORP | MSA REALTY CORP |
34 | 761651 | 37248610 | 372486 | GEORGE MASON BANKSHARES INC | MASON GEORGE BANKSHARES INC |
35 | 768863 | 88254510 | 882545 | TEXAS MERIDIAN RES CORP | TEXAS MERIDIAN RESOURCES LTD |
36 | 777844 | 20992910 | 209929 | COMPUSONICS VIDEO CORP | COMPUSONICS VIDEO CORP |
37 | 789577 | 89324510 | 893245 | TRANS LEASING INTL INC | TLI LIQUIDATING CORP |
38 | 789577 | 89324510 | 893245 | TRANS LEASING INTL INC | TRANS LEASING INTERNATIONAL INC |
39 | 789987 | 42990810 | 429908 | HIGH CASH PARTNERS LP | RESOURCES PENSION SHARES 5 LP |
40 | 790025 | 14030810 | 140308 | CAPITAL MEDIA GROUP LTD | CAPITAL MEDIA GROUP LTD |
41 | 791350 | 40641710 | 406417 | HALSTEAD ENERGY CORP | HALSTEAD ENERGY CORP |
42 | 794984 | 42990810 | 429908 | HIGH CASH PARTNERS LP | HIGH CASH PARTNERS L P |
43 | 799723 | 37936K00 | 37936K | GLOBAL WRESTLING ALLIANCE INC | BIONET TECHNOLOGIES INC |
44 | 799723 | 37936K00 | 37936K | GLOBAL WRESTLING ALLIANCE INC | PRATT WYLCE & LORDS LTD |
45 | 801121 | 45753810 | 457538 | INMAC CORP | INMAC CORP |
46 | 803058 | 89151110 | 891511 | TOTAL RESH CORP | TOTAL RESEARCH CORP |
47 | 808575 | 71913203 | 719132 | PHOENIX RESTAURANT GROUP INC | HUGHES RESOURCES INC |
48 | 808575 | 71913203 | 719132 | PHOENIX RESTAURANT GROUP INC | PHOENIX RESOURCES TECHNOLOGIES INC |
49 | 810578 | 79284810 | 792848 | ST PAUL BANCORP INC | ST PAUL BANCORP INC |
50 | 810995 | 01880500 | 018805 | ALLIANZ SOCIETAS EUROPAEA-SE | ALLIED BANKSHARES INC |
51 | 811213 | 35834019 | 358340 | FREYMILLER TRUCKING INC | FREYMILLER TRUCKING INC |
52 | 814479 | 39682910 | 396829 | GREENWICH FINL CORP | GREENWICH FINANCIAL CORP |
53 | 825288 | 68219P10 | 68219P | ON TECHNOLOGY CORP | INSURED MUNICIPALS INCOME TRUST SERIES 200 |
54 | 828360 | 82539610 | 825396 | SHOWPOWER INC | SHOWPOWER INC |
55 | 837028 | 87236310 | 872363 | TDX CORP | TDX CORP |
56 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA S A |
57 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA, S.A. |
58 | 843090 | 03253500 | 032535 | ANAGRAM PLUS INC | ANANGEL AMERICAN SHIPHOLDINGS LTD |
59 | 846046 | 85232N10 | 85232N | STACKPOLE LTD | STACKPOLE CORP |
60 | 848465 | 39015810 | 390158 | GREAT BEAR TECHNOLOGY INC | STARPRESS INC |
61 | 849314 | 08179540 | 081795 | BENETTON GROUP SPA | BENETTON GROUP SPA |
62 | 853927 | 02744L10 | 02744L | AMERICAN MEDIA INC | AMERICAN MEDIA OPERATIONS INC |
63 | 859632 | 90249910 | 902499 | TYSONS FINL CORP | TYSONS FINANCIAL CORP |
64 | 862450 | 03922910 | 039229 | ARCADIAN CORP | ARCADIAN CORP |
65 | 863924 | 19248110 | 192481 | COHO ENERGY INC | COHO RESOURCES INC |
66 | 868568 | 55920000 | 559200 | MAGNA BANCORP INC | MAGNA BANCORP INC |
67 | 874758 | 14790410 | 147904 | CASH CAN INC | CASH CAN INC |
68 | 875174 | 59373610 | 593736 | MIAMI SUBS CORP | MIAMI SUBS CORP |
69 | 877210 | 86431310 | 864313 | SUBMICRON SYS CORP | SUBMICRON SYSTEMS CORP |
70 | 877984 | 09161R10 | 09161R | BIOMIRA INC | BIOMIRA INC |
71 | 881443 | 97374210 | 973742 | WINDSOR REAL ESTATE INVT TR 8 | N TANDEM TRUST |
72 | 881443 | 97374210 | 973742 | WINDSOR REAL ESTATE INVT TR 8 | WINDSOR REAL ESTATE INVESTMENT TRUST 8 |
73 | 882264 | 03852L10 | 03852L | ARAKIS ENERGY CORP | ARAKIS ENERGY CORP |
74 | 882321 | 35168810 | 351688 | FRAME TECHNOLOGY INC | FRAME TECHNOLOGY CORP/CA |
75 | 887315 | 42209410 | 422094 | HEADSTRONG GROUP INC | HEADSTRONG GROUP INC |
76 | 894541 | 08659410 | 086594 | BETA WELL SVC INC | BETA WELL SERVICE INC |
77 | 897074 | 86422210 | 864222 | STYLES ON VIDEO INC | STYLES ON VIDEO INC |
78 | 897878 | 00371480 | 003714 | ABN AMRO NORTH AMER INC | ABN AMRO BANK NV |
79 | 897878 | 00371480 | 003714 | ABN AMRO NORTH AMER INC | ROYAL BANK OF SCOTLAND N.V. |
80 | 897910 | 61735710 | 617357 | MORGAN GRENFELL SMALLCAP FD | MORGAN GRENFELL CAPITAL MANAGEMENT |
81 | 900385 | 15043810 | 150438 | CEDAR INCOME FD LTD | AEGON USA INC |
82 | 904900 | 08372L10 | 08372L | BERG ELECTRS CORP | BERG ELECTRONICS CORP /DE/ |
83 | 906391 | 12628810 | 126288 | CSB FINL CORP | CSB FINANCIAL CORP |
84 | 909723 | 44159410 | 441594 | HOULIHANS RESTAURANT GROUP INC | HOULIHANS RESTAURANT GROUP INC |
85 | 914344 | 05873110 | 058731 | BALLY GAMING INTL INC | BALLY GAMING INC |
86 | 916536 | 40862310 | 408623 | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS LP |
87 | 916537 | 40862310 | 408623 | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS FINANCE CORP |
88 | 922738 | 86330P10 | 86330P | STRESSGEN BIOTECHNOLOGIES | STRESSGEN BIOTECHNOLOGIES CORP /FI |
89 | 924835 | 88871110 | 888711 | TIVOLI INDS INC | TIVOLI INDUSTRIES INC |
90 | 927027 | 02359100 | 023591 | AMERCO INC | SHOEN JAMES P |
91 | 927040 | 48203317 | 482033 | JUNO ACQUISITIONS INC | AREMISSOFT CORP |
92 | 927040 | 48203317 | 482033 | JUNO ACQUISITIONS INC | JUNO ACQUISITIONS INC |
93 | 928507 | 12685910 | 126859 | CABLEMAXX INC | CABLEMAXX INC /DE |
94 | 928539 | 64107410 | 641074 | NESTOR INC | WAND NESTOR INVESTMENTS L P ET AL |
95 | 929757 | 87259109 | 872591 | TNBANK OAK RIDGE TENN | T NETIX INC |
96 | 933425 | 46530610 | 465306 | ITALIAN OVEN INC | ACCREDITED BUSINESS CONSOLIDATORS CORP. |
97 | 933425 | 46530610 | 465306 | ITALIAN OVEN INC | ITALIAN OVEN INC |
98 | 933425 | 46530610 | 465306 | ITALIAN OVEN INC | STORE LIQUIDATION CO INC |
99 | 943230 | 89816810 | 898168 | TRUMP HOTELS CASINO RESORTS | TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 |
100 | 948753 | 69367410 | 693674 | PXRE CORP | PXRE CORP /FA/ |
101 | 1002819 | 00891110 | 008911 | AIR CDA | AIR CANADA /QUEBEC/ |
102 | 1002819 | 00891130 | 008911 | AIR CDA | AIR CANADA /QUEBEC/ |
103 | 1004278 | 35849710 | 358497 | FRIENDLY ICE CREAM CORP NEW | KEYSTONE INSTITUTIONAL TRUST |
104 | 1010136 | 45192110 | 451921 | IMAGEMATRIX CORP | IMAGEMATRIX CORP |
105 | 1010312 | 68267810 | 682678 | ONEOK INC | WESTAR CAPITAL INC |
106 | 1010312 | 68267810 | 682678 | ONEOK INC | WESTAR INDUSTRIES INC |
107 | 1013050 | 74257410 | 742574 | PRINTRAK INTL INC | PRINTRAK INTERNATIONAL INC |
108 | 1021255 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | HOREJSI ENTERPRISES INC |
109 | 1021255 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | HOREJSI INC |
110 | 1022097 | 89990510 | 899905 | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLGIES INC |
111 | 1022097 | 89990510 | 899905 | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLOGIES INC |
112 | 1022097 | 89990510 | 899905 | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLOGIES, INC |
113 | 1030614 | 98901220 | 989012 | ZANE INTERACTIVE PUBG INC | ZANE INTERACTIVE PUBLISHING INC |
114 | 1030814 | 90345610 | 903456 | UBARTER COM INC | INTERNATIONAL BARTER CORP |
115 | 1030814 | 90345610 | 903456 | UBARTER COM INC | UBARTER COM INC |
116 | 1035028 | 62907710 | 629077 | NEI WEBWORLD INC | NEI WEBWORLD INC |
117 | 1040476 | 29439810 | 294398 | EQUALITY BANCORP INC | EQUALITY BANCORP INC |
118 | 1040649 | 74871810 | 748718 | QUINENCO S A | QUINENCO SA |
119 | 1041970 | 02914P20 | 02914P | AMERICAN QUANTUM CYCLES INC | AMERICAN QUANTUM CYCLES INC |
120 | 1044278 | 35849710 | 358497 | FRIENDLY ICE CREAM CORP NEW | FRIENDLYS RESTAURANTS FRANCHISE INC |
121 | 1055037 | 72345109 | 723451 | PINNACLE BK JASPER ALA | PINNACLE BUSINESS MANAGEMENT INC |
122 | 1057437 | 30267410 | 302674 | FVNB CORP | FVNB CORP |
123 | 1062023 | 60753310 | 607533 | MODEM MEDIA INC | OPTIMARK HOLDINGS INC |
124 | 1062023 | 60753310 | 607533 | MODEM MEDIA INC | OPTIMARK TECHNOLOGIES INC |
125 | 1064133 | 98144Q10 | 98144Q | WORLD COMM ONLINE INC DEL | WORLD COMMERCE ONLINE INC |
126 | 1064990 | 86330P10 | 86330P | STRESSGEN BIOTECHNOLOGIES | STRESSGEN BIOTECHNOLOGIES CORP |
127 | 1066114 | 87923P10 | 87923P | TELE CENTRO OESTE CELULAR | TELE CENTRO DESTE CELULAR PARTICIPACOES |
128 | 1066114 | 87923P10 | 87923P | TELE CENTRO OESTE CELULAR | TELE CENTRO OESTE CELULAR PARTICIPACOES |
129 | 1071355 | 74046M30 | 74046M | PREMIER BRANDS INC | CATHAYONE INC |
130 | 1071355 | 74046M30 | 74046M | PREMIER BRANDS INC | PREMIER BRANDS INC/UT |
131 | 1075710 | 15691510 | 156915 | CERTICOM CORP | CERTICOM CORP |
132 | 1101734 | 62885310 | 628853 | NCL HLDG ASA | ARRASAS LTD |
133 | 1106946 | 19419D10 | 19419D | COLLECTIBLE CONCEPTS GROUP INC | COLLECTIBLE CONCEPTS GROUP INC |
134 | 1110072 | 57951710 | 579517 | MCCLENDON TRANSN GROUP INC | MCCLENDON TRANSPORTATION GROUP INC |
135 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC DE |
136 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC |
137 | 1120802 | 36298R10 | 36298R | GS AGRIFUELS CORP | GS AGRIFUELS CORP |
138 | 1120802 | 36298R10 | 36298R | GS AGRIFUELS CORP | HUGO INTERNATIONAL TELECOM INC |
139 | 1200598 | 87927W10 | 87927W | TELECOM ITALIA SPA | TELECOM ITALIA SPA |
140 | 1218267 | 92390110 | 923901 | VERMILION RES LTD | VERMILION RESOURCES LTD |
141 | 1218991 | 97263M10 | 97263M | WIMM-BILL-DANN FOODS OJSC | YUSHVAEV GAVRIL A |
142 | 1322387 | 58818320 | 588183 | MERCHANDISE CREATIONS INC | INTELLIGENTIAS, INC. |
143 | 1322387 | 58818320 | 588183 | MERCHANDISE CREATIONS INC | MERCHANDISE CREATIONS, INC. |
144 | 1347004 | 62847L10 | 62847L | MYECHECK INC | MYECHECK INC. |
145 | 1347004 | 62847L10 | 62847L | MYECHECK INC | SEKOYA HOLDINGS LTD. |
146 | 1374976 | 66651104 | 666511 | NORTHLAND PWR INC | NORTHPORT CAPITAL INC. |
147 | 1374976 | 66651104 | 666511 | NORTHLAND PWR INC | NORTHPORT NETWORK SYSTEMS, INC. |
148 | 1399815 | 66986W10 | 66986W | NOVAMED INC DEL | NOVAMED EYECARE RESEARCH, INC |
> View(unique_8_df %>% distinct(cik, cusip6) %>% group_by(cik) %>%
> summarise(num = n()) %>% filter(num == 1) %>%
> inner_join(unique_8_df) %>% inner_join(issuers, by = 'cusip6') %>%
> inner_join(ciks) %>%
> select(cik, cusip, cusip6, issuer_name_1, company_name) %>%
> arrange(cik) %>% filter(cik %in% soft_match_ciks))
cik | cusip | cusip6 | issuer_name_1 | company_name | |
---|---|---|---|---|---|
1 | 808575 | 71913203 | 719132 | PHOENIX RESTAURANT GROUP INC | HUGHES RESOURCES INC |
2 | 808575 | 71913203 | 719132 | PHOENIX RESTAURANT GROUP INC | PHOENIX RESOURCES TECHNOLOGIES INC |
3 | 863924 | 19248110 | 192481 | COHO ENERGY INC | COHO RESOURCES INC |
4 | 882321 | 35168810 | 351688 | FRAME TECHNOLOGY INC | FRAME TECHNOLOGY CORP/CA |
5 | 897878 | 00371480 | 003714 | ABN AMRO NORTH AMER INC | ABN AMRO BANK NV |
6 | 897878 | 00371480 | 003714 | ABN AMRO NORTH AMER INC | ROYAL BANK OF SCOTLAND N.V. |
7 | 897910 | 61735710 | 617357 | MORGAN GRENFELL SMALLCAP FD | MORGAN GRENFELL CAPITAL MANAGEMENT |
8 | 916537 | 40862310 | 408623 | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS FINANCE CORP |
9 | 922738 | 86330P10 | 86330P | STRESSGEN BIOTECHNOLOGIES | STRESSGEN BIOTECHNOLOGIES CORP /FI |
10 | 928539 | 64107410 | 641074 | NESTOR INC | WAND NESTOR INVESTMENTS L P ET AL |
11 | 1044278 | 35849710 | 358497 | FRIENDLY ICE CREAM CORP NEW | FRIENDLYS RESTAURANTS FRANCHISE INC |
12 | 1055037 | 72345109 | 723451 | PINNACLE BK JASPER ALA | PINNACLE BUSINESS MANAGEMENT INC |
13 | 1064990 | 86330P10 | 86330P | STRESSGEN BIOTECHNOLOGIES | STRESSGEN BIOTECHNOLOGIES CORP |
14 | 1066114 | 87923P10 | 87923P | TELE CENTRO OESTE CELULAR | TELE CENTRO DESTE CELULAR PARTICIPACOES |
15 | 1066114 | 87923P10 | 87923P | TELE CENTRO OESTE CELULAR | TELE CENTRO OESTE CELULAR PARTICIPACOES |
16 | 1399815 | 66986W10 | 66986W | NOVAMED INC DEL | NOVAMED EYECARE RESEARCH, INC |
> View(unique_8_df %>% distinct(cik, cusip6) %>% group_by(cik) %>%
> summarise(num = n()) %>% filter(num == 1) %>%
> inner_join(unique_8_df) %>% inner_join(issuers, by = 'cusip6') %>%
> inner_join(ciks) %>%
> select(cik, cusip, cusip6, issuer_name_1, company_name) %>%
> arrange(cik) %>% filter(cik %in% not_a_match_ciks))
cik | cusip | cusip6 | issuer_name_1 | company_name | |
---|---|---|---|---|---|
1 | 48544 | 44090300 | 440903 | HOSPLEX SYS INC | HOSOI GARDEN MORTUARY INC |
2 | 712468 | 25059510 | 250595 | DESOTO INC | NARRAGANSETT FIRST FUND |
3 | 789987 | 42990810 | 429908 | HIGH CASH PARTNERS LP | RESOURCES PENSION SHARES 5 LP |
4 | 799723 | 37936K00 | 37936K | GLOBAL WRESTLING ALLIANCE INC | BIONET TECHNOLOGIES INC |
5 | 799723 | 37936K00 | 37936K | GLOBAL WRESTLING ALLIANCE INC | PRATT WYLCE & LORDS LTD |
6 | 810995 | 01880500 | 018805 | ALLIANZ SOCIETAS EUROPAEA-SE | ALLIED BANKSHARES INC |
7 | 825288 | 68219P10 | 68219P | ON TECHNOLOGY CORP | INSURED MUNICIPALS INCOME TRUST SERIES 200 |
8 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA S A |
9 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA, S.A. |
10 | 843090 | 03253500 | 032535 | ANAGRAM PLUS INC | ANANGEL AMERICAN SHIPHOLDINGS LTD |
11 | 848465 | 39015810 | 390158 | GREAT BEAR TECHNOLOGY INC | STARPRESS INC |
12 | 900385 | 15043810 | 150438 | CEDAR INCOME FD LTD | AEGON USA INC |
13 | 927027 | 02359100 | 023591 | AMERCO INC | SHOEN JAMES P |
14 | 929757 | 87259109 | 872591 | TNBANK OAK RIDGE TENN | T NETIX INC |
15 | 943230 | 89816810 | 898168 | TRUMP HOTELS CASINO RESORTS | TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 |
16 | 1004278 | 35849710 | 358497 | FRIENDLY ICE CREAM CORP NEW | KEYSTONE INSTITUTIONAL TRUST |
17 | 1010312 | 68267810 | 682678 | ONEOK INC | WESTAR CAPITAL INC |
18 | 1010312 | 68267810 | 682678 | ONEOK INC | WESTAR INDUSTRIES INC |
19 | 1021255 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | HOREJSI ENTERPRISES INC |
20 | 1021255 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | HOREJSI INC |
21 | 1062023 | 60753310 | 607533 | MODEM MEDIA INC | OPTIMARK HOLDINGS INC |
22 | 1062023 | 60753310 | 607533 | MODEM MEDIA INC | OPTIMARK TECHNOLOGIES INC |
23 | 1101734 | 62885310 | 628853 | NCL HLDG ASA | ARRASAS LTD |
24 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC DE |
25 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC |
26 | 1218991 | 97263M10 | 97263M | WIMM-BILL-DANN FOODS OJSC | YUSHVAEV GAVRIL A |
27 | 1374976 | 66651104 | 666511 | NORTHLAND PWR INC | NORTHPORT CAPITAL INC. |
28 | 1374976 | 66651104 | 666511 | NORTHLAND PWR INC | NORTHPORT NETWORK SYSTEMS, INC. |
> View(unique_8_df %>% distinct(cik, cusip6) %>%
> group_by(cik) %>% summarise(num = n()) %>%
> filter(num == 1) %>% inner_join(unique_8_df) %>%
> mutate(ncusip = cusip) %>%
> inner_join(stocknames, by = 'ncusip') %>%
> inner_join(ciks) %>%
> select(cik, ncusip, namedt, nameenddt, ticker, comnam, company_name))
cik | ncusip | namedt | nameenddt | ticker | comnam | company_name | |
---|---|---|---|---|---|---|---|
1 | 13006 | 17253C10 | 1993-03-26 | 1995-07-17 | RXC | CIRCA PHARMACEUTICALS INC | CIRCA PHARMACEUTICALS INC |
2 | 14177 | 10876310 | 1972-12-14 | 2019-12-31 | BRID | BRIDGFORD FOODS CORP | BRIDGFORD FOODS CORP |
3 | 22896 | 20482G10 | 1996-04-19 | 1998-01-30 | CSRV | COMPUSERVE INC DEL | COMPUSERVE CORP |
4 | 28209 | 24955510 | 1972-12-14 | 1996-12-04 | DEPS | DEPOSIT GUARANTY CORP | DEPOSIT GUARANTY CORP |
5 | 28209 | 24955510 | 1996-12-05 | 1998-04-30 | DEP | DEPOSIT GUARANTY CORP | DEPOSIT GUARANTY CORP |
6 | 39838 | 36442410 | 1975-06-11 | 1979-11-11 | GVTN | GALVESTON HOUSTON CO | GALVESTON HOUSTON CO |
7 | 39838 | 36442410 | 1979-11-12 | 1995-02-28 | GHX | GALVESTON HOUSTON CO | GALVESTON HOUSTON CO |
8 | 40675 | 37017210 | 1986-08-25 | 1991-11-20 | GKIE | GENERAL KINETICS INC | GENERAL KINETICS INC |
9 | 40675 | 37017210 | 1991-11-21 | 1997-05-29 | GKI | GENERAL KINETICS INC | GENERAL KINETICS INC |
10 | 50253 | 45583010 | 1972-12-14 | 1998-10-29 | IACI | INDUSTRIAL ACOUSTICS INC | INDUSTRIAL ACOUSTICS CO INC |
11 | 65297 | 05873J10 | 1994-08-01 | 1998-03-26 | BGLV | BALLYS GRAND INC | BALLYS GRAND INC /DE/ |
12 | 74154 | 68267810 | 1980-12-10 | 1997-11-27 | OKE | ONEOK INC | ONEOK INC |
13 | 75748 | 69562410 | 1986-09-09 | 1989-09-05 | PAIN | PAIN SUPPRESSION LABS INC | PAIN SUPPRESSION LABS, INC. |
14 | 97483 | 71891320 | 1992-05-13 | 1992-10-21 | PHNI | PHOENIX RESOURCE COS INC | PHOENIX RESOURCE COMPANIES INC |
15 | 97483 | 71891320 | 1992-10-22 | 1996-05-20 | PHN | PHOENIX RESOURCE COS INC | PHOENIX RESOURCE COMPANIES INC |
16 | 109870 | 37722620 | 1988-07-15 | 1994-05-23 | GLMA | GLASSMASTER COMPANY | GLASSMASTER CO |
17 | 109870 | 37722620 | 1994-05-24 | 1999-11-05 | GLMA | GLASSMASTER CO | GLASSMASTER CO |
18 | 225998 | 93881510 | 1978-08-10 | 1992-08-03 | WECO | WASHINGTON ENERGY CO | WASHINGTON ENERGY CO |
19 | 225998 | 93881510 | 1992-08-04 | 1997-02-10 | WEG | WASHINGTON ENERGY CO | WASHINGTON ENERGY CO |
20 | 276886 | 53191120 | 1989-07-06 | 1993-07-30 | LFT | LIFETIME CORP | LIFETIME CORP |
21 | 350305 | 83438010 | 1981-07-08 | 1997-07-11 | SOLV | SOLV EX CORP | SOLV EX CORP |
22 | 350926 | 71365810 | 1984-01-25 | 1985-10-07 | MOVE | PEREGRINE ENTERTAINMENT INC | PEREGRINE ENTERTAINMENT LTD |
23 | 350926 | 71365810 | 1984-01-25 | 1985-10-07 | MOVE | PEREGRINE ENTERTAINMENT INC | PEREGRINE ENTERTAINMENT |
24 | 354827 | 23790310 | 1986-12-18 | 1996-01-10 | DMCB | DATA MEASUREMENT CORP | DATA MEASUREMENT CORP |
25 | 712468 | 25059510 | 1968-01-02 | 1996-09-27 | DSO | DE SOTO INC | NARRAGANSETT FIRST FUND |
26 | 712744 | 25390210 | 1983-02-10 | 1987-12-10 | DCAI | DIGITAL COMMUNICATIONS ASSOC IN | DIGITAL COMMUNICATIONS ASSOCIATES INC |
27 | 712744 | 25390210 | 1987-12-11 | 1993-11-08 | DCA | DIGITAL COMMUNICATIONS ASSOC IN | DIGITAL COMMUNICATIONS ASSOCIATES INC |
28 | 717192 | 58977410 | 1986-05-13 | 1993-08-17 | MRCO | MERIDIAN NATIONAL CORP | MERIDIAN NATIONAL CORP |
29 | 725014 | 75449810 | 1987-09-17 | 1987-09-17 | KOEN | RAWSON KOENIG INC | RAWSON KOENIG INC |
30 | 725014 | 75449810 | 1987-09-18 | 1997-11-23 | RAKO | RAWSON KOENIG INC | RAWSON KOENIG INC |
31 | 727127 | 78403010 | 1986-09-03 | 1996-02-05 | SCOM | S C S COMPUTE INC | SCS COMPUTE INC |
32 | 729213 | 74791410 | 1984-01-13 | 2002-05-27 | QEKG | Q MED | Q MED INC |
33 | 729213 | 74791410 | 1984-01-13 | 2002-05-27 | QEKG | Q MED | QMED INC |
34 | 729213 | 74791410 | 2002-05-28 | 2008-05-20 | QMED | QMED INC | Q MED INC |
35 | 729213 | 74791410 | 2002-05-28 | 2008-05-20 | QMED | QMED INC | QMED INC |
36 | 737033 | 74835710 | 1984-05-14 | 1998-11-18 | QTEC | QUESTECH INC | QUESTECH INC |
37 | 740074 | 55349510 | 1984-10-01 | 1994-09-01 | SSS | M S A REALTY CORP | MSA REALTY CORP |
38 | 761651 | 37248610 | 1993-07-01 | 1998-04-01 | GMBS | GEORGE MASON BANKSHARES INC | MASON GEORGE BANKSHARES INC |
39 | 768863 | 88254510 | 1990-12-19 | 1997-04-02 | TMR | TEXAS MERIDIAN RESOURCES CORP | TEXAS MERIDIAN RESOURCES LTD |
40 | 768863 | 88254510 | 1997-04-03 | 1997-06-19 | TMR | TEXAS MERIDIAN RESOURCES CORP | TEXAS MERIDIAN RESOURCES LTD |
41 | 789577 | 89324510 | 1986-04-04 | 1997-12-18 | TLII | TRANS LEASING INTL INC | TLI LIQUIDATING CORP |
42 | 789577 | 89324510 | 1986-04-04 | 1997-12-18 | TLII | TRANS LEASING INTL INC | TRANS LEASING INTERNATIONAL INC |
43 | 791350 | 40641710 | 1995-03-13 | 1998-10-15 | HSNR | HALSTEAD ENERGY CORP | HALSTEAD ENERGY CORP |
44 | 801121 | 45753810 | 1986-10-20 | 1996-01-25 | INMC | INMAC CORP | INMAC CORP |
45 | 803058 | 89151110 | 1987-03-26 | 2001-11-02 | TOTL | TOTAL RESEARCH CORP | TOTAL RESEARCH CORP |
46 | 810578 | 79284810 | 1987-05-18 | 1999-10-01 | SPBC | ST PAUL BANCORP INC | ST PAUL BANCORP INC |
47 | 814479 | 39682910 | 1988-04-15 | 1994-01-31 | GFCT | GREENWICH FINANCIAL CORP | GREENWICH FINANCIAL CORP |
48 | 825288 | 68219P10 | 1995-08-01 | 2004-02-13 | ONTC | ON TECHNOLOGY CORP | INSURED MUNICIPALS INCOME TRUST SERIES 200 |
49 | 828360 | 82539610 | 1998-06-16 | 2000-01-28 | SHO | SHOWPOWER INC | SHOWPOWER INC |
50 | 837028 | 87236310 | 1993-01-15 | 1995-09-01 | TDXC | T D X CORP | TDX CORP |
51 | 842180 | 05959110 | 1993-11-24 | 2001-02-27 | BFR | BANCO FRANCES DEL RIO DE LA PLA | BANCO BILBAO VIZCAYA ARGENTARIA S A |
52 | 842180 | 05959110 | 1993-11-24 | 2001-02-27 | BFR | BANCO FRANCES DEL RIO DE LA PLA | BANCO BILBAO VIZCAYA ARGENTARIA, S.A. |
53 | 849314 | 08179540 | 1989-06-09 | 2007-10-18 | BNG | BENETTON GROUP S P A | BENETTON GROUP SPA |
54 | 853927 | 02744L10 | 1994-11-21 | 1999-05-07 | ENQ | AMERICAN MEDIA INC | AMERICAN MEDIA OPERATIONS INC |
55 | 859632 | 90249910 | 1996-05-24 | 1998-02-27 | TYFC | TYSONS FINANCIAL CORP | TYSONS FINANCIAL CORP |
56 | 862450 | 03922910 | 1995-08-04 | 1997-03-06 | ACA | ARCADIAN CORP | ARCADIAN CORP |
57 | 863924 | 19248110 | 1993-10-04 | 1999-03-05 | COHO | COHO ENERGY INC | COHO RESOURCES INC |
58 | 875174 | 59373610 | 1991-10-03 | 1999-01-07 | SUBS | MIAMI SUBS CORP | MIAMI SUBS CORP |
59 | 877210 | 86431310 | 1993-09-01 | 1998-04-16 | SUBM | SUBMICRON SYSTEMS CORP | SUBMICRON SYSTEMS CORP |
60 | 877984 | 09161R10 | 1991-11-19 | 1999-03-14 | BIOMF | BIOMIRA INC | BIOMIRA INC |
61 | 877984 | 09161R10 | 1999-03-15 | 2007-12-10 | BIOM | BIOMIRA INC | BIOMIRA INC |
62 | 882264 | 03852L10 | 1993-06-10 | 1995-08-22 | AKSEF | ARAKIS ENERGY CORP | ARAKIS ENERGY CORP |
63 | 882264 | 03852L10 | 1995-08-23 | 1995-09-21 | AKSEF | ARAKIS ENERGY CORP | ARAKIS ENERGY CORP |
64 | 882264 | 03852L10 | 1995-09-22 | 1998-10-08 | AKSEF | ARAKIS ENERGY CORP | ARAKIS ENERGY CORP |
65 | 882321 | 35168810 | 1992-02-12 | 1995-10-27 | FRAM | FRAME TECHNOLOGY CORP | FRAME TECHNOLOGY CORP/CA |
66 | 894541 | 08659410 | 1993-01-29 | 1994-10-13 | BWS | BETA WELL SERVICE INC | BETA WELL SERVICE INC |
67 | 897074 | 86422210 | 1993-04-14 | 1994-12-27 | SOV | STYLES ON VIDEO INC | STYLES ON VIDEO INC |
68 | 897910 | 61735710 | 1987-05-08 | 2000-10-31 | MGC | MORGAN GRENFELL SMALLCAP FUND | MORGAN GRENFELL CAPITAL MANAGEMENT |
69 | 900385 | 15043810 | 1986-12-17 | 1989-10-01 | CEDR | CEDAR INCOME FD 1 LTD | AEGON USA INC |
70 | 900385 | 15043810 | 1989-10-02 | 2000-02-28 | CEDR | CEDAR INCOME FUND LTD | AEGON USA INC |
71 | 904900 | 08372L10 | 1996-03-01 | 1998-10-09 | BEI | BERG ELECTRONICS CORP | BERG ELECTRONICS CORP /DE/ |
72 | 906391 | 12628810 | 1993-09-28 | 1996-04-30 | COSB | C S B FINANCIAL CORP | CSB FINANCIAL CORP |
73 | 914344 | 05873110 | 1991-11-11 | 1996-06-18 | BGII | BALLY GAMING INTERNATIONAL INC | BALLY GAMING INC |
74 | 916536 | 40862310 | 1994-11-17 | 2000-02-27 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS LP |
75 | 916536 | 40862310 | 2000-02-28 | 2005-09-16 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS LP |
76 | 916537 | 40862310 | 1994-11-17 | 2000-02-27 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS FINANCE CORP |
77 | 916537 | 40862310 | 2000-02-28 | 2005-09-16 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS FINANCE CORP |
78 | 924835 | 88871110 | 1994-09-21 | 1999-03-28 | TVLI | TIVOLI INDUSTRIES INC | TIVOLI INDUSTRIES INC |
79 | 928507 | 12685910 | 1993-11-19 | 1994-12-22 | CMAX | CABLEMAXX INC | CABLEMAXX INC /DE |
80 | 928507 | 12685910 | 1994-12-23 | 1996-02-23 | CMAX | CABLEMAXX HOLDINGS INC | CABLEMAXX INC /DE |
81 | 928539 | 64107410 | 1983-12-22 | 1991-12-19 | NEST | NESTOR INC | WAND NESTOR INVESTMENTS L P ET AL |
82 | 928539 | 64107410 | 1991-12-20 | 2005-06-30 | NA | NESTOR INC | WAND NESTOR INVESTMENTS L P ET AL |
83 | 933425 | 46530610 | 1995-11-21 | 1996-12-30 | OVEN | ITALIAN OVEN INC | ACCREDITED BUSINESS CONSOLIDATORS CORP. |
84 | 933425 | 46530610 | 1995-11-21 | 1996-12-30 | OVEN | ITALIAN OVEN INC | ITALIAN OVEN INC |
85 | 933425 | 46530610 | 1995-11-21 | 1996-12-30 | OVEN | ITALIAN OVEN INC | STORE LIQUIDATION CO INC |
86 | 943230 | 89816810 | 1995-06-07 | 2004-08-09 | DJT | TRUMP HOTELS & CASINO RESRTS IN | TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 |
87 | 948753 | 69367410 | 1994-05-24 | 1996-12-30 | PXRE | P X R E CORP | PXRE CORP /FA/ |
88 | 948753 | 69367410 | 1996-12-31 | 1999-10-05 | PXT | P X R E CORP | PXRE CORP /FA/ |
89 | 1002819 | 00891130 | 1995-11-16 | 2002-12-02 | ACNAF | AIR CANADA INC | AIR CANADA /QUEBEC/ |
90 | 1004278 | 35849710 | 1997-11-14 | 2000-06-07 | FRND | FRIENDLY ICE CREAM CORP NEW | KEYSTONE INSTITUTIONAL TRUST |
91 | 1004278 | 35849710 | 2000-06-08 | 2007-08-30 | FRN | FRIENDLY ICE CREAM CORP NEW | KEYSTONE INSTITUTIONAL TRUST |
92 | 1010136 | 45192110 | 1996-06-04 | 1999-02-02 | IMCX | IMAGEMATRIX CORP | IMAGEMATRIX CORP |
93 | 1010312 | 68267810 | 1980-12-10 | 1997-11-27 | OKE | ONEOK INC | WESTAR CAPITAL INC |
94 | 1010312 | 68267810 | 1980-12-10 | 1997-11-27 | OKE | ONEOK INC | WESTAR INDUSTRIES INC |
95 | 1013050 | 74257410 | 1996-07-02 | 2000-11-13 | AFIS | PRINTRAK INTERNATIONAL INC | PRINTRAK INTERNATIONAL INC |
96 | 1021255 | 74037Q10 | 1993-02-12 | 1999-08-29 | PFM | PREFERRED INCOME MGT FD INC | HOREJSI ENTERPRISES INC |
97 | 1021255 | 74037Q10 | 1993-02-12 | 1999-08-29 | PFM | PREFERRED INCOME MGT FD INC | HOREJSI INC |
98 | 1022097 | 89990510 | 1997-03-26 | 1998-07-28 | TRBDF | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLGIES INC |
99 | 1022097 | 89990510 | 1997-03-26 | 1998-07-28 | TRBDF | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLOGIES INC |
100 | 1022097 | 89990510 | 1997-03-26 | 1998-07-28 | TRBDF | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLOGIES, INC |
101 | 1022097 | 89990510 | 1998-07-29 | 1999-03-31 | TRBD | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLGIES INC |
102 | 1022097 | 89990510 | 1998-07-29 | 1999-03-31 | TRBD | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLOGIES INC |
103 | 1022097 | 89990510 | 1998-07-29 | 1999-03-31 | TRBD | TURBODYNE TECHNOLOGIES INC | TURBODYNE TECHNOLOGIES, INC |
104 | 1024627 | M9888A10 | 1996-11-01 | 1998-08-05 | ZAGIF | ZAG INDUSTRIES LTD | ZAG INDUSTRIES LTD |
105 | 1035028 | 62907710 | 1997-05-21 | 1998-10-21 | NEIP | N E I WEBWORLD INC | NEI WEBWORLD INC |
106 | 1040476 | 29439810 | 1997-12-02 | 2000-11-15 | EBI | EQUALITY BANCORP INC | EQUALITY BANCORP INC |
107 | 1040649 | 74871810 | 1997-06-25 | 2007-01-19 | LQ | QUINENCO S A | QUINENCO SA |
108 | 1041970 | 02914P20 | 1999-09-08 | 2000-10-12 | AFV | AMERICAN QUANTUM CYCLES INC | AMERICAN QUANTUM CYCLES INC |
109 | 1044278 | 35849710 | 1997-11-14 | 2000-06-07 | FRND | FRIENDLY ICE CREAM CORP NEW | FRIENDLYS RESTAURANTS FRANCHISE INC |
110 | 1044278 | 35849710 | 2000-06-08 | 2007-08-30 | FRN | FRIENDLY ICE CREAM CORP NEW | FRIENDLYS RESTAURANTS FRANCHISE INC |
111 | 1057437 | 30267410 | 1998-09-16 | 2001-08-13 | FVNB | F V N B CORP | FVNB CORP |
112 | 1062023 | 60753310 | 1999-02-05 | 2000-06-05 | MMPT | MODEM MEDIA POPPE TYSON INC | OPTIMARK HOLDINGS INC |
113 | 1062023 | 60753310 | 1999-02-05 | 2000-06-05 | MMPT | MODEM MEDIA POPPE TYSON INC | OPTIMARK TECHNOLOGIES INC |
114 | 1062023 | 60753310 | 2000-06-06 | 2004-10-15 | MMPT | MODEM MEDIA INC | OPTIMARK HOLDINGS INC |
115 | 1062023 | 60753310 | 2000-06-06 | 2004-10-15 | MMPT | MODEM MEDIA INC | OPTIMARK TECHNOLOGIES INC |
116 | 1075710 | 15691510 | 2000-05-02 | 2002-06-17 | CERT | CERTICOM CORP | CERTICOM CORP |
117 | 1096300 | M4591910 | 2000-08-02 | 2001-08-01 | FLRE | FLOWARE WIRELESS SYSTEMS INC | FLOWARE WIRELESS SYSTEMS LTD |
118 | 1101734 | 62885310 | 1999-07-09 | 2000-12-01 | NRW | N C L HOLDING ASA | ARRASAS LTD |
119 | 1113227 | 90337P10 | 1994-01-21 | 2000-05-03 | USWI | US WATS INC | CAPSULE COMMUNICATIONS INC DE |
120 | 1113227 | 90337P10 | 1994-01-21 | 2000-05-03 | USWI | US WATS INC | CAPSULE COMMUNICATIONS INC |
121 | 1135254 | M7024010 | 2000-08-08 | 2019-12-31 | MNDO | MIND C T I LTD | EISINGER MONICA |
122 | 1135254 | M7024010 | 2000-08-08 | 2019-12-31 | MNDO | MIND C T I LTD | IANCU MONICA |
123 | 1200598 | 87927W10 | 1997-07-21 | 2003-08-01 | TI | TELECOM ITALIA SPA | TELECOM ITALIA SPA |
124 | 1218991 | 97263M10 | 2002-02-08 | 2011-05-27 | WBD | WIMM BILL DANN FOODS O J S C | YUSHVAEV GAVRIL A |
125 | 1399815 | 66986W10 | 1999-08-18 | 2004-03-28 | NOVA | NOVAMED EYECARE INC | NOVAMED EYECARE RESEARCH, INC |
126 | 1399815 | 66986W10 | 2004-03-29 | 2010-05-31 | NOVA | NOVAMED INC | NOVAMED EYECARE RESEARCH, INC |
> View(unique_8_df %>% distinct(cik, cusip6) %>%
> group_by(cik) %>% summarise(num = n()) %>%
> filter(num == 1) %>% inner_join(unique_8_df) %>%
> mutate(ncusip = cusip) %>%
> inner_join(stocknames, by = 'ncusip') %>% inner_join(ciks) %>%
> select(cik, ncusip, namedt, nameenddt, ticker, comnam, company_name) %>%
> filter(cik %in% soft_match_stocknames_ciks))
cik | ncusip | namedt | nameenddt | ticker | comnam | company_name | |
---|---|---|---|---|---|---|---|
1 | 22896 | 20482G10 | 1996-04-19 | 1998-01-30 | CSRV | COMPUSERVE INC DEL | COMPUSERVE CORP |
2 | 50253 | 45583010 | 1972-12-14 | 1998-10-29 | IACI | INDUSTRIAL ACOUSTICS INC | INDUSTRIAL ACOUSTICS CO INC |
3 | 350926 | 71365810 | 1984-01-25 | 1985-10-07 | MOVE | PEREGRINE ENTERTAINMENT INC | PEREGRINE ENTERTAINMENT LTD |
4 | 350926 | 71365810 | 1984-01-25 | 1985-10-07 | MOVE | PEREGRINE ENTERTAINMENT INC | PEREGRINE ENTERTAINMENT |
5 | 768863 | 88254510 | 1990-12-19 | 1997-04-02 | TMR | TEXAS MERIDIAN RESOURCES CORP | TEXAS MERIDIAN RESOURCES LTD |
6 | 768863 | 88254510 | 1997-04-03 | 1997-06-19 | TMR | TEXAS MERIDIAN RESOURCES CORP | TEXAS MERIDIAN RESOURCES LTD |
7 | 789577 | 89324510 | 1986-04-04 | 1997-12-18 | TLII | TRANS LEASING INTL INC | TLI LIQUIDATING CORP |
8 | 789577 | 89324510 | 1986-04-04 | 1997-12-18 | TLII | TRANS LEASING INTL INC | TRANS LEASING INTERNATIONAL INC |
9 | 853927 | 02744L10 | 1994-11-21 | 1999-05-07 | ENQ | AMERICAN MEDIA INC | AMERICAN MEDIA OPERATIONS INC |
10 | 863924 | 19248110 | 1993-10-04 | 1999-03-05 | COHO | COHO ENERGY INC | COHO RESOURCES INC |
11 | 897910 | 61735710 | 1987-05-08 | 2000-10-31 | MGC | MORGAN GRENFELL SMALLCAP FUND | MORGAN GRENFELL CAPITAL MANAGEMENT |
12 | 914344 | 05873110 | 1991-11-11 | 1996-06-18 | BGII | BALLY GAMING INTERNATIONAL INC | BALLY GAMING INC |
13 | 916536 | 40862310 | 1994-11-17 | 2000-02-27 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS LP |
14 | 916536 | 40862310 | 2000-02-28 | 2005-09-16 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS LP |
15 | 916537 | 40862310 | 1994-11-17 | 2000-02-27 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS FINANCE CORP |
16 | 916537 | 40862310 | 2000-02-28 | 2005-09-16 | JQH | HAMMONS JOHN Q HOTELS INC | HAMMONS JOHN Q HOTELS FINANCE CORP |
17 | 928539 | 64107410 | 1983-12-22 | 1991-12-19 | NEST | NESTOR INC | WAND NESTOR INVESTMENTS L P ET AL |
18 | 928539 | 64107410 | 1991-12-20 | 2005-06-30 | NA | NESTOR INC | WAND NESTOR INVESTMENTS L P ET AL |
19 | 1002819 | 00891130 | 1995-11-16 | 2002-12-02 | ACNAF | AIR CANADA INC | AIR CANADA /QUEBEC/ |
20 | 1044278 | 35849710 | 1997-11-14 | 2000-06-07 | FRND | FRIENDLY ICE CREAM CORP NEW | FRIENDLYS RESTAURANTS FRANCHISE INC |
21 | 1044278 | 35849710 | 2000-06-08 | 2007-08-30 | FRN | FRIENDLY ICE CREAM CORP NEW | FRIENDLYS RESTAURANTS FRANCHISE INC |
22 | 1096300 | M4591910 | 2000-08-02 | 2001-08-01 | FLRE | FLOWARE WIRELESS SYSTEMS INC | FLOWARE WIRELESS SYSTEMS LTD |
23 | 1399815 | 66986W10 | 1999-08-18 | 2004-03-28 | NOVA | NOVAMED EYECARE INC | NOVAMED EYECARE RESEARCH, INC |
24 | 1399815 | 66986W10 | 2004-03-29 | 2010-05-31 | NOVA | NOVAMED INC | NOVAMED EYECARE RESEARCH, INC |
> View(unique_8_df %>% distinct(cik, cusip6) %>%
> group_by(cik) %>% summarise(num = n()) %>%
> filter(num == 1) %>% inner_join(unique_8_df) %>%
> mutate(ncusip = cusip) %>% inner_join(stocknames, by = 'ncusip') %>%
> inner_join(ciks) %>%
> select(cik, ncusip, namedt, nameenddt, ticker, comnam, company_name) %>%
> filter(cik %in% not_a_match_stocknames_ciks))
cik | ncusip | namedt | nameenddt | ticker | comnam | company_name | |
---|---|---|---|---|---|---|---|
1 | 712468 | 25059510 | 1968-01-02 | 1996-09-27 | DSO | DE SOTO INC | NARRAGANSETT FIRST FUND |
2 | 825288 | 68219P10 | 1995-08-01 | 2004-02-13 | ONTC | ON TECHNOLOGY CORP | INSURED MUNICIPALS INCOME TRUST SERIES 200 |
3 | 842180 | 05959110 | 1993-11-24 | 2001-02-27 | BFR | BANCO FRANCES DEL RIO DE LA PLA | BANCO BILBAO VIZCAYA ARGENTARIA S A |
4 | 842180 | 05959110 | 1993-11-24 | 2001-02-27 | BFR | BANCO FRANCES DEL RIO DE LA PLA | BANCO BILBAO VIZCAYA ARGENTARIA, S.A. |
5 | 900385 | 15043810 | 1986-12-17 | 1989-10-01 | CEDR | CEDAR INCOME FD 1 LTD | AEGON USA INC |
6 | 900385 | 15043810 | 1989-10-02 | 2000-02-28 | CEDR | CEDAR INCOME FUND LTD | AEGON USA INC |
7 | 943230 | 89816810 | 1995-06-07 | 2004-08-09 | DJT | TRUMP HOTELS & CASINO RESRTS IN | TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 |
8 | 1004278 | 35849710 | 1997-11-14 | 2000-06-07 | FRND | FRIENDLY ICE CREAM CORP NEW | KEYSTONE INSTITUTIONAL TRUST |
9 | 1004278 | 35849710 | 2000-06-08 | 2007-08-30 | FRN | FRIENDLY ICE CREAM CORP NEW | KEYSTONE INSTITUTIONAL TRUST |
10 | 1010312 | 68267810 | 1980-12-10 | 1997-11-27 | OKE | ONEOK INC | WESTAR CAPITAL INC |
11 | 1010312 | 68267810 | 1980-12-10 | 1997-11-27 | OKE | ONEOK INC | WESTAR INDUSTRIES INC |
12 | 1021255 | 74037Q10 | 1993-02-12 | 1999-08-29 | PFM | PREFERRED INCOME MGT FD INC | HOREJSI ENTERPRISES INC |
13 | 1021255 | 74037Q10 | 1993-02-12 | 1999-08-29 | PFM | PREFERRED INCOME MGT FD INC | HOREJSI INC |
14 | 1062023 | 60753310 | 1999-02-05 | 2000-06-05 | MMPT | MODEM MEDIA POPPE TYSON INC | OPTIMARK HOLDINGS INC |
15 | 1062023 | 60753310 | 1999-02-05 | 2000-06-05 | MMPT | MODEM MEDIA POPPE TYSON INC | OPTIMARK TECHNOLOGIES INC |
16 | 1062023 | 60753310 | 2000-06-06 | 2004-10-15 | MMPT | MODEM MEDIA INC | OPTIMARK HOLDINGS INC |
17 | 1062023 | 60753310 | 2000-06-06 | 2004-10-15 | MMPT | MODEM MEDIA INC | OPTIMARK TECHNOLOGIES INC |
18 | 1101734 | 62885310 | 1999-07-09 | 2000-12-01 | NRW | N C L HOLDING ASA | ARRASAS LTD |
19 | 1113227 | 90337P10 | 1994-01-21 | 2000-05-03 | USWI | US WATS INC | CAPSULE COMMUNICATIONS INC DE |
20 | 1113227 | 90337P10 | 1994-01-21 | 2000-05-03 | USWI | US WATS INC | CAPSULE COMMUNICATIONS INC |
21 | 1135254 | M7024010 | 2000-08-08 | 2019-12-31 | MNDO | MIND C T I LTD | EISINGER MONICA |
22 | 1135254 | M7024010 | 2000-08-08 | 2019-12-31 | MNDO | MIND C T I LTD | IANCU MONICA |
23 | 1218991 | 97263M10 | 2002-02-08 | 2011-05-27 | WBD | WIMM BILL DANN FOODS O J S C | YUSHVAEV GAVRIL A |
crsp=# SELECT * FROM edgar.filings
crsp-# WHERE cik = 943230;
company_name | form_type | cik | date_filed | file_name
------------------------------------------------+-----------+--------+------------+--------------------------------------------
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 2001-03-29 | edgar/data/943230/0001116679-01-000657.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 2000-03-31 | edgar/data/943230/0000903112-00-000708.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 1999-04-01 | edgar/data/943230/0000903112-99-000570.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | S-6EL24 | 943230 | 1995-10-23 | edgar/data/943230/0000934850-95-003781.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 2002-03-27 | edgar/data/943230/0001116679-02-000831.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 485BPOS | 943230 | 2002-11-27 | edgar/data/943230/0000045809-02-000091.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 2003-03-28 | edgar/data/943230/0001116679-03-001001.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 487 | 943230 | 1996-01-12 | edgar/data/943230/0000950130-96-000107.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 497 | 943230 | 1996-01-19 | edgar/data/943230/0000950130-96-000180.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 1997-02-28 | edgar/data/943230/0000903112-97-000315.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | SC 13G | 943230 | 1997-02-12 | edgar/data/943230/0000924355-97-000013.txt
TAX EXEMPT SECURITIES TRUST NEW YORK TRUST 151 | 24F-2NT | 943230 | 1998-02-27 | edgar/data/943230/0000903112-98-000440.txt
(12 rows)
@iangow I don't know if you've been looking at my very partially formed comments above so far, but apart from the hilarity of this example, I guess there is a rather pertinent question raised. Is the name of the issuer always meant to be equal to the subject company? Because this is the foundation upon which the whole mapping rests (we scrape the cik of the subject company, not the filer, assuming logically that that is the cik corresponding to the issuer). What is going on in cases like this?
To summarise what I have above so far, I had a look at the 260 odd cases for which an 8-character cusip did not map to a corresponding 9-character one. I defined them in the dataframe unique_8_df
. I initially found that, apart from 15 entries where the cusip was equal to SCHEDULE
, that the entries for the cusips seemed like they could be legitimate cusips, at least from first glance.
Then I looked at the multiplicities of entries for each cik. So there were a few with more than one cusip6 matched to each cik in this set, with the particularly large outlier of cik = 903954
. I then tried matching to stocknames and cusipm.issuer, and it seems that, though some of these cases could be matched by company names, there a quite a few wrong matches by company name, so these perhaps should be chucked out.
I then tried doing a number of joins to stocknames and cusipm.issuer with the cases which had a single cusip6. As you can see from the initial joins, a substantial number do join. I then looked through the sets, and picked out, in both cases, what I deem "soft matches" (not a hard match, up to acronyms, but cases where perhaps the main name of a firm exists in each (e.g MORGAN GRENFELL), with perhaps different company abbreviations (ie. maybe INC instead of CORP, or perhaps INC is missing altogether)), and then the cases where the names, at a glance, clearly don't match (from which I found and investigated the amusing example above). These are the cases we should perhaps keep, assuming the cusip is correct despite in some cases the names not matching.
In addition to the interesting case above, though, there is this case
> issuers %>% filter(grepl('^HOSOI', issuer_name_1)) %>% select(cusip6, issuer_check, issuer_name_1)
# A tibble: 1 x 3
cusip6 issuer_check issuer_name_1
<chr> <chr> <chr>
1 441047 8 HOSOI GARDEN MORTUARY INC
> cusip_cik %>% filter(cik == 48544) %>% select(file_name, cusip, check_digit, cik, company_name, formats, cusip6)
# A tibble: 2 x 7
file_name cusip check_digit cik company_name formats cusip6
<chr> <chr> <int> <int> <chr> <chr> <chr>
1 edgar/data/48544/0000950150-96-000080.txt 44090300 3 48544 HOSOI GARDEN MORTUARY INC AC 440903
2 edgar/data/1007155/0000950150-96-000080.txt 44090300 3 48544 HOSOI GARDEN MORTUARY INC AC 440903
where clearly the wrong cusip has been matched. This could very well be a typo, though it would be hard to turn 441047 into 440903 by just typing. Could 4409030011
be an old cusip? Do stocknames and cusipm.issuer contain all previously assigned cusips, or just those in current existence?
@bdcallen I can't really follow what you are doing above. Your code snippets are incomplete. For complete code, the best way is to use RStudio on your own computer and the reprex::reprex()
function to render code on the clipboard, which can then be pasted here.
Sys.setenv(PGHOST = "10.101.13.99", PGDATABASE="crsp")
library(dplyr, warn.conflicts = FALSE)
library(DBI)
pg <- dbConnect(RPostgres::Postgres())
rs <- dbExecute(pg, "SET work_mem = '5GB'")
rs <- dbExecute(pg, "SET search_path TO edgar")
stocknames <- tbl(pg, sql("SELECT * FROM crsp.stocknames"))
cusip_cik <- tbl(pg, "cusip_cik_test")
cusip_9 <-
cusip_cik %>%
filter(nchar(cusip)==9L,
right(cusip, 1L) == as.character(check_digit)) %>%
group_by(cik, cusip) %>%
summarize(num_filings = n()) %>%
ungroup() %>%
rename(cusip9 = cusip) %>%
mutate(cusip = substr(cusip9, 1L, 8L)) %>%
compute()
cusip_8 <-
cusip_cik %>%
filter(nchar(cusip)==8L) %>%
group_by(cik, cusip) %>%
summarize(num_filings = n()) %>%
ungroup() %>%
compute()
cusip_8_extra <-
cusip_8 %>%
anti_join(cusip_9, by = c("cik", "cusip")) %>%
arrange(desc(num_filings)) %>%
compute()
cusip_8_extra %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> n
#> <int64>
#> 1 1828
cusip_8_extra %>%
filter(num_filings >= 10) %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> n
#> <int64>
#> 1 198
cusip_8_extra %>%
filter(num_filings >= 10)
#> # Source: lazy query [?? x 3]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> # Ordered by: desc(num_filings)
#> cik cusip num_filings
#> <int> <chr> <int64>
#> 1 948642 87927W10 424
#> 2 811419 02355926 86
#> 3 897448 02311107 70
#> 4 818813 91736108 60
#> 5 320121 8796B200 60
#> 6 1422109 09548108 56
#> 7 1297401 26153103 54
#> 8 1175872 89462609 54
#> 9 922475 4506B109 54
#> 10 726601 13974105 52
#> # … with more rows
Created on 2020-06-19 by the reprex package (v0.3.0)
I don't think we care about CUSIP-CIK combinations that don't appear in a number of filings. I think a minimum of about 10 works best, as this eliminates a lot of errors that show up in one or two filings.
So I think we have 198 combinations to examine. Looking at the first one, I'd say it's a valid match:
Sys.setenv(PGHOST = "10.101.13.99", PGDATABASE="crsp")
library(dplyr, warn.conflicts = FALSE)
library(DBI)
pg <- dbConnect(RPostgres::Postgres())
rs <- dbExecute(pg, "SET work_mem = '5GB'")
rs <- dbExecute(pg, "SET search_path TO edgar")
stocknames <- tbl(pg, sql("SELECT * FROM crsp.stocknames"))
cusip_cik <- tbl(pg, "cusip_cik_test")
cusip_cik %>%
filter(cusip %~% '^87927W') %>%
count(cusip, cik, company_name)
#> # Source: lazy query [?? x 4]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> # Groups: cusip, cik
#> cusip cik company_name n
#> <chr> <int> <chr> <int64>
#> 1 87927W10 948642 TELECOM ITALIA S P A 398
#> 2 87927W10 948642 TIM S.p.A. 26
#> 3 87927W10 1200598 TELECOM ITALIA SPA 2
cusip_cik %>%
filter(cik == 948642) %>%
count(cusip)
#> # Source: lazy query [?? x 2]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> cusip n
#> <chr> <int64>
#> 1 003497168 2
#> 2 87927W10 424
#> 3 87927Y201 8
#> 4 T92778108 8
#> 5 T92778124 12
#> 6 <NA> 29
stocknames %>%
filter(ncusip %~% '^87927W') %>%
select(permno, ncusip, comnam)
#> # Source: lazy query [?? x 3]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> permno ncusip comnam
#> <int> <chr> <chr>
#> 1 81781 87927W10 TELECOM ITALIA SPA
#> 2 81782 87927W20 TELECOM ITALIA SPA
Created on 2020-06-19 by the reprex package (v0.3.0)
The next one is a bit more complicated. I'd guess that 02355926
is not a valid CUSIP. So we don't want it.
Sys.setenv(PGHOST = "10.101.13.99", PGDATABASE="crsp")
library(dplyr, warn.conflicts = FALSE)
library(DBI)
pg <- dbConnect(RPostgres::Postgres())
rs <- dbExecute(pg, "SET work_mem = '5GB'")
rs <- dbExecute(pg, "SET search_path TO edgar")
stocknames <- tbl(pg, sql("SELECT * FROM crsp.stocknames"))
cusip_cik <- tbl(pg, "cusip_cik_test")
issue <- tbl(pg, sql("SELECT * FROM cusipm.issue"))
cik <- 948642
cusip <- "02355926"
cusip6 <- paste0('^', substr(cusip, 1, 6))
cusip_cik %>%
filter(cusip %~% cusip6) %>%
count(cusip, cik, company_name)
#> # Source: lazy query [?? x 4]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> # Groups: cusip, cik
#> cusip cik company_name n
#> <chr> <int> <chr> <int64>
#> 1 023559206 811419 AMERALIA INC 6
#> 2 02355926 811419 AMERALIA INC 80
#> 3 02355926 811419 Natural Resources USA Corp 6
cusip_cik %>%
filter(cik == cik) %>%
count(cusip)
#> # Source: lazy query [?? x 2]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> cusip n
#> <chr> <int64>
#> 1 000000 2
#> 2 00000000 12
#> 3 000000000 50
#> 4 000015228 2
#> 5 000020671 6
#> 6 0000206F1 6
#> 7 000022551 48
#> 8 000049331 2
#> 9 000050811 2
#> 10 00006425 6
#> # … with more rows
stocknames %>%
filter(ncusip %~% cusip6) %>%
select(permno, ncusip, comnam)
#> # Source: lazy query [?? x 3]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> permno ncusip comnam
#> <int> <chr> <chr>
#> 1 11642 02355910 AMERALIA INC
#> 2 11642 02355920 AMERALIA INC
#> 3 11642 02355920 AMERALIA INC
issue %>% filter(issuer_num == cusip6)
#> # Source: lazy query [?? x 69]
#> # Database: postgres [igow@10.101.13.99:5432/crsp]
#> # … with 69 variables: issuer_num <chr>, issue_num <chr>, issue_check <chr>,
#> # issue_desc_1 <chr>, issue_desc_2 <chr>, issue_adl_1 <chr>,
#> # issue_adl_2 <chr>, issue_adl_3 <chr>, issue_adl_4 <chr>,
#> # issue_status <chr>, dated_date <date>, maturity_date <date>,
#> # partial_maturity <dbl>, rate <dbl>, issue_del_date <date>,
#> # issue_transaction <chr>, issue_update_date <date>, cusip8 <chr>,
#> # alternative_min_tax <chr>, bank_q <chr>, callable <chr>,
#> # activity_date <date>, first_coupon_date <date>, init_pub_off <chr>,
#> # payment_frequency <chr>, currency_code <chr>, domicile_code <chr>,
#> # underwriter <chr>, us_cfi_code <chr>, closing_date <date>,
#> # ticker_symbol <chr>, iso_cfi <chr>, depos_eligible <chr>, pre_refund <chr>,
#> # refundable <chr>, remarketed <chr>, sinking_fund <chr>, taxable <chr>,
#> # form <chr>, enhancements <chr>, fund_distrb_policy <chr>,
#> # fund_inv_policy <chr>, fund_type <chr>, guarantee <chr>, income_type <chr>,
#> # insured_by <chr>, ownership_restr <chr>, payment_status <chr>,
#> # preferred_type <chr>, putable <chr>, rate_type <chr>, redemption <chr>,
#> # source_doc <chr>, sponsoring <chr>, voting_rights <chr>,
#> # warrant_assets <chr>, warrant_status <chr>, warrant_type <chr>,
#> # where_traded <chr>, auditor <chr>, paying_agent <chr>, tender_agent <chr>,
#> # xfer_agent_id <chr>, bond_counsel <chr>, financial_advisor <chr>,
#> # competitive_sale_date <date>, sale_type <chr>, offering_amount <dbl>,
#> # offering_amount_code <chr>
Created on 2020-06-19 by the reprex package (v0.3.0)
It seems we have 000000000
as a CUSIP for the previous case. We should probably kill these (obviously invalid) CUSIPs before populating the table.
So I have redefined cusip_cik to have a multiplicity variable.
cusip_cik <- cusip_cik %>%
group_by(cik, cusip) %>%
summarise(multiplicity = n()) %>%
inner_join(cusip_cik) %>% ungroup()
cusip_cik <- cusip_cik %>%
group_by(cik, cusip6) %>%
summarise(n_cusip6 = n()) %>%
inner_join(cusip_cik) %>% ungroup()
has_cusip9_match <- cusip_cik %>% filter(cusip_length == 8) %>%
inner_join(cusip_cik, by = 'cik') %>%
filter(cusip_length.y == 9) %>%
filter(cusip.x == substr(cusip.y, 1, 8)) %>% distinct(cik, cusip.x) %>%
rename(cusip = cusip.x)
has_cusip9_match <- has_cusip9_match %>%
inner_join(cusip_cik) %>%
distinct(cik, cusip, cusip6, multiplicity, n_cusip6)
unique_8_df <- cusip_cik %>% filter(cusip_length == 8) %>%
anti_join(has_cusip9_match, by = c('cik', 'cusip')) %>%
distinct(cik, cusip) %>% mutate(cusip6 = substr(cusip, 1, 6))
unique_8_df <- unique_8_df %>%
inner_join(cusip_cik) %>%
distinct(cik, cusip, cusip6, multiplicity, n_cusip6)
> unique_8_df %>% mutate(ncusip = cusip) %>%
inner_join(stocknames) %>% filter(multiplicity >= 10) %>%
distinct(cik, cusip) %>% head(10)
Joining, by = c("cusip", "ncusip")
# A tibble: 10 x 2
cik cusip
<int> <chr>
1 26537 23371210
2 45791 70355510
3 97483 71891320
4 99982 74100410
5 99982 74100420
6 723616 28511410
7 729213 74791410
8 737033 74835710
9 750556 22463310
10 750556 22463320
> unique_8_df %>% mutate(ncusip = cusip) %>% inner_join(stocknames) %>%
filter(multiplicity >= 10) %>%
distinct(cik, cusip) %>% tail(14)
Joining, by = c("cusip", "ncusip")
# A tibble: 14 x 2
cik cusip
<int> <chr>
1 750556 69149730
2 750556 80653010
3 750556 89235610
4 791164 42219B10
5 812482 80190410
6 829499 46072310
7 854873 00760G10
8 882264 03852L10
9 904081 63577110
10 904900 08372L10
11 948642 87927W10
12 1013050 74257410
13 1093367 02932810
14 1125259 14365830
> stockn_or_issuer <- bind_rows(issuer_matches, stockname_matches %>%
anti_join(issuer_matches)) %>%
arrange(cik, cusip, cusip6)
Joining, by = c("cik", "cusip", "cusip6", "multiplicity", "n_cusip6", "company_name")
> stockn_or_issuer
# A tibble: 1,181 x 8
cik cusip cusip6 multiplicity n_cusip6 company_name issuer_name_1 comnam
<int> <chr> <chr> <int> <int> <chr> <chr> <chr>
1 2135 48625010 486250 4 4 AFFILIATED COMPUTER SERVICES INC KAUFMANN FD INC NA
2 2491 36465410 364654 16 16 ALLIANCE GAMING CORP GAMING & TECHNOLOGY INC NA
3 3327 01306801 013068 2 326 ALBERTO CULVER CO ALBERTO CULVER CO NA
4 3941 01941107 019411 6 42 ALLIED PRODUCTS CORP /DE/ ALLIED PRODS CORP DEL NA
5 3952 19483106 194831 2 2 ALLIED RESEARCH CORP COLLINS & AIKMAN GROUP INC NA
6 4457 02359100 023591 6 6 AMERCO /NV/ AMERCO INC NA
7 4707 02489810 024898 2 2 AMERICAN CAPITAL CORP AMERICAN CAP CORP NA
8 5009 02604201 026042 4 22 AMERICAN FILTRONA CORP AMERICAN FILTRONA CORP NA
9 5177 23426703 234267 2 2 AMDURA CORP DAL PETE CO NA
10 5981 30371108 303711 2 2 AMERICAN VANGUARD CORP FAIRCHILD INDS INC NA
# … with 1,171 more rows
> stockn_or_issuer %>% filter(multiplicity >= 10)
# A tibble: 116 x 8
cik cusip cusip6 multiplicity n_cusip6 company_name issuer_name_1 comnam
<int> <chr> <chr> <int> <int> <chr> <chr> <chr>
1 2491 36465410 364654 16 16 ALLIANCE GAMING CORP GAMING & TECHNOLOGY INC NA
2 8038 04635208 046352 11 11 ASTREX INC ASTRALIS LTD NA
3 16496 13152106 131521 72 72 CALPROP CORP CALVERT CASH RESVS NA
4 26537 23371210 233712 12 12 DAEDALUS ENTERPRISES INC DAEDALUS ENTERPRISES INC NA
5 33073 29403700 294037 22 94 VISKASE COMPANIES INC ENVIRODYNE INDS INC NA
6 45791 70355510 703555 10 36 HARRIS PAUL STORES INC PAUL HARRIS STORES INC NA
7 57139 5024709C 502470 26 52 LABARGE INC LA BARGE INC NA
8 69067 62632010 626320 18 38 MUNSINGWEAR INC MUNSINGWEAR INC NA
9 69067 62632010 626320 18 38 PREMIUMWEAR INC MUNSINGWEAR INC NA
10 70415 63713010 637130 10 20 NATIONAL PATENT DEVELOPMENT CORP NATIONAL PATENT DEV CORP NA
# … with 106 more rows
> stockn_or_issuer %>% filter(n_cusip6 >= 10)
# A tibble: 382 x 8
cik cusip cusip6 multiplicity n_cusip6 company_name issuer_name_1 comnam
<int> <chr> <chr> <int> <int> <chr> <chr> <chr>
1 2491 36465410 364654 16 16 ALLIANCE GAMING CORP GAMING & TECHNOLOGY INC NA
2 3327 01306801 013068 2 326 ALBERTO CULVER CO ALBERTO CULVER CO NA
3 3941 01941107 019411 6 42 ALLIED PRODUCTS CORP /DE/ ALLIED PRODS CORP DEL NA
4 5009 02604201 026042 4 22 AMERICAN FILTRONA CORP AMERICAN FILTRONA CORP NA
5 6260 03237116 032371 4 88 ANACOMP INC ANACOMP INC IND NA
6 8038 04635208 046352 11 11 ASTREX INC ASTRALIS LTD NA
7 8497 04621M02 04621M 4 18 ASSURANCEAMERICA CORP ASSURANCEAMERICA CORP NA
8 12779 09600513 096005 4 34 BLUE RIDGE REAL ESTATE CO BLUE RIDGE REAL ESTATE CO NA
9 14637 11522300 115223 1 51 BROWN & SHARPE MANUFACTURING CO /DE/ BROWN & SHARPE MFG CO NA
10 14803 11566021 115660 2 136 BROWN TOM INC /DE BROWN TOM INC NA
# … with 372 more rows
@bdcallen I think what you need to do here is to investigate some more of the 289 CUSIP-CIK combinations I identified above. I examined two of these; I think we might want to do ~10-12 in total. Once you've investigated the issues we can figure out what to do with them.
@iangow I see the essential difference between our analyses now. I previously selected for 8-digit cusips which under inner joining back onto cusip_cik by cik, did not have any other record that had a 9 digit cusip, for that cik. Whereas, you have more or less done the same thing, but with the additional constraint of matching the 8-digit cusip to the first 8 digits of a record with a 9-digit cusip (I essentially repeated the approach in the post just above). Hence, the great disparity in the numbers of 8-digit cusips to look at before taking the multiplicities of the cik-cusip combinations into account (245 versus around 3000). After imposing a multiplicity of 10 or above, the number I had reduces to 20
has_cusip9_any <- cusip_cik %>% filter(cusip_length == 8) %>%
inner_join(cusip_cik, by = 'cik') %>%
filter(cusip_length.y == 9) %>% distinct(cik, cusip.x) %>%
rename(cusip = cusip.x)
eight_no_cusip9_df <- cusip_cik %>% filter(cusip_length == 8) %>%
anti_join(has_cusip9_any, by = c('cik', 'cusip')) %>%
distinct(cik, cusip) %>% mutate(cusip6 = substr(cusip, 1, 6))
eight_no_cusip9_df <- eight_no_cusip9_df %>% inner_join(cusip_cik) %>% distinct(cik, cusip, cusip6, multiplicity, n_cusip6)
> eight_no_cusip9_df %>% filter(multiplicity >= 10)
# A tibble: 20 x 5
cik cusip cusip6 multiplicity n_cusip6
<int> <chr> <chr> <int> <int>
1 74154 68267810 682678 18 18
2 97483 71891320 718913 20 20
3 719264 31946210 319462 15 15
4 729213 74791410 747914 12 12
5 737033 74835710 748357 14 16
6 754128 06859200 068592 20 20
7 777844 20992910 209929 12 14
8 842180 05959110 059591 12 12
9 882264 03852L10 03852L 18 18
10 895742 9900307A 990030 10 10
11 904081 63577110 635771 12 12
12 904900 08372L10 08372L 14 14
13 924835 88871110 888711 10 10
14 1010566 6911N103 6911N1 34 34
15 1013050 74257410 742574 10 10
16 1038370 8716Y107 8716Y1 10 10
17 1064133 98144Q10 98144Q 12 12
18 1093367 02932810 029328 12 12
19 1113227 90337P10 90337P 34 34
20 1520358 57660106 576601 40 40
I think I made the initial join that way, on the assumption that any 9-digit cusip candidate is preferable for a given cik than an 8-digit one. Also, if an 8 digit cusip is essentially a 9-digit one, but with say the 4th digit omitted by mistake, that will not be picked up when we anti-join only the 9-digit cases where the first 8 digits match the 8-digit cusip.
Here, in my more recent version of unique_8_df
, I had 287 candidates after selecting for the required multiplicity
> unique_8_df %>% filter(multiplicity >= 10)
# A tibble: 287 x 5
cik cusip cusip6 multiplicity n_cusip6
<int> <chr> <chr> <int> <int>
1 2491 36465410 364654 16 16
2 3545 01630104 016301 12 12
3 5133 02375204 023752 12 12
4 6281 32654105 326541 10 10
5 6292 SEECOVER SEECOV 14 14
6 6715 PAGE2OF4 PAGE2O 12 12
7 8038 04635208 046352 11 11
8 9346 57755100 577551 10 10
9 10119 68509603 685096 14 14
10 11544 84423102 844231 12 12
# … with 277 more rows
Then doing
stockname_matches <- unique_8_df %>% mutate(ncusip = cusip) %>% inner_join(stocknames) %>% distinct(cik, cusip)
issuer_matches <- unique_8_df %>% inner_join(issuers) %>% distinct(cik, cusip)
stockn_or_issuer <- bind_rows(issuer_matches, stockname_matches %>% anti_join(issuer_matches))
> stockn_or_issuer %>% inner_join(cusip_cik) %>%
filter(multiplicity >= 10) %>%
distinct(cusip, cik) %>% count()
Joining, by = c("cik", "cusip")
# A tibble: 1 x 1
n
<int>
1 112
I get that 112 of the 287 candidates can be joined to either stocknames or issuers.
@iangow I guess in some of my recent posts, I have tried to look at how defining how we match 8-digit cusip records to 9-digit ones changes things. For instance, should we eliminate the cases which can join to any 9-digit cusip? Or just those 9-digit cases where the multiplicity is more than or equal to 10?
> cusip8s <- cusip_cik %>% filter(cusip_length == 8)
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>%
> filter(cusip_length.y == 9) %>% distinct(cik, cusip.x, multiplicity.x) %>%
> rename(cusip = cusip.x, multiplicity = multiplicity.x) %>%
> filter(multiplicity >= 10) %>% count()
# A tibble: 1 x 1
n
<int>
1 764
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>%
> filter(cusip_length.y == 9 && multiplicity.y >= 10) %>%
> distinct(cik, cusip.x, multiplicity.x) %>%
> rename(cusip = cusip.x, multiplicity = multiplicity.x) %>%
> filter(multiplicity >= 10) %>% count()
# A tibble: 1 x 1
n
<int>
1 784
@iangow Can you explain this to me, especially when we both independently found there to be excess 8-digit cases in other analyses above?
cusip8s <- cusip_cik %>% filter(cusip_length == 8)
eight_w_valid9 <- cusip8s %>%
inner_join(cusip_cik, by = 'cik') %>%
filter(cusip_length.y == 9 && multiplicity.y >= 10) %>%
distinct(cik, cusip.x, multiplicity.x) %>%
rename(cusip = cusip.x, multiplicity = multiplicity.x)
> cusip8s %>% anti_join(eight_w_valid9)
Joining, by = c("cik", "cusip", "multiplicity")
# A tibble: 0 x 20
# … with 20 variables: cik <int>, cusip6 <chr>, n_cusip6 <int>, cusip <chr>, multiplicity <int>, file_name <chr>, check_digit <int>, company_name <chr>,
# formats <chr>, cusip_length <int>, num_cusip_unique_char <dbl>, cusip8 <chr>, cusip7_1 <chr>, cusip6_2 <chr>, cusip5_3 <chr>, cusip4_4 <chr>, cusip3_5 <chr>,
# cusip2_6 <chr>, cusip1_7 <chr>, cusip0_8 <chr>
Oh, I have joined just by cik this time initially, not by file_name and cik (for eight_w_valid9
).
Also, not joining by file_name like I did initially meant many clearly unrelated 8's and 9's were matched. Doing this this time fixed the problem
cusip8s <- cusip_cik %>% filter(cusip_length == 8)
eight_w_valid9 <- cusip8s %>% inner_join(cusip_cik, by = 'cik') %>%
filter(cusip_length.y == 9 & multiplicity.y >= 10
& substr(cusip.y, 9L, 9L) == as.character(check_digit.y)
& substr(cusip.y, 1, 8) == cusip.x) %>%
distinct(cik, cusip.x, multiplicity.x) %>%
rename(cusip = cusip.x, multiplicity = multiplicity.x)
> cusip8s %>% anti_join(eight_w_valid9)
Joining, by = c("cik", "cusip", "multiplicity")
# A tibble: 16,250 x 20
cik cusip6 n_cusip6 cusip multiplicity file_name check_digit company_name formats cusip_length num_cusip_uniqu… cusip8 cusip7_1 cusip6_2 cusip5_3 cusip4_4
<int> <chr> <int> <chr> <int> <chr> <int> <chr> <chr> <int> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1923 037881 2 0378… 2 edgar/da… 7 ABRAMS INDU… C 8 6 03788… 0378810 0378816 0378806 0378106
2 1923 037881 2 0378… 2 edgar/da… 7 ABRAMS INDU… C 8 6 03788… 0378810 0378816 0378806 0378106
3 1961 98159J 5 9815… 1 edgar/da… 1 Worlds.com,… D 8 6 98159… 98159J1 98159J0 9815910 9815J10
4 2062 004162 4 0041… 4 edgar/da… 9 ACMAT CORP C 8 6 00416… 0041620 0041627 0041607 0041207
5 2062 004162 4 0041… 4 edgar/da… 9 ACMAT CORP C 8 6 00416… 0041620 0041627 0041607 0041207
6 2062 004162 4 0041… 4 edgar/da… 9 ACMAT CORP C 8 6 00416… 0041620 0041627 0041607 0041207
7 2062 004162 4 0041… 4 edgar/da… 9 ACMAT CORP C 8 6 00416… 0041620 0041627 0041607 0041207
8 2093 004724 4 0047… 2 edgar/da… 0 ACME METALS… D 8 5 00472… 0047241 0047240 0047210 0047410
9 2093 004724 4 0047… 2 edgar/da… 0 ACME METALS… D 8 5 00472… 0047241 0047240 0047210 0047410
10 2135 486250 4 4862… 4 edgar/da… 3 AFFILIATED … C 8 7 48625… 4862501 4862500 4862510 4862010
# … with 16,240 more rows, and 4 more variables: cusip3_5 <chr>, cusip2_6 <chr>, cusip1_7 <chr>, cusip0_8 <chr>
> cusip8s %>% anti_join(eight_w_valid9) %>% distinct(cusip, cik) %>% count()
Joining, by = c("cik", "cusip", "multiplicity")
# A tibble: 1 x 1
n
<int>
1 3485
> cusip8s %>% anti_join(eight_w_valid9) %>%
+ filter(multiplicity >= 10) %>% distinct(cusip, cik) %>% count()
Joining, by = c("cik", "cusip", "multiplicity")
# A tibble: 1 x 1
n
<int>
1 335
Will repeat the other steps
OK. Please replace comments above that are erroneous so that when finalize, we only have what's relevant.
> match_stocknames <- cusip8s %>% anti_join(eight_w_valid9) %>%
filter(multiplicity >= 10) %>%
distinct(cusip, cik) %>% mutate(ncusip = cusip) %>%
inner_join(stocknames) %>% distinct(cusip, cik)
Joining, by = c("cik", "cusip", "multiplicity")
Joining, by = c("cusip", "ncusip")
> match_stocknames %>% count()
# A tibble: 1 x 1
n
<int>
1 48
> match_issuers <- cusip8s %>% anti_join(eight_w_valid9) %>%
filter(multiplicity >= 10) %>% distinct(cusip, cik) %>%
mutate(cusip6 = substr(cusip, 1, 6)) %>%
inner_join(issuers) %>% distinct(cusip, cik)
> match_issuers %>% count()
# A tibble: 1 x 1
n
<int>
1 157
match_either <- bind_rows(match_issuers, match_stocknames %>%
anti_join(match_issuers, by = c('cik', 'cusip')))
> match_either %>% count()
# A tibble: 1 x 1
n
<int>
1 159
> 100.00 * 159/335
[1] 47.46269
> match_either %>% inner_join(cusip_cik, by = 'cik') %>%
+ filter(cusip_length == 9 & multiplicity.y < 10 &
+ multiplicity.x + multiplicity.y >= 10 &
+ substr(cusip.y, 9L, 9L) == as.character(check_digit) &
+ substr(cusip.y, 1, 8) == cusip.x) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 46
So I had 335, as opposed to 289, cases due to the constraint of having the 9-digit cusips have a multiplicity of at least 10. After fixing the AND operator mistake, just under half of the cases can be joined to either crsp.stocknames
or cusipm.issuer
(represented by match_either
).
@iangow Here's a look at the first few entries with names for those cases that match cusipm.issuer
(note I have changed the variable issuer_num
to cusip6
is issuers
all way through my analysis for convenience)
> match_issuers %>% left_join(issuers) %>% left_join(ciks) %>% distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type) %>% head(20)
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 20 x 8
cik cusip cusip6 issuer_name_1 company_name issuer_adl_1 issuer_adl_2 issuer_type
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2491 36465410 364654 GAMING & TECHNOLOGY INC ALLIANCE GAMING CORP NAME CHANGED TO UNITED GAMING INC 12/08/1988 SEE 910475 C
2 2491 36465410 364654 GAMING & TECHNOLOGY INC BALLY TECHNOLOGIES, INC. NAME CHANGED TO UNITED GAMING INC 12/08/1988 SEE 910475 C
3 8038 04635208 046352 ASTRALIS LTD ASTREX INC FORMERLY ASTRALIS PHARMACEUTICALS LTD 12/10/2001 C
4 16496 13152106 131521 CALVERT CASH RESVS CALPROP CORP NAME CHANGED TO MONEY MGMT PLUS 09/10/1984 SEE 609918 C
5 26537 23371210 233712 DAEDALUS ENTERPRISES INC ARGON ST, INC. NAME CHANGED TO SENSYS TECHNOLOGIES INC 06/09/1998 C
6 26537 23371210 233712 DAEDALUS ENTERPRISES INC DAEDALUS ENTERPRISES INC NAME CHANGED TO SENSYS TECHNOLOGIES INC 06/09/1998 C
7 26537 23371210 233712 DAEDALUS ENTERPRISES INC SENSYS TECHNOLOGIES INC NAME CHANGED TO SENSYS TECHNOLOGIES INC 06/09/1998 C
8 26537 23371210 233712 DAEDALUS ENTERPRISES INC SENSYTECH INC NAME CHANGED TO SENSYS TECHNOLOGIES INC 06/09/1998 C
9 33073 29403700 294037 ENVIRODYNE INDS INC ENVIRODYNE INDUSTRIES INC FORMERLY ENVIRODYNE INC TO 03/03/1978 AND NAME CHANGED TO C
10 33073 29403700 294037 ENVIRODYNE INDS INC VISKASE COMPANIES INC FORMERLY ENVIRODYNE INC TO 03/03/1978 AND NAME CHANGED TO C
11 38264 34986210 349862 FORWARD INDS INC N Y FORWARD INDUSTRIES INC NA NA C
12 38264 34986210 349862 FORWARD INDS INC N Y FORWARD INDUSTRIES, INC. NA NA C
13 45791 70355510 703555 PAUL HARRIS STORES INC HARRIS PAUL STORES INC NA NA C
14 45791 70355510 703555 PAUL HARRIS STORES INC PAUL HARRIS STORES INC NA NA C
15 57139 5024709C 502470 LA BARGE INC LABARGE INC NA NA C
16 59963 25474510 254745 DISTINCTIVE DEVICES INC DISTINCTIVE DEVICES INC NA NA C
17 65270 59152010 591520 METHODE ELECTRS INC METHODE ELECTRONICS INC NA NA C
18 69067 62632010 626320 MUNSINGWEAR INC MUNSINGWEAR INC NAME CHANGED TO PREMIUMWEAR INC 09/06/1996 SEE 740909 C
19 69067 62632010 626320 MUNSINGWEAR INC PREMIUMWEAR INC NAME CHANGED TO PREMIUMWEAR INC 09/06/1996 SEE 740909 C
20 70415 63713010 637130 NATIONAL PATENT DEV CORP GP STRATEGIES CORP NAME CHANGED TO GP STRATEGIES CORP 03/09/1998 SEE 36225V C
Seems like the first entry above is wrong.
> issuers %>% filter(grepl("^ALLIANCE GAMING CORP", issuer_name_1)) %>% mutate(issuer_name = paste0(na.omit(issuer_name_1, issuer_name_2, issuer_name_3), collapse = ''), issuer_adl = paste0(na.omit(issuer_adl_1, issuer_adl_2, issuer_adl_3), collapse = '')) %>% select(cusip6, issuer_name, issuer_adl, issuer_status)
# A tibble: 1 x 4
cusip6 issuer_name issuer_adl issuer_status
<chr> <chr> <chr> <chr>
1 01859P ALLIANCE GAMING CORP NAME CHANGED TO BALLY D
> issuers %>% filter(cusip6 == '05874B') %>% mutate(issuer_name = paste0(na.omit(issuer_name_1, issuer_name_2, issuer_name_3), collapse = ''), issuer_adl = paste0(na.omit(issuer_adl_1, issuer_adl_2, issuer_adl_3), collapse = '')) %>% select(cusip6, issuer_name, issuer_adl, issuer_status)
# A tibble: 1 x 4
cusip6 issuer_name issuer_adl issuer_status
<chr> <chr> <chr> <chr>
1 05874B BALLY TECHNOLOGIES INC "" A
> issuers %>% filter(grepl("^GAMING & TECHNOLOGY INC", issuer_name_1)) %>% mutate(issuer_name = paste0(na.omit(issuer_name_1, issuer_name_2, issuer_name_3)), issuer_adl = paste0(na.omit(issuer_adl_1, issuer_adl_2, issuer_adl_3))) %>% select(cusip6, issuer_name, issuer_adl, issuer_status)
# A tibble: 1 x 4
cusip6 issuer_name issuer_adl issuer_status
<chr> <chr> <chr> <chr>
1 364654 GAMING & TECHNOLOGY INC NAME CHANGED TO UNITED GAMING D
> match_issuers %>% left_join(issuers) %>% left_join(ciks) %>% distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type) %>% filter(company_name == issuer_name_1)
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 41 x 8
cik cusip cusip6 issuer_name_1 company_name issuer_adl_1 issuer_adl_2 issuer_type
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 26537 23371210 233712 DAEDALUS ENTERPRISES INC DAEDALUS ENTERPRISES INC NAME CHANGED TO SENSYS TECHNOLOGIES INC 06/09/1998 C
2 45791 70355510 703555 PAUL HARRIS STORES INC PAUL HARRIS STORES INC NA NA C
3 59963 25474510 254745 DISTINCTIVE DEVICES INC DISTINCTIVE DEVICES INC NA NA C
4 69067 62632010 626320 MUNSINGWEAR INC MUNSINGWEAR INC NAME CHANGED TO PREMIUMWEAR INC 09/06/1996 SEE 740909 C
5 74154 68267810 682678 ONEOK INC ONEOK INC REORGANIZED AS ONEOK INC NEW 11/28/1997 C
6 216039 40009510 400095 GRUBB & ELLIS CO GRUBB & ELLIS CO NA NA C
7 319156 77742900 777429 ROSELAND OIL & GAS INC ROSELAND OIL & GAS INC REORGANIZED AS CUBIC ENERGY INC 10/04/1999 SEE 229675 C
8 720851 64107410 641074 NESTOR INC NESTOR INC NA NA C
9 729213 74791410 747914 Q MED INC Q MED INC NA NA C
10 737033 74835710 748357 QUESTECH INC QUESTECH INC NA NA C
# … with 31 more rows
> match_issuers %>% left_join(issuers) %>% left_join(ciks) %>% distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type) %>% filter(company_name == issuer_name_1) %>% distinct(cusip, cik) %>% count()
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 1 x 1
n
<int>
1 41
So 41 of the of the 159 in match_issuers
can be called definitive matches by company_name == issuer_name_1
. This provides a lower bound on how good the matches in match_issuers
are (it could be better than this, knowing how inconsistent different databases can be in writing the name of some company).
@iangow So I've done this (getting rid of spaces in both names got 3 or 4 more matches)
> View(match_issuers %>%
anti_join(match_issuers %>%
left_join(issuers) %>%
left_join(ciks) %>%
distinct(cik, cusip, cusip6, issuer_name_1,
company_name, issuer_adl_1, issuer_adl_2, issuer_type) %>%
mutate(collapsed_issuer_name = gsub(' ', '', issuer_name_1),
collapsed_company_name = gsub(' ', '', company_name)) %>%
filter(collapsed_issuer_name == collapsed_company_name) %>%
distinct(cusip, cik, company_name, issuer_name_1)) %>%
left_join(issuers) %>% left_join(ciks) %>%
distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type))
cik | cusip | cusip6 | issuer_name_1 | company_name | issuer_adl_1 | issuer_adl_2 | issuer_type | |
---|---|---|---|---|---|---|---|---|
1 | 2491 | 36465410 | 364654 | GAMING & TECHNOLOGY INC | ALLIANCE GAMING CORP | NAME CHANGED TO UNITED GAMING | INC 12/08/1988 SEE 910475 | C |
2 | 2491 | 36465410 | 364654 | GAMING & TECHNOLOGY INC | BALLY TECHNOLOGIES, INC. | NAME CHANGED TO UNITED GAMING | INC 12/08/1988 SEE 910475 | C |
3 | 8038 | 04635208 | 046352 | ASTRALIS LTD | ASTREX INC | FORMERLY ASTRALIS | PHARMACEUTICALS LTD 12/10/2001 | C |
4 | 16496 | 13152106 | 131521 | CALVERT CASH RESVS | CALPROP CORP | NAME CHANGED TO MONEY MGMT | PLUS 09/10/1984 SEE 609918 | C |
5 | 33073 | 29403700 | 294037 | ENVIRODYNE INDS INC | ENVIRODYNE INDUSTRIES INC | FORMERLY ENVIRODYNE INC TO | 03/03/1978 AND NAME CHANGED TO | C |
6 | 33073 | 29403700 | 294037 | ENVIRODYNE INDS INC | VISKASE COMPANIES INC | FORMERLY ENVIRODYNE INC TO | 03/03/1978 AND NAME CHANGED TO | C |
7 | 38264 | 34986210 | 349862 | FORWARD INDS INC N Y | FORWARD INDUSTRIES INC | NA | NA | C |
8 | 38264 | 34986210 | 349862 | FORWARD INDS INC N Y | FORWARD INDUSTRIES, INC. | NA | NA | C |
9 | 65270 | 59152010 | 591520 | METHODE ELECTRS INC | METHODE ELECTRONICS INC | NA | NA | C |
10 | 70415 | 63713010 | 637130 | NATIONAL PATENT DEV CORP | GP STRATEGIES CORP | NAME CHANGED TO GP STRATEGIES | CORP 03/09/1998 SEE 36225V | C |
11 | 70415 | 63713010 | 637130 | NATIONAL PATENT DEV CORP | NATIONAL PATENT DEVELOPMENT CORP | NAME CHANGED TO GP STRATEGIES | CORP 03/09/1998 SEE 36225V | C |
12 | 72843 | 66526286 | 665262 | NORTHERN IND PUB SVC CO | NORTHERN INDIANA PUBLIC SERVICE CO | NA | NA | C |
13 | 74783 | 68417600 | 684176 | ORANGE-CO INC | ACCEPTANCE INSURANCE COMPANIES INC | NAME CHANGED TO STONERIDGE RES | INC 04/22/1987 SEE 861839 | C |
14 | 89140 | 81773232 | 817732 | SERVOTRONICS INC | SERVOTRONICS INC /DE/ | NA | NA | C |
15 | 97483 | 71891320 | 718913 | PHOENIX RESOURCE COS INC | PHOENIX RESOURCE COMPANIES INC | NA | NA | C |
16 | 99982 | 74100410 | 741004 | PRESIDENTIAL RLTY CORP NEW | TRUST CO OF NEW JERSEY | NA | NA | C |
17 | 99982 | 74100420 | 741004 | PRESIDENTIAL RLTY CORP NEW | TRUST CO OF NEW JERSEY | NA | NA | C |
18 | 100441 | 90212010 | 902120 | TYCO INTL LTD | TYCO INTERNATIONAL LTD | FORMERLY TYCO LABS INC TO | 11/10/1993 & REORGANIZED AS | C |
19 | 105096 | 94356710 | 943567 | WAVEPHORE INC | WAXMAN INDUSTRIES INC | NAME CHANGED TO WAVO CORP | 06/23/1999 SEE 944027 | C |
20 | 203248 | 84402810 | 844028 | SOUTHERN UN CO | SOUTHERN UNION CO | NA | NA | C |
21 | 215403 | 15505106 | 155051 | CENTRAL RR CO N J | CENTRAL RESERVE LIFE CORP | REORGANIZED AS CENTRAL JERSEY | INDS INC 08/15/1979 | C |
22 | 215403 | 15505106 | 155051 | CENTRAL RR CO N J | CERES GROUP INC | REORGANIZED AS CENTRAL JERSEY | INDS INC 08/15/1979 | C |
23 | 704874 | 60520010 | 605200 | MISSION WEST PPTYS | MISSION WEST PROPERTIES/NEW/ | REORGANIZED AS MISSION WEST | PPTYS INC MD 12/30/1998 | C |
24 | 715355 | 40273410 | 402734 | GULFSTREAM AEROSPACE CORP NEW | GULFSTREAM AEROSPACE CORP | NA | NA | C |
25 | 719264 | 31946210 | 319462 | FIRST CTZNS BANCSHARES INC | FIRST CITIZENS BANCSHARES INC /TN/ | NA | NA | C |
26 | 721237 | 65488410 | 654884 | NOBEL ED DYNAMICS INC | NOBEL EDUCATION DYNAMICS INC | NAME CHANGED TO NOBEL LEARNING | CMNTYS INC 12/07/1998 | C |
27 | 721237 | 65488410 | 654884 | NOBEL ED DYNAMICS INC | NOBEL LEARNING COMMUNITIES INC | NAME CHANGED TO NOBEL LEARNING | CMNTYS INC 12/07/1998 | C |
28 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | CRYSTAL PROPERTIES HOLDINGS, INC. | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
29 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | ELECTRO KINETIC SYSTEMS INC | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
30 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | STERLING CAPITAL INVESTMENT GROUP INC | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
31 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | STERLING MEDIA CAPITAL GROUP INC | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
32 | 724024 | 28882108 | 288821 | ELLIS GABLES APT BLDG CORP | AMERICAN PHYSICIANS SERVICE GROUP INC | NA | NA | C |
33 | 733289 | 37935510 | 379355 | GLOBAL NAT RES INC | GLOBAL NATURAL RESOURCES INC /NJ/ | NA | NA | C |
34 | 743872 | 43761400 | 437614 | HOMESTAKE MNG CO | HOMESTAKE MINING CO /DE/ | NAME CHANGED TO BARRICK GOLD | CORP 12/04/2001 SEE 067901 | C |
35 | 750556 | 22463310 | 224633 | CRAWFORD & CO | SUNTRUST BANKS INC | NA | NA | C |
36 | 750556 | 22463320 | 224633 | CRAWFORD & CO | SUNTRUST BANKS INC | NA | NA | C |
37 | 750556 | 69149730 | 691497 | OXFORD INDS INC | SUNTRUST BANKS INC | NA | NA | C |
38 | 750556 | 78125810 | 781258 | RUDDICK CORP | SUNTRUST BANKS INC | NAME CHANGED TO HARRIS TEETER | SUPERMARKETS INC 04/02/2012 | C |
39 | 750556 | 80653010 | 806530 | SCHERER HEALTHCARE INC | SUNTRUST BANKS INC | ACQUIRED BY STERICYCLE INC | 01/09/2003 | C |
40 | 750556 | 89235610 | 892356 | TRACTOR SUPPLY CO | SUNTRUST BANKS INC | NA | NA | C |
41 | 757011 | 90329405 | 903294 | U S GULF HLDGS INC | USG CORP | NAME CHANGED TO CANCOM | TECHNOLOGIES INC 04/29/1997 | C |
42 | 771498 | 07159310 | 071593 | BATTLE MTN GOLD CO | BATTLE MOUNTAIN GOLD CO | NA | NA | C |
43 | 774487 | 03077110 | 030771 | AMES W R CO | AMERIWOOD INDUSTRIES INTERNATIONAL CORP | ACQUIRED BY RUCKER CO | NA | C |
44 | 781902 | 94091200 | 940912 | WATERFORD CAP MGMT INC | INTERNATIONAL BALER CORP | REORGANIZED AS CPI PLASTICS | GOUP LTD 09/21/1998 SEE 12615N | C |
45 | 781902 | 94091200 | 940912 | WATERFORD CAP MGMT INC | WASTE TECHNOLOGY CORP | REORGANIZED AS CPI PLASTICS | GOUP LTD 09/21/1998 SEE 12615N | C |
46 | 787849 | 98502114 | 985021 | YARDVILLE NATL BANCORP | YARDVILLE NATIONAL BANCORP | NA | NA | C |
47 | 789879 | 29356300 | 293563 | ENS BIO LOGICALS INC | ENSURGE INC | NAME CHANGED TO ENSCOR INC | 06/03/1987 SEE 29356L | C |
48 | 789879 | 29356300 | 293563 | ENS BIO LOGICALS INC | ISHOPPER COM INC | NAME CHANGED TO ENSCOR INC | 06/03/1987 SEE 29356L | C |
49 | 789879 | 29356300 | 293563 | ENS BIO LOGICALS INC | SUNWALKER DEVELOPMENT INC | NAME CHANGED TO ENSCOR INC | 06/03/1987 SEE 29356L | C |
50 | 791050 | 02882810 | 028828 | AMERICAN PASSAGE MKTG CORP | PHOTOWORKS INC /WA | NAME CHANGED TO SEATTLE | FILMWORKS INC 02/24/1989 | C |
51 | 791050 | 02882810 | 028828 | AMERICAN PASSAGE MKTG CORP | SEATTLE FILMWORKS INC | NAME CHANGED TO SEATTLE | FILMWORKS INC 02/24/1989 | C |
52 | 791164 | 42219B10 | 42219B | HEALTH MGMT INC DEL | HEALTH MANAGEMENT INC/DE | NA | NA | C |
53 | 791164 | 42219B10 | 42219B | HEALTH MGMT INC DEL | HOMECARE MANAGEMENT INC | NA | NA | C |
54 | 792161 | 57751400 | 577514 | MAUNA LOA MACADAMIA PARTNERS | MAUNA LOA MACADAMIA PARTNERS LP | NAME CHANGED TO ML MACADAMIA | ORCHARDS L P 09/16/1998 | C |
55 | 792161 | 57751400 | 577514 | MAUNA LOA MACADAMIA PARTNERS | ML MACADAMIA ORCHARDS L P | NAME CHANGED TO ML MACADAMIA | ORCHARDS L P 09/16/1998 | C |
56 | 792161 | 57751400 | 577514 | MAUNA LOA MACADAMIA PARTNERS | ROYAL HAWAIIAN ORCHARDS, L.P. | NAME CHANGED TO ML MACADAMIA | ORCHARDS L P 09/16/1998 | C |
57 | 804138 | 02711807 | 027118 | AMERICAN LD LEASE INC | AMERICAN LAND LEASE INC | NA | NA | C |
58 | 804138 | 02711807 | 027118 | AMERICAN LD LEASE INC | ASSET INVESTORS CORP | NA | NA | C |
59 | 811641 | 45252306 | 452523 | IMMERSION HUMAN INTERFACE CORP | IMMUCELL CORP /DE/ | NA | NA | C |
60 | 817135 | 08606103 | 086061 | BERTUCCIS CORP | ALLIANCE HEALTHCARE SERVICES, INC | NA | NA | C |
61 | 817135 | 08606103 | 086061 | BERTUCCIS CORP | ALLIANCE IMAGING INC /DE/ | NA | NA | C |
62 | 829499 | 46072310 | 460723 | INTERSTATE BAKERIES CORP NEW | INTERSTATE BAKERIES CORP/DE/ | MERGED INTO INTERSTATE | BRANDS CORP 06/02/1990 | C |
63 | 831532 | 45070710 | 450707 | IVI PUBG INC | ATRIX INTERNATIONAL INC | REORGANIZED AS ONHEALTH | NETWORK CO 06/16/1998 | C |
64 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | LINKON CORP INC | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
65 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | LINKON CORP | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
66 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | PACKETPORT COM | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
67 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | POCKETPORT COM | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
68 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | WYNDSTORM CORP | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
69 | 840260 | 53803230 | 538032 | LIVE ENTMT INC | LIVE ENTERTAINMENT INC | NAME CHANGED TO CANCER | TREATMENT CTRS INC 07/30/2002 | C |
70 | 841528 | 75952U10 | 75952U | RELIASTAR FINL CORP | NWNL COMPANIES INC | MERGED INTO ING GROEP N.V. | 09/01/2000 | C |
71 | 841528 | 75952U10 | 75952U | RELIASTAR FINL CORP | RELIASTAR FINANCIAL CORP | MERGED INTO ING GROEP N.V. | 09/01/2000 | C |
72 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA S A | NAME CHANGED TO BBVA BANCO | FRANCES S A 10/04/2000 | C |
73 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA, S.A. | NAME CHANGED TO BBVA BANCO | FRANCES S A 10/04/2000 | C |
74 | 845609 | 38070110 | 380701 | GOLD RIV HOTEL & CASINO CORP | GOLD RIVER HOTEL & CASINO CORP | NA | NA | C |
75 | 846615 | 46624B30 | 46624B | JPY HLDGS LTD | JPS INDUSTRIES INC | NA | NA | C |
76 | 846615 | 46624B30 | 46624B | JPY HLDGS LTD | JPS TEXTILE GROUP INC /DE/ | NA | NA | C |
77 | 849502 | 75190710 | 751907 | RAMTRON INTL CORP | RAMTRON INTERNATIONAL CORP | NA | NA | C |
78 | 849547 | 91826107 | 918261 | VLPS LTG SVCS INTL INC | BLACK BOX CORP | NA | NA | C |
79 | 849547 | 91826107 | 918261 | VLPS LTG SVCS INTL INC | MB HOLDINGS INC | NA | NA | C |
80 | 850693 | 18490102 | 184901 | CLEARLY CDN BEVERAGE CORP | ALLERGAN INC | NA | NA | C |
81 | 854873 | 00760G10 | 00760G | AEGIS CONSUMER FDG GROUP INC | RALLYS HAMBURGERS INC | NA | NA | C |
82 | 858339 | 74342A10 | 74342A | PROMUS COS INC | CAESARS ENTERTAINMENT CORP | NAME CHANGED TO HARRAHS ENTMT | INC 06/30/1995 SEE 413619 | C |
83 | 858339 | 74342A10 | 74342A | PROMUS COS INC | HARRAHS ENTERTAINMENT INC | NAME CHANGED TO HARRAHS ENTMT | INC 06/30/1995 SEE 413619 | C |
84 | 858339 | 74342A10 | 74342A | PROMUS COS INC | PROMUS COMPANIES INC | NAME CHANGED TO HARRAHS ENTMT | INC 06/30/1995 SEE 413619 | C |
85 | 869709 | 26831802 | 268318 | EDG CAP INC | EDG CAPITAL INC | NAME CHANGED TO ISOTOPE | SOLUTIONS GROUP INC 11/14/2001 | C |
86 | 869709 | 26831802 | 268318 | EDG CAP INC | ISOTOPE SOLUTIONS GROUP INC | NAME CHANGED TO ISOTOPE | SOLUTIONS GROUP INC 11/14/2001 | C |
87 | 872248 | 68393020 | 683930 | OPTIMA PETE CORP | OPTIMA PETROLEUM CORP | REORGANIZED AS PETROQUEST | ENERGY INC 09/04/1998 | C |
88 | 872248 | 68393020 | 683930 | OPTIMA PETE CORP | PETROQUEST ENERGY INC | REORGANIZED AS PETROQUEST | ENERGY INC 09/04/1998 | C |
89 | 874737 | 59512710 | 595127 | MICRONICS COMPUTERS INC | MICRONICS COMPUTERS INC /CA | NA | NA | C |
90 | 875459 | 92656M10 | 92656M | VIDEO LOTTERY TECHNOLOGIES INC | POWERHOUSE TECHNOLOGIES INC /DE | NAME CHANGED TO POWERHOUSE | TECHNOLOGIES INC | C |
91 | 875459 | 92656M10 | 92656M | VIDEO LOTTERY TECHNOLOGIES INC | VIDEO LOTTERY TECHNOLOGIES INC/DE | NAME CHANGED TO POWERHOUSE | TECHNOLOGIES INC | C |
92 | 879585 | 04907910 | 049079 | ATLANTIC TELE-NETWORK INC | ATLANTIC TELE NETWORK INC /DE | NA | NA | C |
93 | 879585 | 04907910 | 049079 | ATLANTIC TELE-NETWORK INC | ATN INTERNATIONAL, INC. | NA | NA | C |
94 | 882159 | 23922010 | 239220 | DAW TECHNOLOGIES INC | DAW TECHNOLOGIES INC /UT | NA | NA | C |
95 | 882235 | 48625010 | 486250 | KAUFMANN FD INC | LINCARE HOLDINGS INC | NA | NA | C |
96 | 882291 | 00808Y09 | 00808Y | AETHLON MED INC | AETHLON MEDICAL INC | NA | NA | C |
97 | 882291 | 00808Y09 | 00808Y | AETHLON MED INC | BISHOP EQUITIES INC | NA | NA | C |
98 | 884802 | 91382810 | 913828 | UNIVERSAL SEISMIC ASSOC INC | UNIVERSAL SEISMIC ASSOCIATES INC | NAME CHANGED TO POCKETOP CORP | 11/14/2006 SEE 73035T | C |
99 | 885942 | 84476910 | 844769 | SOUTHWEST BANCSHARES INC DEL | SOUTHWEST BANCSHARES INC /NEW/ | NA | NA | C |
100 | 887353 | 81804310 | 818043 | SEVEN-UP / RC BOTTLING CO | SEVEN UP RC BOTTLING COMPANY OF SOUTHERN CALIFORNIA INC | NA | NA | C |
101 | 888999 | 59114210 | 591142 | METALCLAD CORP | METRA BIOSYSTEMS INC | NAME CHANGED TO XNTRX CORP | 07/03/2002 SEE 293850 | C |
102 | 893813 | 70757610 | 707576 | PENN PAC CORP | PENN OCTANE CORP | NA | NA | C |
103 | 895422 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | BOULDER TOTAL RETURN FUND INC | NAME CHANGED TO BOULDER TOTAL | RETURN FD INC 08/27/1999 | C |
104 | 895422 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | PREFERRED INCOME MANAGEMENT FUND INC | NAME CHANGED TO BOULDER TOTAL | RETURN FD INC 08/27/1999 | C |
105 | 897448 | 02311107 | 023111 | AMARIN CORP PLC | AMARIN CORP PLC\UK | NA | NA | C |
106 | 897448 | 02311107 | 023111 | AMARIN CORP PLC | AMARIN PHARMACEUTICALS PLC | NA | NA | C |
107 | 897448 | 02311107 | 023111 | AMARIN CORP PLC | ETHICAL HOLDINGS PLC | NA | NA | C |
108 | 904081 | 63577110 | 635771 | NATIONAL ED CORP | STECK VAUGHN PUBLISHING CORP | NA | NA | C |
109 | 904900 | 08372L10 | 08372L | BERG ELECTRS CORP | BERG ELECTRONICS CORP /DE/ | NA | NA | C |
110 | 910468 | 48625010 | 486250 | KAUFMANN FD INC | FLOORING AMERICA INC | NA | NA | C |
111 | 910468 | 48625010 | 486250 | KAUFMANN FD INC | MAXIM GROUP INC / | NA | NA | C |
112 | 915197 | 43153101 | 431531 | HILLCREST RES LTD | HILITE INDUSTRIES INC | NA | NA | C |
113 | 919239 | 38095610 | 380956 | GOLDCORP INC NEW | GOLDCORP INC | NA | NA | C |
114 | 924835 | 88871110 | 888711 | TIVOLI INDS INC | TIVOLI INDUSTRIES INC | NA | NA | C |
115 | 928421 | 57723100 | 577231 | MATZEL & MUMFORD MTG FDG INC | MATTSON TECHNOLOGY INC | NA | NA | C |
116 | 943324 | 20741M03 | 20741M | CONMED HEALTHCARE MGMT INC | CONMED HEALTHCARE MANAGEMENT, INC. | NA | NA | C |
117 | 943324 | 20741M03 | 20741M | CONMED HEALTHCARE MGMT INC | PACE HEALTH MANAGEMENT SYSTEMS INC | NA | NA | C |
118 | 946842 | 48625010 | 486250 | KAUFMANN FD INC | AMBASSADORS INTERNATIONAL INC | NA | NA | C |
119 | 1005757 | 12634109 | 126341 | CS FIRST BOSTON INVT FDS INC | CSG SYSTEMS INTERNATIONAL INC | NAME CHANGED TO BEA INVT FDS | INC 06/13/1995 SEE 05538D | C |
120 | 1006835 | 82617012 | 826170 | SIEBEL SYS INC | SIEBEL SYSTEMS INC | REORGANIZED AS SIEBEL JANNA | ARRANGEMENT INC 11/14/2000 | C |
121 | 1013050 | 74257410 | 742574 | PRINTRAK INTL INC | PRINTRAK INTERNATIONAL INC | NA | NA | C |
122 | 1014383 | 68340210 | 683402 | ONYX SOFTWARE CORP | ONYX SOFTWARE CORP/WA | NA | NA | C |
123 | 1017172 | 53226107 | 532261 | LIGHTSHIP TANKERS IV LLC | AUTHORIZE.NET HOLDINGS, INC. | NA | NA | C |
124 | 1017172 | 53226107 | 532261 | LIGHTSHIP TANKERS IV LLC | LIGHTBRIDGE INC | NA | NA | C |
125 | 1032462 | 90297110 | 902971 | U S RESTAURANT PPTYS INC | FF-TSY HOLDING CO II, LLC | REORGANIZED AS TRUSTSTREET | PPTYS INC 02/24/2005 | C |
126 | 1032462 | 90297110 | 902971 | U S RESTAURANT PPTYS INC | TRUSTREET PROPERTIES INC | REORGANIZED AS TRUSTSTREET | PPTYS INC 02/24/2005 | C |
127 | 1032462 | 90297110 | 902971 | U S RESTAURANT PPTYS INC | U S RESTAURANT PROPERTIES INC | REORGANIZED AS TRUSTSTREET | PPTYS INC 02/24/2005 | C |
128 | 1035991 | 36188107 | 361881 | GMD BONDHOLDER TR | FIRST ROBINSON FINANCIAL CORP | NA | NA | C |
129 | 1039684 | 68267810 | 682678 | ONEOK INC | ONEOK INC /NEW/ | REORGANIZED AS ONEOK INC NEW | 11/28/1997 | C |
130 | 1039684 | 68267810 | 682678 | ONEOK INC | WAI INC | REORGANIZED AS ONEOK INC NEW | 11/28/1997 | C |
131 | 1041858 | 74936101 | 749361 | RCN CORP | RCN CORP /DE/ | NA | NA | C |
132 | 1042618 | 65365610 | 653656 | NICE SYS LTD | NICE SYSTEMS LTD /ADR/ | NA | NA | C |
133 | 1043044 | 50194100 | 501941 | LDF INC | LNR PROPERTY CORP | NAME CHANGED TO FIRST CHICAGO | BANCORP 11/22/2006 SEE 31948J | C |
134 | 1044979 | 45004F10 | 45004F | IRI INTL CORP | IRI INTERNATIONAL CORP | NA | NA | C |
135 | 1050122 | 68197104 | 681971 | OMNI ADVANTAGE INC | 1 800 CONTACTS INC | NAME CHANGED TO GO CALL INC | 03/16/1998 SEE 380163 | C |
136 | 1061763 | 44934106 | 449341 | IC WKS INC DEL | HUNGRY MINDS INC /DE/ | NA | NA | C |
137 | 1061763 | 44934106 | 449341 | IC WKS INC DEL | IDG BOOKS WORLDWIDE INC | NA | NA | C |
138 | 1064133 | 98144Q10 | 98144Q | WORLD COMM ONLINE INC DEL | WORLD COMMERCE ONLINE INC | NA | NA | C |
139 | 1065059 | 9033E108 | 9033E1 | U S BK NATL ASSN MINNEAPOLIS | CENTRUS ENERGY CORP | NA | NA | C |
140 | 1065059 | 9033E108 | 9033E1 | U S BK NATL ASSN MINNEAPOLIS | USEC INC | NA | NA | C |
141 | 1069258 | 33385515 | 333855 | FIRST NATL BK SOUTHN MD | KRATOS DEFENSE & SECURITY SOLUTIONS, INC. | ACQUIRED BY FIRST MD BANCORP | 08/15/1983 | C |
142 | 1069258 | 33385515 | 333855 | FIRST NATL BK SOUTHN MD | WIRELESS FACILITIES INC | ACQUIRED BY FIRST MD BANCORP | 08/15/1983 | C |
143 | 1075046 | 89732918 | 897329 | TROY FINL CORP | TROY FINANCIAL CORP | NA | NA | C |
144 | 1093367 | 02932810 | 029328 | AMERICAN RIVS OIL CO | AMERICAN RIVERS OIL CO /DE/ | REORGANIZED AS AROC INC | 12/08/1999 | C |
145 | 1093367 | 02932810 | 029328 | AMERICAN RIVS OIL CO | AROC INC | REORGANIZED AS AROC INC | 12/08/1999 | C |
146 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC DE | REORGANIZED AS CAPSULE | COMMUNICATIONS INC 04/27/2000 | C |
147 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC | REORGANIZED AS CAPSULE | COMMUNICATIONS INC 04/27/2000 | C |
148 | 1120193 | 63110318 | 631103 | NASDAQ OMX GROUP INC | NASDAQ OMX GROUP, INC. | FORMERLY NASDAQ STK MKT INC TO | 02/27/2008 | C |
149 | 1120193 | 63110318 | 631103 | NASDAQ OMX GROUP INC | NASDAQ STOCK MARKET INC | FORMERLY NASDAQ STK MKT INC TO | 02/27/2008 | C |
150 | 1120193 | 63110318 | 631103 | NASDAQ OMX GROUP INC | NASDAQ, INC. | FORMERLY NASDAQ STK MKT INC TO | 02/27/2008 | C |
151 | 1125259 | 14365830 | 143658 | CARNIVAL CORP | CARNIVAL PLC | FORMERLY CARNIVAL CRUISE LINES | INC TO 04/18/1994 | C |
152 | 1125259 | 14365830 | 143658 | CARNIVAL CORP | P&O PRINCESS CRUISES PLC | FORMERLY CARNIVAL CRUISE LINES | INC TO 04/18/1994 | C |
153 | 1165002 | 96175104 | 961751 | WESTWOOD FD INC | WESTWOOD HOLDINGS GROUP INC | NAME CHANGED TO EAST WEST | GROWTH STK FD INC 12/27/1973 | C |
154 | 1175872 | 89462609 | 894626 | TREASURE IS RTY TR | TREASURE ISLAND ROYALTY TRUST | NA | NA | C |
155 | 1192549 | 38056012 | 380560 | GOLD CITY INDS LTD | GOLD CITY INDUSTRIES LTD | NA | NA | C |
156 | 1288359 | 68882302 | 688823 | OTELCO INC | OTELCO INC. | NA | NA | C |
157 | 1288359 | 68882302 | 688823 | OTELCO INC | RURAL LEC ACQUISITION LLC | NA | NA | C |
158 | 1310094 | 00430L10 | 00430L | ACCENTIA BIOPHARMACEUTICALS | ACCENTIA BIOPHARMACEUTICALS INC | NA | NA | C |
159 | 1314755 | 37250B14 | 37250B | GENTIUM S P A | GENTIUM S.P.A. | NA | NA | C |
160 | 1372414 | 00786106 | 007861 | AEROPLAN INCOME FD | AEROHIVE NETWORKS INC | NA | NA | C |
161 | 1372414 | 00786106 | 007861 | AEROPLAN INCOME FD | AEROHIVE NETWORKS, INC | NA | NA | C |
162 | 1382911 | 68616404 | 686164 | OREXIGEN THERAPEUTICS INC | OREXIGEN THERAPEUTICS, INC. | NA | NA | C |
163 | 1389168 | 87612G10 | 87612G | TARGA RES CORP | TARGA RESOURCES, INC. | NA | NA | C |
164 | 1389168 | 87612G10 | 87612G | TARGA RES CORP | TRI RESOURCES INC. | NA | NA | C |
165 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | BLUE EARTH, INC. | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
166 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | CHERRY TANKERS INC. | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
167 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | CHERRY TANKERS INC | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
168 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | GENESIS FLUID SOLUTIONS HOLDINGS, INC. | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
169 | 1504937 | 92025W10 | 92025W | VALOR GOLD CORP | FELAFEL CORP | NA | NA | C |
170 | 1504937 | 92025W10 | 92025W | VALOR GOLD CORP | VALOR GOLD CORP. | NA | NA | C |
171 | 1504937 | 92025W10 | 92025W | VALOR GOLD CORP | VAPORIN, INC. | NA | NA | C |
@iangow
So I've organized some of the rows into soft matches, and rows where company_name and issuer_name_1 clearly do not match.
> View(match_issuers %>% filter(cik %in% soft_matches_i) %>%
distinct(cik, cusip) %>%
mutate(cusip6 = substr(cusip, 1, 6)) %>%
left_join(issuers) %>% left_join(ciks) %>%
distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type))
cik | cusip | cusip6 | issuer_name_1 | company_name | issuer_adl_1 | issuer_adl_2 | issuer_type | |
---|---|---|---|---|---|---|---|---|
1 | 215403 | 15505106 | 155051 | CENTRAL RR CO N J | CENTRAL RESERVE LIFE CORP | REORGANIZED AS CENTRAL JERSEY | INDS INC 08/15/1979 | C |
2 | 215403 | 15505106 | 155051 | CENTRAL RR CO N J | CERES GROUP INC | REORGANIZED AS CENTRAL JERSEY | INDS INC 08/15/1979 | C |
3 | 757011 | 90329405 | 903294 | U S GULF HLDGS INC | USG CORP | NAME CHANGED TO CANCOM | TECHNOLOGIES INC 04/29/1997 | C |
4 | 804138 | 02711807 | 027118 | AMERICAN LD LEASE INC | AMERICAN LAND LEASE INC | NA | NA | C |
5 | 804138 | 02711807 | 027118 | AMERICAN LD LEASE INC | ASSET INVESTORS CORP | NA | NA | C |
6 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA S A | NAME CHANGED TO BBVA BANCO | FRANCES S A 10/04/2000 | C |
7 | 842180 | 05959110 | 059591 | BANCO FRANCES DEL RIO DE LA | BANCO BILBAO VIZCAYA ARGENTARIA, S.A. | NAME CHANGED TO BBVA BANCO | FRANCES S A 10/04/2000 | C |
8 | 893813 | 70757610 | 707576 | PENN PAC CORP | PENN OCTANE CORP | NA | NA | C |
9 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC DE | REORGANIZED AS CAPSULE | COMMUNICATIONS INC 04/27/2000 | C |
10 | 1113227 | 90337P10 | 90337P | US WATS INC | CAPSULE COMMUNICATIONS INC | REORGANIZED AS CAPSULE | COMMUNICATIONS INC 04/27/2000 | C |
11 | 1125259 | 14365830 | 143658 | CARNIVAL CORP | CARNIVAL PLC | FORMERLY CARNIVAL CRUISE LINES | INC TO 04/18/1994 | C |
12 | 1125259 | 14365830 | 143658 | CARNIVAL CORP | P&O PRINCESS CRUISES PLC | FORMERLY CARNIVAL CRUISE LINES | INC TO 04/18/1994 | C |
13 | 1165002 | 96175104 | 961751 | WESTWOOD FD INC | WESTWOOD HOLDINGS GROUP INC | NAME CHANGED TO EAST WEST | GROWTH STK FD INC 12/27/1973 | C |
14 | 1175872 | 89462609 | 894626 | TREASURE IS RTY TR | TREASURE ISLAND ROYALTY TRUST | NA | NA | C |
15 | 1389168 | 87612G10 | 87612G | TARGA RES CORP | TARGA RESOURCES, INC. | NA | NA | C |
16 | 1389168 | 87612G10 | 87612G | TARGA RES CORP | TRI RESOURCES INC. | NA | NA | C |
Of these, I think AMERICAN LAND LEASE INC (CIK 804138), TREASURE ISLAND ROYALTY TRUST (CIK 1175872) are matches given the extensive acronymization in the issuer
names, and CAPSULE COMMUNICATIONS INC DE (CIK 1113227) I would say is a match due to the issuer_adl columns, which seems to be columns which state some of the security's history. Also, looking here I would say BANCO BILBAO VIZCAYA ARGENTARIA S A (CIK 842180) is a match. So that's four additional matches. Below are the cases which, going be names and or history, do not seem to match at all.
> View(match_issuers %>% filter(cik %in% not_hard_matches_i & !(cik %in% soft_matches_i)) %>%
distinct(cik, cusip) %>%
mutate(cusip6 = substr(cusip, 1, 6)) %>%
left_join(issuers) %>% left_join(ciks) %>%
distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type))
cik | cusip | cusip6 | issuer_name_1 | company_name | issuer_adl_1 | issuer_adl_2 | issuer_type | |
---|---|---|---|---|---|---|---|---|
1 | 2491 | 36465410 | 364654 | GAMING & TECHNOLOGY INC | ALLIANCE GAMING CORP | NAME CHANGED TO UNITED GAMING | INC 12/08/1988 SEE 910475 | C |
2 | 2491 | 36465410 | 364654 | GAMING & TECHNOLOGY INC | BALLY TECHNOLOGIES, INC. | NAME CHANGED TO UNITED GAMING | INC 12/08/1988 SEE 910475 | C |
3 | 16496 | 13152106 | 131521 | CALVERT CASH RESVS | CALPROP CORP | NAME CHANGED TO MONEY MGMT | PLUS 09/10/1984 SEE 609918 | C |
4 | 74783 | 68417600 | 684176 | ORANGE-CO INC | ACCEPTANCE INSURANCE COMPANIES INC | NAME CHANGED TO STONERIDGE RES | INC 04/22/1987 SEE 861839 | C |
5 | 99982 | 74100410 | 741004 | PRESIDENTIAL RLTY CORP NEW | TRUST CO OF NEW JERSEY | NA | NA | C |
6 | 99982 | 74100420 | 741004 | PRESIDENTIAL RLTY CORP NEW | TRUST CO OF NEW JERSEY | NA | NA | C |
7 | 105096 | 94356710 | 943567 | WAVEPHORE INC | WAXMAN INDUSTRIES INC | NAME CHANGED TO WAVO CORP | 06/23/1999 SEE 944027 | C |
8 | 724024 | 28882108 | 288821 | ELLIS GABLES APT BLDG CORP | AMERICAN PHYSICIANS SERVICE GROUP INC | NA | NA | C |
9 | 750556 | 22463310 | 224633 | CRAWFORD & CO | SUNTRUST BANKS INC | NA | NA | C |
10 | 750556 | 22463320 | 224633 | CRAWFORD & CO | SUNTRUST BANKS INC | NA | NA | C |
11 | 750556 | 69149730 | 691497 | OXFORD INDS INC | SUNTRUST BANKS INC | NA | NA | C |
12 | 750556 | 78125810 | 781258 | RUDDICK CORP | SUNTRUST BANKS INC | NAME CHANGED TO HARRIS TEETER | SUPERMARKETS INC 04/02/2012 | C |
13 | 750556 | 80653010 | 806530 | SCHERER HEALTHCARE INC | SUNTRUST BANKS INC | ACQUIRED BY STERICYCLE INC | 01/09/2003 | C |
14 | 750556 | 89235610 | 892356 | TRACTOR SUPPLY CO | SUNTRUST BANKS INC | NA | NA | C |
15 | 774487 | 03077110 | 030771 | AMES W R CO | AMERIWOOD INDUSTRIES INTERNATIONAL CORP | ACQUIRED BY RUCKER CO | NA | C |
16 | 781902 | 94091200 | 940912 | WATERFORD CAP MGMT INC | INTERNATIONAL BALER CORP | REORGANIZED AS CPI PLASTICS | GOUP LTD 09/21/1998 SEE 12615N | C |
17 | 781902 | 94091200 | 940912 | WATERFORD CAP MGMT INC | WASTE TECHNOLOGY CORP | REORGANIZED AS CPI PLASTICS | GOUP LTD 09/21/1998 SEE 12615N | C |
18 | 789879 | 29356300 | 293563 | ENS BIO LOGICALS INC | ENSURGE INC | NAME CHANGED TO ENSCOR INC | 06/03/1987 SEE 29356L | C |
19 | 789879 | 29356300 | 293563 | ENS BIO LOGICALS INC | ISHOPPER COM INC | NAME CHANGED TO ENSCOR INC | 06/03/1987 SEE 29356L | C |
20 | 789879 | 29356300 | 293563 | ENS BIO LOGICALS INC | SUNWALKER DEVELOPMENT INC | NAME CHANGED TO ENSCOR INC | 06/03/1987 SEE 29356L | C |
21 | 791050 | 02882810 | 028828 | AMERICAN PASSAGE MKTG CORP | PHOTOWORKS INC /WA | NAME CHANGED TO SEATTLE | FILMWORKS INC 02/24/1989 | C |
22 | 791050 | 02882810 | 028828 | AMERICAN PASSAGE MKTG CORP | SEATTLE FILMWORKS INC | NAME CHANGED TO SEATTLE | FILMWORKS INC 02/24/1989 | C |
23 | 811641 | 45252306 | 452523 | IMMERSION HUMAN INTERFACE CORP | IMMUCELL CORP /DE/ | NA | NA | C |
24 | 817135 | 08606103 | 086061 | BERTUCCIS CORP | ALLIANCE HEALTHCARE SERVICES, INC | NA | NA | C |
25 | 817135 | 08606103 | 086061 | BERTUCCIS CORP | ALLIANCE IMAGING INC /DE/ | NA | NA | C |
26 | 831532 | 45070710 | 450707 | IVI PUBG INC | ATRIX INTERNATIONAL INC | REORGANIZED AS ONHEALTH | NETWORK CO 06/16/1998 | C |
27 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | LINKON CORP INC | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
28 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | LINKON CORP | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
29 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | PACKETPORT COM | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
30 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | POCKETPORT COM | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
31 | 833203 | 24231700 | 242317 | DEALS ARE GOOD INC | WYNDSTORM CORP | NAME CHANGED TO LINKON CORP | 10/06/1990 SEE 535902 | C |
32 | 846615 | 46624B30 | 46624B | JPY HLDGS LTD | JPS INDUSTRIES INC | NA | NA | C |
33 | 846615 | 46624B30 | 46624B | JPY HLDGS LTD | JPS TEXTILE GROUP INC /DE/ | NA | NA | C |
34 | 849547 | 91826107 | 918261 | VLPS LTG SVCS INTL INC | BLACK BOX CORP | NA | NA | C |
35 | 849547 | 91826107 | 918261 | VLPS LTG SVCS INTL INC | MB HOLDINGS INC | NA | NA | C |
36 | 850693 | 18490102 | 184901 | CLEARLY CDN BEVERAGE CORP | ALLERGAN INC | NA | NA | C |
37 | 854873 | 00760G10 | 00760G | AEGIS CONSUMER FDG GROUP INC | RALLYS HAMBURGERS INC | NA | NA | C |
38 | 882235 | 48625010 | 486250 | KAUFMANN FD INC | LINCARE HOLDINGS INC | NA | NA | C |
39 | 888999 | 59114210 | 591142 | METALCLAD CORP | METRA BIOSYSTEMS INC | NAME CHANGED TO XNTRX CORP | 07/03/2002 SEE 293850 | C |
40 | 904081 | 63577110 | 635771 | NATIONAL ED CORP | STECK VAUGHN PUBLISHING CORP | NA | NA | C |
41 | 910468 | 48625010 | 486250 | KAUFMANN FD INC | FLOORING AMERICA INC | NA | NA | C |
42 | 910468 | 48625010 | 486250 | KAUFMANN FD INC | MAXIM GROUP INC / | NA | NA | C |
43 | 915197 | 43153101 | 431531 | HILLCREST RES LTD | HILITE INDUSTRIES INC | NA | NA | C |
44 | 928421 | 57723100 | 577231 | MATZEL & MUMFORD MTG FDG INC | MATTSON TECHNOLOGY INC | NA | NA | C |
45 | 1005757 | 12634109 | 126341 | CS FIRST BOSTON INVT FDS INC | CSG SYSTEMS INTERNATIONAL INC | NAME CHANGED TO BEA INVT FDS | INC 06/13/1995 SEE 05538D | C |
46 | 1017172 | 53226107 | 532261 | LIGHTSHIP TANKERS IV LLC | AUTHORIZE.NET HOLDINGS, INC. | NA | NA | C |
47 | 1017172 | 53226107 | 532261 | LIGHTSHIP TANKERS IV LLC | LIGHTBRIDGE INC | NA | NA | C |
48 | 1035991 | 36188107 | 361881 | GMD BONDHOLDER TR | FIRST ROBINSON FINANCIAL CORP | NA | NA | C |
49 | 1043044 | 50194100 | 501941 | LDF INC | LNR PROPERTY CORP | NAME CHANGED TO FIRST CHICAGO | BANCORP 11/22/2006 SEE 31948J | C |
50 | 1050122 | 68197104 | 681971 | OMNI ADVANTAGE INC | 1 800 CONTACTS INC | NAME CHANGED TO GO CALL INC | 03/16/1998 SEE 380163 | C |
51 | 1061763 | 44934106 | 449341 | IC WKS INC DEL | HUNGRY MINDS INC /DE/ | NA | NA | C |
52 | 1061763 | 44934106 | 449341 | IC WKS INC DEL | IDG BOOKS WORLDWIDE INC | NA | NA | C |
53 | 1065059 | 9033E108 | 9033E1 | U S BK NATL ASSN MINNEAPOLIS | CENTRUS ENERGY CORP | NA | NA | C |
54 | 1065059 | 9033E108 | 9033E1 | U S BK NATL ASSN MINNEAPOLIS | USEC INC | NA | NA | C |
55 | 1069258 | 33385515 | 333855 | FIRST NATL BK SOUTHN MD | KRATOS DEFENSE & SECURITY SOLUTIONS, INC. | ACQUIRED BY FIRST MD BANCORP | 08/15/1983 | C |
56 | 1069258 | 33385515 | 333855 | FIRST NATL BK SOUTHN MD | WIRELESS FACILITIES INC | ACQUIRED BY FIRST MD BANCORP | 08/15/1983 | C |
57 | 1372414 | 00786106 | 007861 | AEROPLAN INCOME FD | AEROHIVE NETWORKS INC | NA | NA | C |
58 | 1372414 | 00786106 | 007861 | AEROPLAN INCOME FD | AEROHIVE NETWORKS, INC | NA | NA | C |
59 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | BLUE EARTH, INC. | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
60 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | CHERRY TANKERS INC. | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
61 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | CHERRY TANKERS INC | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
62 | 1422109 | 09548108 | 095481 | BLUE FISH ENTMT INC | GENESIS FLUID SOLUTIONS HOLDINGS, INC. | NAME CHANGED TO PACIFIC GOLD | CORP 09/08/2003 SEE 69433M | C |
And these are the remaining rows, which I deemed to be matches on first sight.
> View(match_issuers %>% filter(!cik %in% not_hard_matches_i) %>%
distinct(cik, cusip) %>%
mutate(cusip6 = substr(cusip, 1, 6)) %>%
left_join(issuers) %>% left_join(ciks) %>%
distinct(cik, cusip, cusip6, issuer_name_1, company_name, issuer_adl_1, issuer_adl_2, issuer_type))
cik | cusip | cusip6 | issuer_name_1 | company_name | issuer_adl_1 | issuer_adl_2 | issuer_type | |
---|---|---|---|---|---|---|---|---|
1 | 8038 | 04635208 | 046352 | ASTRALIS LTD | ASTREX INC | FORMERLY ASTRALIS | PHARMACEUTICALS LTD 12/10/2001 | C |
2 | 26537 | 23371210 | 233712 | DAEDALUS ENTERPRISES INC | ARGON ST, INC. | NAME CHANGED TO SENSYS | TECHNOLOGIES INC 06/09/1998 | C |
3 | 26537 | 23371210 | 233712 | DAEDALUS ENTERPRISES INC | DAEDALUS ENTERPRISES INC | NAME CHANGED TO SENSYS | TECHNOLOGIES INC 06/09/1998 | C |
4 | 26537 | 23371210 | 233712 | DAEDALUS ENTERPRISES INC | SENSYS TECHNOLOGIES INC | NAME CHANGED TO SENSYS | TECHNOLOGIES INC 06/09/1998 | C |
5 | 26537 | 23371210 | 233712 | DAEDALUS ENTERPRISES INC | SENSYTECH INC | NAME CHANGED TO SENSYS | TECHNOLOGIES INC 06/09/1998 | C |
6 | 33073 | 29403700 | 294037 | ENVIRODYNE INDS INC | ENVIRODYNE INDUSTRIES INC | FORMERLY ENVIRODYNE INC TO | 03/03/1978 AND NAME CHANGED TO | C |
7 | 33073 | 29403700 | 294037 | ENVIRODYNE INDS INC | VISKASE COMPANIES INC | FORMERLY ENVIRODYNE INC TO | 03/03/1978 AND NAME CHANGED TO | C |
8 | 38264 | 34986210 | 349862 | FORWARD INDS INC N Y | FORWARD INDUSTRIES INC | NA | NA | C |
9 | 38264 | 34986210 | 349862 | FORWARD INDS INC N Y | FORWARD INDUSTRIES, INC. | NA | NA | C |
10 | 45791 | 70355510 | 703555 | PAUL HARRIS STORES INC | HARRIS PAUL STORES INC | NA | NA | C |
11 | 45791 | 70355510 | 703555 | PAUL HARRIS STORES INC | PAUL HARRIS STORES INC | NA | NA | C |
12 | 57139 | 5024709C | 502470 | LA BARGE INC | LABARGE INC | NA | NA | C |
13 | 59963 | 25474510 | 254745 | DISTINCTIVE DEVICES INC | DISTINCTIVE DEVICES INC | NA | NA | C |
14 | 65270 | 59152010 | 591520 | METHODE ELECTRS INC | METHODE ELECTRONICS INC | NA | NA | C |
15 | 69067 | 62632010 | 626320 | MUNSINGWEAR INC | MUNSINGWEAR INC | NAME CHANGED TO PREMIUMWEAR | INC 09/06/1996 SEE 740909 | C |
16 | 69067 | 62632010 | 626320 | MUNSINGWEAR INC | PREMIUMWEAR INC | NAME CHANGED TO PREMIUMWEAR | INC 09/06/1996 SEE 740909 | C |
17 | 70415 | 63713010 | 637130 | NATIONAL PATENT DEV CORP | GP STRATEGIES CORP | NAME CHANGED TO GP STRATEGIES | CORP 03/09/1998 SEE 36225V | C |
18 | 70415 | 63713010 | 637130 | NATIONAL PATENT DEV CORP | NATIONAL PATENT DEVELOPMENT CORP | NAME CHANGED TO GP STRATEGIES | CORP 03/09/1998 SEE 36225V | C |
19 | 72843 | 66526286 | 665262 | NORTHERN IND PUB SVC CO | NORTHERN INDIANA PUBLIC SERVICE CO | NA | NA | C |
20 | 74154 | 68267810 | 682678 | ONEOK INC | ONEOK INC | REORGANIZED AS ONEOK INC NEW | 11/28/1997 | C |
21 | 89140 | 81773232 | 817732 | SERVOTRONICS INC | SERVOTRONICS INC /DE/ | NA | NA | C |
22 | 97483 | 71891320 | 718913 | PHOENIX RESOURCE COS INC | PHOENIX RESOURCE COMPANIES INC | NA | NA | C |
23 | 100441 | 90212010 | 902120 | TYCO INTL LTD | TYCO INTERNATIONAL LTD | FORMERLY TYCO LABS INC TO | 11/10/1993 & REORGANIZED AS | C |
24 | 203248 | 84402810 | 844028 | SOUTHERN UN CO | SOUTHERN UNION CO | NA | NA | C |
25 | 216039 | 40009510 | 400095 | GRUBB & ELLIS CO | GRUBB & ELLIS CO | NA | NA | C |
26 | 319156 | 77742900 | 777429 | ROSELAND OIL & GAS INC | CUBIC ENERGY INC | REORGANIZED AS CUBIC ENERGY | INC 10/04/1999 SEE 229675 | C |
27 | 319156 | 77742900 | 777429 | ROSELAND OIL & GAS INC | CUBIC ENERGY LLC | REORGANIZED AS CUBIC ENERGY | INC 10/04/1999 SEE 229675 | C |
28 | 319156 | 77742900 | 777429 | ROSELAND OIL & GAS INC | HILLTOP ENERGY, LLC. | REORGANIZED AS CUBIC ENERGY | INC 10/04/1999 SEE 229675 | C |
29 | 319156 | 77742900 | 777429 | ROSELAND OIL & GAS INC | ROSELAND OIL & GAS INC | REORGANIZED AS CUBIC ENERGY | INC 10/04/1999 SEE 229675 | C |
30 | 704874 | 60520010 | 605200 | MISSION WEST PPTYS | MISSION WEST PROPERTIES/NEW/ | REORGANIZED AS MISSION WEST | PPTYS INC MD 12/30/1998 | C |
31 | 715355 | 40273410 | 402734 | GULFSTREAM AEROSPACE CORP NEW | GULFSTREAM AEROSPACE CORP | NA | NA | C |
32 | 719264 | 31946210 | 319462 | FIRST CTZNS BANCSHARES INC | FIRST CITIZENS BANCSHARES INC /TN/ | NA | NA | C |
33 | 720851 | 64107410 | 641074 | NESTOR INC | NESTOR INC | NA | NA | C |
34 | 721237 | 65488410 | 654884 | NOBEL ED DYNAMICS INC | NOBEL EDUCATION DYNAMICS INC | NAME CHANGED TO NOBEL LEARNING | CMNTYS INC 12/07/1998 | C |
35 | 721237 | 65488410 | 654884 | NOBEL ED DYNAMICS INC | NOBEL LEARNING COMMUNITIES INC | NAME CHANGED TO NOBEL LEARNING | CMNTYS INC 12/07/1998 | C |
36 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | CRYSTAL PROPERTIES HOLDINGS, INC. | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
37 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | ELECTRO KINETIC SYSTEMS INC | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
38 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | STERLING CAPITAL INVESTMENT GROUP INC | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
39 | 723616 | 28511410 | 285114 | ELECTRO KINETIC SYS INC | STERLING MEDIA CAPITAL GROUP INC | NAME CHANGED TO STERLING MEDIA | CAP GROUP INC 07/07/2000 | C |
40 | 729213 | 74791410 | 747914 | Q MED INC | Q MED INC | NA | NA | C |
41 | 729213 | 74791410 | 747914 | Q MED INC | QMED INC | NA | NA | C |
42 | 733289 | 37935510 | 379355 | GLOBAL NAT RES INC | GLOBAL NATURAL RESOURCES INC /NJ/ | NA | NA | C |
43 | 737033 | 74835710 | 748357 | QUESTECH INC | QUESTECH INC | NA | NA | C |
44 | 743872 | 43761400 | 437614 | HOMESTAKE MNG CO | HOMESTAKE MINING CO /DE/ | NAME CHANGED TO BARRICK GOLD | CORP 12/04/2001 SEE 067901 | C |
45 | 748212 | 73179110 | 731791 | POLYPHASE CORP | OVERHILL CORP | NAME CHANGED TO OVERHILL CORP | 03/20/2001 SEE 690211 | C |
46 | 748212 | 73179110 | 731791 | POLYPHASE CORP | POLYPHASE CORP | NAME CHANGED TO OVERHILL CORP | 03/20/2001 SEE 690211 | C |
47 | 748212 | 73179110 | 731791 | POLYPHASE CORP | TREECON RESOURCES INC | NAME CHANGED TO OVERHILL CORP | 03/20/2001 SEE 690211 | C |
48 | 761676 | 15091710 | 150917 | CELLULAR COMMUNICATIONS INC | CELLULAR COMMUNICATIONS INC | MERGED INTO AIRTOUCH | COMMUNICATIONS INC 08/16/1996 | C |
49 | 771498 | 07159310 | 071593 | BATTLE MTN GOLD CO | BATTLE MOUNTAIN GOLD CO | NA | NA | C |
50 | 777844 | 20992910 | 209929 | COMPUSONICS VIDEO CORP | COMPUSONICS VIDEO CORP | NA | NA | C |
51 | 787849 | 98502114 | 985021 | YARDVILLE NATL BANCORP | YARDVILLE NATIONAL BANCORP | NA | NA | C |
52 | 791164 | 42219B10 | 42219B | HEALTH MGMT INC DEL | HEALTH MANAGEMENT INC/DE | NA | NA | C |
53 | 791164 | 42219B10 | 42219B | HEALTH MGMT INC DEL | HOMECARE MANAGEMENT INC | NA | NA | C |
54 | 792161 | 57751400 | 577514 | MAUNA LOA MACADAMIA PARTNERS | MAUNA LOA MACADAMIA PARTNERS LP | NAME CHANGED TO ML MACADAMIA | ORCHARDS L P 09/16/1998 | C |
55 | 792161 | 57751400 | 577514 | MAUNA LOA MACADAMIA PARTNERS | ML MACADAMIA ORCHARDS L P | NAME CHANGED TO ML MACADAMIA | ORCHARDS L P 09/16/1998 | C |
56 | 792161 | 57751400 | 577514 | MAUNA LOA MACADAMIA PARTNERS | ROYAL HAWAIIAN ORCHARDS, L.P. | NAME CHANGED TO ML MACADAMIA | ORCHARDS L P 09/16/1998 | C |
57 | 804154 | 53680840 | 536808 | LITHIUM TECHNOLOGY CORP | LITHIUM TECHNOLOGY CORP | NA | NA | C |
58 | 811419 | 02355926 | 023559 | AMERALIA INC | AMERALIA INC | NAME CHANGED TO NATURAL RES | USA CORP 09/14/2010 | C |
59 | 811419 | 02355926 | 023559 | AMERALIA INC | NATURAL RESOURCES USA CORP | NAME CHANGED TO NATURAL RES | USA CORP 09/14/2010 | C |
60 | 812482 | 80190410 | 801904 | SANTA FE GAMING CORP | ARCHON CORP | NAME CHANGED TO ARCHON CORP | 05/11/2001 SEE 03957P | C |
61 | 812482 | 80190410 | 801904 | SANTA FE GAMING CORP | SAHARA GAMING CORP | NAME CHANGED TO ARCHON CORP | 05/11/2001 SEE 03957P | C |
62 | 812482 | 80190410 | 801904 | SANTA FE GAMING CORP | SANTA FE GAMING CORP | NAME CHANGED TO ARCHON CORP | 05/11/2001 SEE 03957P | C |
63 | 812906 | 40990017 | 409900 | HANCOCK FABRICS INC | HANCOCK FABRICS INC | NA | NA | C |
64 | 814577 | 02312010 | 023120 | AMAX GOLD INC | AMAX GOLD INC | NAME CHANGED TO KINAM GOLD | CORP 09/18/1998 SEE 494482 | C |
65 | 814577 | 02312010 | 023120 | AMAX GOLD INC | KINAM GOLD INC | NAME CHANGED TO KINAM GOLD | CORP 09/18/1998 SEE 494482 | C |
66 | 817644 | 86024C20 | 86024C | STEVENS INTERNATIONAL INC | STEVENS GRAPHICS CORP | FORMERLY STEVENS GRAPHICS CORP | 05/23/1995 | C |
67 | 817644 | 86024C20 | 86024C | STEVENS INTERNATIONAL INC | STEVENS INTERNATIONAL INC | FORMERLY STEVENS GRAPHICS CORP | 05/23/1995 | C |
68 | 829499 | 46072310 | 460723 | INTERSTATE BAKERIES CORP NEW | INTERSTATE BAKERIES CORP/DE/ | MERGED INTO INTERSTATE | BRANDS CORP 06/02/1990 | C |
69 | 840260 | 53803230 | 538032 | LIVE ENTMT INC | LIVE ENTERTAINMENT INC | NAME CHANGED TO CANCER | TREATMENT CTRS INC 07/30/2002 | C |
70 | 841528 | 75952U10 | 75952U | RELIASTAR FINL CORP | NWNL COMPANIES INC | MERGED INTO ING GROEP N.V. | 09/01/2000 | C |
71 | 841528 | 75952U10 | 75952U | RELIASTAR FINL CORP | RELIASTAR FINANCIAL CORP | MERGED INTO ING GROEP N.V. | 09/01/2000 | C |
72 | 845609 | 38070110 | 380701 | GOLD RIV HOTEL & CASINO CORP | GOLD RIVER HOTEL & CASINO CORP | NA | NA | C |
73 | 846609 | 68230310 | 682303 | ONBANCORP INC | ONBANCORP INC | NA | NA | C |
74 | 849502 | 75190710 | 751907 | RAMTRON INTL CORP | RAMTRON INTERNATIONAL CORP | NA | NA | C |
75 | 858339 | 74342A10 | 74342A | PROMUS COS INC | CAESARS ENTERTAINMENT CORP | NAME CHANGED TO HARRAHS ENTMT | INC 06/30/1995 SEE 413619 | C |
76 | 858339 | 74342A10 | 74342A | PROMUS COS INC | HARRAHS ENTERTAINMENT INC | NAME CHANGED TO HARRAHS ENTMT | INC 06/30/1995 SEE 413619 | C |
77 | 858339 | 74342A10 | 74342A | PROMUS COS INC | PROMUS COMPANIES INC | NAME CHANGED TO HARRAHS ENTMT | INC 06/30/1995 SEE 413619 | C |
78 | 862025 | 90337P10 | 90337P | US WATS INC | US WATS INC | REORGANIZED AS CAPSULE | COMMUNICATIONS INC 04/27/2000 | C |
79 | 869709 | 26831802 | 268318 | EDG CAP INC | EDG CAPITAL INC | NAME CHANGED TO ISOTOPE | SOLUTIONS GROUP INC 11/14/2001 | C |
80 | 869709 | 26831802 | 268318 | EDG CAP INC | ISOTOPE SOLUTIONS GROUP INC | NAME CHANGED TO ISOTOPE | SOLUTIONS GROUP INC 11/14/2001 | C |
81 | 872248 | 68393020 | 683930 | OPTIMA PETE CORP | OPTIMA PETROLEUM CORP | REORGANIZED AS PETROQUEST | ENERGY INC 09/04/1998 | C |
82 | 872248 | 68393020 | 683930 | OPTIMA PETE CORP | PETROQUEST ENERGY INC | REORGANIZED AS PETROQUEST | ENERGY INC 09/04/1998 | C |
83 | 874737 | 59512710 | 595127 | MICRONICS COMPUTERS INC | MICRONICS COMPUTERS INC /CA | NA | NA | C |
84 | 875459 | 92656M10 | 92656M | VIDEO LOTTERY TECHNOLOGIES INC | POWERHOUSE TECHNOLOGIES INC /DE | NAME CHANGED TO POWERHOUSE | TECHNOLOGIES INC | C |
85 | 875459 | 92656M10 | 92656M | VIDEO LOTTERY TECHNOLOGIES INC | VIDEO LOTTERY TECHNOLOGIES INC/DE | NAME CHANGED TO POWERHOUSE | TECHNOLOGIES INC | C |
86 | 875622 | 09093116 | 090931 | BIOSPECIFICS TECHNOLOGIES CORP | BIOSPECIFICS TECHNOLOGIES CORP | NA | NA | C |
87 | 879585 | 04907910 | 049079 | ATLANTIC TELE-NETWORK INC | ATLANTIC TELE NETWORK INC /DE | NA | NA | C |
88 | 879585 | 04907910 | 049079 | ATLANTIC TELE-NETWORK INC | ATN INTERNATIONAL, INC. | NA | NA | C |
89 | 880320 | 89472901 | 894729 | TREMONT ADVISERS INC | TREMONT ADVISERS INC | ACQUIRED BY PRIVATE INVESTORS | 10/01/2001 | C |
90 | 882159 | 23922010 | 239220 | DAW TECHNOLOGIES INC | DAW TECHNOLOGIES INC /UT | NA | NA | C |
91 | 882264 | 03852L10 | 03852L | ARAKIS ENERGY CORP | ARAKIS ENERGY CORP | FORMERLY ARAKIS MNG CORP TO | 09/12/1988 & FORMERLY ARAKIS | C |
92 | 882291 | 00808Y09 | 00808Y | AETHLON MED INC | AETHLON MEDICAL INC | NA | NA | C |
93 | 882291 | 00808Y09 | 00808Y | AETHLON MED INC | BISHOP EQUITIES INC | NA | NA | C |
94 | 884802 | 91382810 | 913828 | UNIVERSAL SEISMIC ASSOC INC | UNIVERSAL SEISMIC ASSOCIATES INC | NAME CHANGED TO POCKETOP CORP | 11/14/2006 SEE 73035T | C |
95 | 885942 | 84476910 | 844769 | SOUTHWEST BANCSHARES INC DEL | SOUTHWEST BANCSHARES INC /NEW/ | NA | NA | C |
96 | 887353 | 81804310 | 818043 | SEVEN-UP / RC BOTTLING CO | SEVEN UP RC BOTTLING COMPANY OF SOUTHERN CALIFORNIA INC | NA | NA | C |
97 | 888470 | 890910AB | 890910 | TOPS APPLIANCE CITY INC | TOPS APPLIANCE CITY INC | NA | NA | C |
98 | 890568 | 69340T10 | 69340T | PLD TELEKOM INC | PETERSBURG LONG DISTANCE INC | REORGANIZED AS METROMEDIA | INTL GROUP INC 09/30/1999 | C |
99 | 890568 | 69340T10 | 69340T | PLD TELEKOM INC | PLD TELEKOM INC | REORGANIZED AS METROMEDIA | INTL GROUP INC 09/30/1999 | C |
100 | 895422 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | BOULDER TOTAL RETURN FUND INC | NAME CHANGED TO BOULDER TOTAL | RETURN FD INC 08/27/1999 | C |
101 | 895422 | 74037Q10 | 74037Q | PREFERRED INCOME MGMT FD INC | PREFERRED INCOME MANAGEMENT FUND INC | NAME CHANGED TO BOULDER TOTAL | RETURN FD INC 08/27/1999 | C |
102 | 897448 | 02311107 | 023111 | AMARIN CORP PLC | AMARIN CORP PLC\UK | NA | NA | C |
103 | 897448 | 02311107 | 023111 | AMARIN CORP PLC | AMARIN PHARMACEUTICALS PLC | NA | NA | C |
104 | 897448 | 02311107 | 023111 | AMARIN CORP PLC | ETHICAL HOLDINGS PLC | NA | NA | C |
105 | 904900 | 08372L10 | 08372L | BERG ELECTRS CORP | BERG ELECTRONICS CORP /DE/ | NA | NA | C |
106 | 907111 | 36870710 | 368707 | GENEMEDICINE INC | GENEMEDICINE INC | NA | NA | C |
107 | 908598 | 00817R03 | 00817R | AETRIUM INC | AETRIUM INC | NA | NA | C |
108 | 908598 | 00817R03 | 00817R | AETRIUM INC | ATRM HOLDINGS, INC. | NA | NA | C |
109 | 919239 | 38095610 | 380956 | GOLDCORP INC NEW | GOLDCORP INC | NA | NA | C |
110 | 924835 | 88871110 | 888711 | TIVOLI INDS INC | TIVOLI INDUSTRIES INC | NA | NA | C |
111 | 928375 | 98159L10 | 98159L | WORLDWIDE PETROMOLY INC | DUTCH GOLD RESOURCES INC | REORGANIZED AS SMALL TOWN | RADIO INC 05/29/2002 | C |
112 | 928375 | 98159L10 | 98159L | WORLDWIDE PETROMOLY INC | OGDEN MCDONALD & CO | REORGANIZED AS SMALL TOWN | RADIO INC 05/29/2002 | C |
113 | 928375 | 98159L10 | 98159L | WORLDWIDE PETROMOLY INC | SMALL TOWN RADIO INC | REORGANIZED AS SMALL TOWN | RADIO INC 05/29/2002 | C |
114 | 928375 | 98159L10 | 98159L | WORLDWIDE PETROMOLY INC | WORLDWIDE PETROMOLY INC | REORGANIZED AS SMALL TOWN | RADIO INC 05/29/2002 | C |
115 | 932352 | 91913E02 | 91913E | VALENTIS INC | MEGABIOS CORP | NAME CHANGED TO URIGEN | PHARMACEUTICALS INC 07/19/2007 | C |
116 | 932352 | 91913E02 | 91913E | VALENTIS INC | URIGEN PHARMACEUTICALS, INC. | NAME CHANGED TO URIGEN | PHARMACEUTICALS INC 07/19/2007 | C |
117 | 932352 | 91913E02 | 91913E | VALENTIS INC | VALENTIS INC | NAME CHANGED TO URIGEN | PHARMACEUTICALS INC 07/19/2007 | C |
118 | 933157 | 78069310 | 780693 | ROYAL SILVER MINES INC | AURORA OIL & GAS CORP | NAME CHANGED TO CADENCE RES | CORP 05/02/2001 SEE 12738N | C |
119 | 933157 | 78069310 | 780693 | ROYAL SILVER MINES INC | CADENCE RESOURCES CORP LP | NAME CHANGED TO CADENCE RES | CORP 05/02/2001 SEE 12738N | C |
120 | 933157 | 78069310 | 780693 | ROYAL SILVER MINES INC | CADENCE RESOURCES CORP | NAME CHANGED TO CADENCE RES | CORP 05/02/2001 SEE 12738N | C |
121 | 933157 | 78069310 | 780693 | ROYAL SILVER MINES INC | ROYAL SILVER MINES INC | NAME CHANGED TO CADENCE RES | CORP 05/02/2001 SEE 12738N | C |
122 | 937252 | 14166N20 | 14166N | CAREADVANTAGE INC | CAREADVANTAGE INC | NA | NA | C |
123 | 943324 | 20741M03 | 20741M | CONMED HEALTHCARE MGMT INC | CONMED HEALTHCARE MANAGEMENT, INC. | NA | NA | C |
124 | 943324 | 20741M03 | 20741M | CONMED HEALTHCARE MGMT INC | PACE HEALTH MANAGEMENT SYSTEMS INC | NA | NA | C |
125 | 946661 | 37989P10 | 37989P | GLYCOGENESYS INC | GLYCOGENESY INC | NA | NA | C |
126 | 946661 | 37989P10 | 37989P | GLYCOGENESYS INC | GLYCOGENESYS INC | NA | NA | C |
127 | 946661 | 37989P10 | 37989P | GLYCOGENESYS INC | IGG INTERNATIONAL INC | NA | NA | C |
128 | 946661 | 37989P10 | 37989P | GLYCOGENESYS INC | SAFESCIENCE INC | NA | NA | C |
129 | 946842 | 48625010 | 486250 | KAUFMANN FD INC | AMBASSADORS INTERNATIONAL INC | NA | NA | C |
130 | 948642 | 87927W10 | 87927W | TELECOM ITALIA SPA | TELECOM ITALIA S P A | REORGANIZED AS TELECOM ITALIA | S P A NEW 04/08/2003 | C |
131 | 948642 | 87927W10 | 87927W | TELECOM ITALIA SPA | TIM S.P.A. | REORGANIZED AS TELECOM ITALIA | S P A NEW 04/08/2003 | C |
132 | 948850 | 92330P10 | 92330P | VENTURI PARTNERS INC | COMSYS IT PARTNERS INC | NAME CHANGED TO COMSYS IT | PARTNERS INC 09/30/2004 | C |
133 | 948850 | 92330P10 | 92330P | VENTURI PARTNERS INC | PERSONNEL GROUP OF AMERICA INC | NAME CHANGED TO COMSYS IT | PARTNERS INC 09/30/2004 | C |
134 | 948850 | 92330P10 | 92330P | VENTURI PARTNERS INC | VENTURI PARTNERS INC | NAME CHANGED TO COMSYS IT | PARTNERS INC 09/30/2004 | C |
135 | 1002658 | 23437N10 | 23437N | DALEEN TECHNOLOGIES INC | DALEEN TECHNOLOGIES INC | NA | NA | C |
136 | 1006835 | 82617012 | 826170 | SIEBEL SYS INC | SIEBEL SYSTEMS INC | REORGANIZED AS SIEBEL JANNA | ARRANGEMENT INC 11/14/2000 | C |
137 | 1013050 | 74257410 | 742574 | PRINTRAK INTL INC | PRINTRAK INTERNATIONAL INC | NA | NA | C |
138 | 1014383 | 68340210 | 683402 | ONYX SOFTWARE CORP | ONYX SOFTWARE CORP/WA | NA | NA | C |
139 | 1032462 | 90297110 | 902971 | U S RESTAURANT PPTYS INC | FF-TSY HOLDING CO II, LLC | REORGANIZED AS TRUSTSTREET | PPTYS INC 02/24/2005 | C |
140 | 1032462 | 90297110 | 902971 | U S RESTAURANT PPTYS INC | TRUSTREET PROPERTIES INC | REORGANIZED AS TRUSTSTREET | PPTYS INC 02/24/2005 | C |
141 | 1032462 | 90297110 | 902971 | U S RESTAURANT PPTYS INC | U S RESTAURANT PROPERTIES INC | REORGANIZED AS TRUSTSTREET | PPTYS INC 02/24/2005 | C |
142 | 1033030 | 29879610 | 298796 | EUROTECH LTD | EUROTECH LTD | NA | NA | C |
143 | 1039684 | 68267810 | 682678 | ONEOK INC | ONEOK INC /NEW/ | REORGANIZED AS ONEOK INC NEW | 11/28/1997 | C |
144 | 1039684 | 68267810 | 682678 | ONEOK INC | WAI INC | REORGANIZED AS ONEOK INC NEW | 11/28/1997 | C |
145 | 1041858 | 74936101 | 749361 | RCN CORP | RCN CORP /DE/ | NA | NA | C |
146 | 1042618 | 65365610 | 653656 | NICE SYS LTD | NICE SYSTEMS LTD /ADR/ | NA | NA | C |
147 | 1044979 | 45004F10 | 45004F | IRI INTL CORP | IRI INTERNATIONAL CORP | NA | NA | C |
148 | 1064133 | 98144Q10 | 98144Q | WORLD COMM ONLINE INC DEL | WORLD COMMERCE ONLINE INC | NA | NA | C |
149 | 1075046 | 89732918 | 897329 | TROY FINL CORP | TROY FINANCIAL CORP | NA | NA | C |
150 | 1093367 | 02932810 | 029328 | AMERICAN RIVS OIL CO | AMERICAN RIVERS OIL CO /DE/ | REORGANIZED AS AROC INC | 12/08/1999 | C |
151 | 1093367 | 02932810 | 029328 | AMERICAN RIVS OIL CO | AROC INC | REORGANIZED AS AROC INC | 12/08/1999 | C |
152 | 1094348 | 29015110 | 290151 | ELOYALTY CORP | ELOYALTY CORP | NAME CHANGED TO MATTERSIGHT | CORP 05/31/2011 SEE 577097 | C |
153 | 1094348 | 29015110 | 290151 | ELOYALTY CORP | MATTERSIGHT CORP | NAME CHANGED TO MATTERSIGHT | CORP 05/31/2011 SEE 577097 | C |
154 | 1105503 | 52886Q10 | 52886Q | LEXENT INC | LEXENT INC | NA | NA | C |
155 | 1120193 | 63110318 | 631103 | NASDAQ OMX GROUP INC | NASDAQ OMX GROUP, INC. | FORMERLY NASDAQ STK MKT INC TO | 02/27/2008 | C |
156 | 1120193 | 63110318 | 631103 | NASDAQ OMX GROUP INC | NASDAQ STOCK MARKET INC | FORMERLY NASDAQ STK MKT INC TO | 02/27/2008 | C |
157 | 1120193 | 63110318 | 631103 | NASDAQ OMX GROUP INC | NASDAQ, INC. | FORMERLY NASDAQ STK MKT INC TO | 02/27/2008 | C |
158 | 1156388 | 08915P01 | 08915P | BIG 5 SPORTING GOODS CORP | BIG 5 HOLDINGS CORP | NA | NA | C |
159 | 1156388 | 08915P01 | 08915P | BIG 5 SPORTING GOODS CORP | BIG 5 SPORTING GOODS CORP | NA | NA | C |
160 | 1177326 | 55377820 | 553778 | MT ULTIMATE HEALTHCARE CORP | JAVA JUICE NET | NAME CHANGED TO MONARCH | STAFFING INC 03/31/2006 | C |
161 | 1177326 | 55377820 | 553778 | MT ULTIMATE HEALTHCARE CORP | MONARCH STAFFING, INC. | NAME CHANGED TO MONARCH | STAFFING INC 03/31/2006 | C |
162 | 1177326 | 55377820 | 553778 | MT ULTIMATE HEALTHCARE CORP | MT ULTIMATE HEALTHCARE CORP | NAME CHANGED TO MONARCH | STAFFING INC 03/31/2006 | C |
163 | 1192549 | 38056012 | 380560 | GOLD CITY INDS LTD | GOLD CITY INDUSTRIES LTD | NA | NA | C |
164 | 1262279 | 32077108 | 320771 | FIRST MARBLEHEAD CORP | FIRST MARBLEHEAD CORP | NA | NA | C |
165 | 1288359 | 68882302 | 688823 | OTELCO INC | OTELCO INC. | NA | NA | C |
166 | 1288359 | 68882302 | 688823 | OTELCO INC | RURAL LEC ACQUISITION LLC | NA | NA | C |
167 | 1310094 | 00430L10 | 00430L | ACCENTIA BIOPHARMACEUTICALS | ACCENTIA BIOPHARMACEUTICALS INC | NA | NA | C |
168 | 1314755 | 37250B14 | 37250B | GENTIUM S P A | GENTIUM S.P.A. | NA | NA | C |
169 | 1382911 | 68616404 | 686164 | OREXIGEN THERAPEUTICS INC | OREXIGEN THERAPEUTICS, INC. | NA | NA | C |
170 | 1504937 | 92025W10 | 92025W | VALOR GOLD CORP | FELAFEL CORP | NA | NA | C |
171 | 1504937 | 92025W10 | 92025W | VALOR GOLD CORP | VALOR GOLD CORP. | NA | NA | C |
172 | 1504937 | 92025W10 | 92025W | VALOR GOLD CORP | VAPORIN, INC. | NA | NA | C |
173 | 1547530 | 02462418 | 024624 | AMERICAN BOARDING CO | AMERICAN BOARDING CO | NA | NA | C |
174 | 1547530 | 02462418 | 024624 | AMERICAN BOARDING CO | MICROLIN BIO, INC | NA | NA | C |
@iangow
> match_issuers %>% filter(cik %in% not_hard_matches_i & !(cik %in% soft_matches_i)) %>%
+ distinct(cik, cusip) %>%
+ mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ left_join(issuers) %>% left_join(ciks) %>%
+ distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 1 x 1
n
<int>
1 41
> match_issuers %>% filter(cik %in% soft_matches_i) %>%
+ distinct(cik, cusip) %>%
+ mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ left_join(issuers) %>% left_join(ciks) %>%
+ distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 1 x 1
n
<int>
1 10
> match_issuers %>% filter(!cik %in% not_hard_matches_i) %>%
+ distinct(cik, cusip) %>%
+ mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ left_join(issuers) %>% left_join(ciks) %>%
+ distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 1 x 1
n
<int>
1 106
> match_stocknames %>% anti_join(match_issuers) %>% distinct(cik, cusip) %>%
+ mutate(ncusip = cusip) %>%
+ left_join(stocknames, by = 'ncusip') %>% left_join(ciks) %>% select(cik, cusip.x, company_name, comnam) %>% rename(cusip = cusip.x)
Joining, by = c("cusip", "cik")
Joining, by = "cik"
# A tibble: 3 x 4
cik cusip company_name comnam
<int> <chr> <chr> <chr>
1 874516 M2051410 BVR TECHNOLOGIES LTD B V R TECHNOLOGIES LTD
2 874516 M2051410 BVR TECHNOLOGIES LTD B V R TECHNOLOGIES LTD
3 1078276 G6116R10 MIH LTD M I H LTD TORTOLA
So we have 10 matches in the "soft match set", 4 of which I determined in a post above that are actually matches, 41 which are not matches by company name or history, and 106 which are matches. Also, out of the two matches in match_stocknames
which are not also in match_issuers
, is seems both are legitimate matches. So out of the set of 335 8 digit cusip matches with a multiplicity more than 10, 159 matches join onto either crsp.stocknames
or cusipm.issuer
. Out of these 159, 112 can be said to be good matches, with another 6 with a question mark.
Then there are 41 that seem to be bad matches, though it is possible that these entries correspond to older cusips and/or company names, which might not be in either of crsp.stocknames
or cusipm.issuer
(this is why I'd like to know how comprehensive and up to date these are). Are there any other databases that these could be joined onto?
So it seems, if the mentioned databases are comprehensive, then 335 - 159 = 176 of the matches have cusips which do not exist in any major database, and are probably harmless (as they won't join). Then 112 of the remaining 159 are good matches, 6 might be good matches, and 41 are bad matches which could cause harm (as the wrong cusip is potentially matched).
@iangow Some of the 8-digit cusips seemed to be "padded"
> cusip8s %>% anti_join(eight_w_valid9) %>% filter(multiplicity >= 10) %>% distinct(cusip, cik) %>% filter(grepl('(SEECOVER|SCHEDULE|PAGE[0-9]OF[0-9])', cusip))
Joining, by = c("cik", "cusip", "multiplicity")
# A tibble: 20 x 2
cusip cik
<chr> <int>
1 SEECOVER 6292
2 PAGE2OF4 6715
3 SEECOVER 42582
4 SEECOVER 67887
5 SEECOVER 310056
6 SEECOVER 351998
7 SEECOVER 731657
8 SEECOVER 864683
9 SEECOVER 906347
10 SEECOVER 922204
11 SCHEDULE 943861
12 SEECOVER 944075
13 SCHEDULE 1016100
14 SEECOVER 1043961
15 SEECOVER 1051514
16 SEECOVER 1053374
17 PAGE2OF2 1061069
18 SEECOVER 1090431
19 SCHEDULE 1157624
20 SEECOVER 1239819
cases_8 <- cusip8s %>% anti_join(eight_w_valid9) %>% filter(multiplicity >= 10) %>% distinct(cusip, cik) %>% filter(!grepl('(SEECOVER|SCHEDULE|PAGE[0-9]OF[0-9])', cusip))
cases_8 <- cases_8 %>% mutate(cusip_padded = paste0('0', cusip))
cases_8 <- cases_8 %>% mutate(cusip6 = substr(cusip_padded, 1, 6), ncusip = substr(cusip_padded, 1, 8))
cases_8 <- cases_8 %>% select(cik, cusip, cusip_padded, ncusip, cusip6)
> cases_8
# A tibble: 315 x 5
cik cusip cusip_padded ncusip cusip6
<int> <chr> <chr> <chr> <chr>
1 2491 36465410 036465410 03646541 036465
2 3545 01630104 001630104 00163010 001630
3 5133 02375204 002375204 00237520 002375
4 6281 32654105 032654105 03265410 032654
5 8038 04635208 004635208 00463520 004635
6 9346 57755100 057755100 05775510 057755
7 10119 68509603 068509603 06850960 068509
8 11544 84423102 084423102 08442310 084423
9 12208 90572207 090572207 09057220 090572
10 16496 13152106 013152106 01315210 013152
# … with 305 more rows
> cases_8 %>% inner_join(stocknames, by = 'ncusip') %>% left_join(ciks, by = 'cik') %>% select(cik, cusip.x, cusip_padded, ncusip, comnam, company_name) %>% rename(cusip = cusip.x)
# A tibble: 88 x 6
cik cusip cusip_padded ncusip comnam company_name
<int> <chr> <chr> <chr> <chr> <chr>
1 6281 32654105 032654105 03265410 ANALOG DEVICES INC ANALOG DEVICES INC
2 6281 32654105 032654105 03265410 ANALOG DEVICES INC ANALOG DEVICES INC
3 6281 32654105 032654105 03265410 ANALOG DEVICES INC ANALOG DEVICES INC
4 9346 57755100 057755100 05775510 BALDWIN & LYONS INC BALDWIN & LYONS INC
5 9346 57755100 057755100 05775510 BALDWIN & LYONS INC PROTECTIVE INSURANCE CORP
6 9346 57755100 057755100 05775510 BALDWIN & LYONS INC BALDWIN & LYONS INC
7 9346 57755100 057755100 05775510 BALDWIN & LYONS INC PROTECTIVE INSURANCE CORP
8 10119 68509603 068509603 06850960 BARRINGER TECHNOLOGIES INC BARRINGER TECHNOLOGIES INC
9 11544 84423102 084423102 08442310 BERKLEY W R CORP BERKLEY W R CORP
10 11544 84423102 084423102 08442310 BERKLEY W R CORP BERKLEY W R CORP
# … with 78 more rows
> cases_8 %>% inner_join(issuers) %>% left_join(ciks, by = 'cik') %>% select(cik, cusip, cusip_padded, cusip6, company_name, issuer_name_1, issuer_adl_1, issuer_adl_2, )
Joining, by = "cusip6"
# A tibble: 128 x 8
cik cusip cusip_padded cusip6 company_name issuer_name_1 issuer_adl_1 issuer_adl_2
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3545 01630104 001630104 001630 ALICO INC ALR TECHNOLOGIES INC NA NA
2 3545 01630104 001630104 001630 ALICO, INC. ALR TECHNOLOGIES INC NA NA
3 6281 32654105 032654105 032654 ANALOG DEVICES INC ANALOG DEVICES INC NA NA
4 9346 57755100 057755100 057755 BALDWIN & LYONS INC BALDWIN & LYONS INC NA NA
5 9346 57755100 057755100 057755 PROTECTIVE INSURANCE CORP BALDWIN & LYONS INC NA NA
6 10119 68509603 068509603 068509 BARRINGER TECHNOLOGIES INC BARRINGER TECHNOLOGIES INC FORMERLY BARRINGER RES INC TO 02/12/1991
7 11544 84423102 084423102 084423 BERKLEY W R CORP BERKLEY W R CORP NA NA
8 12208 90572207 090572207 090572 BIO RAD LABORATORIES INC BIO RAD LABORATORIES INC FORMERLY BIO RAD LABS TO 04/25/1975
9 12208 90572207 090572207 090572 BIO-RAD LABORATORIES, INC. BIO RAD LABORATORIES INC FORMERLY BIO RAD LABS TO 04/25/1975
10 26537 23371210 023371210 023371 ARGON ST, INC. AMCAL INC NA NA
# … with 118 more rows
> cases_8 %>% inner_join(issuers) %>% left_join(ciks, by = 'cik') %>% distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
# A tibble: 1 x 1
n
<int>
1 78
> cases_8 %>% inner_join(stocknames, by = 'ncusip') %>% left_join(ciks, by = 'cik') %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 38
> cases_8 %>% inner_join(stocknames, by = 'ncusip') %>% left_join(ciks, by = 'cik') %>% distinct(cik, cusip.x) %>% rename(cusip = cusip.x) %>% anti_join(cases_8 %>% inner_join(issuers) %>% left_join(ciks, by = 'cik') %>% distinct(cik, cusip))
Joining, by = "cusip6"
Joining, by = c("cik", "cusip")
# A tibble: 0 x 2
# … with 2 variables: cik <int>, cusip <chr>
There seem to be 78 candidates which could potentially be matches if we left-pad with a zero.
Obviously, it might be a good idea to have a function in R which calculates the check digit reliably, so that we can further test the padded cusips
get_cusip_digit_value <- function(cusip_digit) {
digits <- c('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D',
'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '*', '@', '#')
result = which(digits == cusip_digit) - 1
return(result)
}
get_cusip_digit_value <- Vectorize(get_cusip_digit_value)
calculate_cusip_check_digit <- function(cusip) {
digit_str <- paste0(as.character(get_cusip_digit_value(
unlist(str_split(substr(cusip, 1, 8), ''))) * c(1, 2, 1, 2, 1, 2, 1, 2)))
digits_to_sum <- as.integer(unlist(str_split(digit_str, '')))
result <- (10 - sum(digits_to_sum)) %% 10
return(result)
}
calculate_cusip_check_digit <- Vectorize(calculate_cusip_check_digit)
> cusip_cik %>% filter(cusip_length == 9) %>%
distinct(cik, cusip, check_digit) %>%
mutate(test = calculate_cusip_check_digit(cusip))
# A tibble: 49,691 x 4
cik cusip check_digit test
<int> <chr> <int> <dbl>
1 20 082073107 7 7
2 20 320195100 0 0
3 20 482730108 8 8
4 1750 000361105 5 5
5 1750 232109108 8 8
6 1750 36110513G 3 3
7 1761 894120104 4 4
8 1761 894120203 3 3
9 1800 002824100 0 0
10 1853 003050101 1 1
# … with 49,681 more rows
> cusip_cik %>% filter(cusip_length == 9) %>%
distinct(cik, cusip, check_digit) %>%
mutate(test = calculate_cusip_check_digit(cusip)) %>%
filter(check_digit == test) %>% count()
# A tibble: 1 x 1
n
<int>
1 49691
> cases_8 %>% inner_join(issuers) %>%
left_join(ciks, by = 'cik') %>%
distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
# A tibble: 1 x 1
n
<int>
1 78
> cases_8 %>% inner_join(issuers) %>%
left_join(ciks, by = 'cik') %>%
mutate(check_digit = calculate_cusip_check_digit(cusip_padded)) %>%
filter(substr(cusip_padded, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
# A tibble: 1 x 1
n
<int>
1 43
So there are still 43 potential cases for padded cusips, once check digit contraint is applied.
> cases_8 <- cases_8 %>% mutate(check_digit = calculate_cusip_check_digit(cusip_padded))
> cases_8
# A tibble: 315 x 6
cik cusip cusip_padded ncusip cusip6 check_digit
<int> <chr> <chr> <chr> <chr> <dbl>
1 2491 36465410 036465410 03646541 036465 7
2 3545 01630104 001630104 00163010 001630 2
3 5133 02375204 002375204 00237520 002375 2
4 6281 32654105 032654105 03265410 032654 5
5 8038 04635208 004635208 00463520 004635 7
6 9346 57755100 057755100 05775510 057755 0
7 10119 68509603 068509603 06850960 068509 3
8 11544 84423102 084423102 08442310 084423 2
9 12208 90572207 090572207 09057220 090572 7
10 16496 13152106 013152106 01315210 013152 3
# … with 305 more rows
> cases_8 %>% filter(substr(cusip_padded, 9, 9) == as.character(check_digit))
# A tibble: 69 x 6
cik cusip cusip_padded ncusip cusip6 check_digit
<int> <chr> <chr> <chr> <chr> <dbl>
1 6281 32654105 032654105 03265410 032654 5
2 9346 57755100 057755100 05775510 057755 0
3 10119 68509603 068509603 06850960 068509 3
4 11544 84423102 084423102 08442310 084423 2
5 12208 90572207 090572207 09057220 090572 7
6 21610 75907105 075907105 07590710 075907 5
7 46195 62540109 062540109 06254010 062540 9
8 70415 63713010 063713010 06371301 063713 0
9 72843 66526286 066526286 06652628 066526 6
10 100441 90212010 090212010 09021201 090212 0
# … with 59 more rows
So there are 69 8-digit cusips which are potentially 9-digit cusips when padded with a 0 from the left. And 43 of these map onto either stocknames or issuers (note: I have already established that all potential matches to stocknames also match onto issuers in this set in a previous post).
View(cases_8 %>% inner_join(issuers) %>%
left_join(ciks, by = 'cik') %>%
filter(substr(cusip_padded, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip, cusip_padded, cusip6, check_digit, company_name,
issuer_name_1, issuer_name_2, issuer_name_3, issuer_adl_1)
gives
cik | cusip | cusip_padded | cusip6 | check_digit | company_name | issuer_name_1 | issuer_name_2 | issuer_name_3 | issuer_adl_1 | |
---|---|---|---|---|---|---|---|---|---|---|
1 | 6281 | 32654105 | 032654105 | 032654 | 5 | ANALOG DEVICES INC | ANALOG DEVICES INC | NA | NA | NA |
2 | 9346 | 57755100 | 057755100 | 057755 | 0 | BALDWIN & LYONS INC | BALDWIN & LYONS INC | NA | NA | NA |
3 | 9346 | 57755100 | 057755100 | 057755 | 0 | PROTECTIVE INSURANCE CORP | BALDWIN & LYONS INC | NA | NA | NA |
4 | 10119 | 68509603 | 068509603 | 068509 | 3 | BARRINGER TECHNOLOGIES INC | BARRINGER TECHNOLOGIES INC | NA | NA | FORMERLY BARRINGER RES INC TO |
5 | 11544 | 84423102 | 084423102 | 084423 | 2 | BERKLEY W R CORP | BERKLEY W R CORP | NA | NA | NA |
6 | 12208 | 90572207 | 090572207 | 090572 | 7 | BIO RAD LABORATORIES INC | BIO RAD LABORATORIES INC | NA | NA | FORMERLY BIO RAD LABS TO |
7 | 12208 | 90572207 | 090572207 | 090572 | 7 | BIO-RAD LABORATORIES, INC. | BIO RAD LABORATORIES INC | NA | NA | FORMERLY BIO RAD LABS TO |
8 | 46195 | 62540109 | 062540109 | 062540 | 9 | BANCORP HAWAII INC | BANK HAWAII CORP | NA | NA | NA |
9 | 46195 | 62540109 | 062540109 | 062540 | 9 | BANK OF HAWAII CORP | BANK HAWAII CORP | NA | NA | NA |
10 | 46195 | 62540109 | 062540109 | 062540 | 9 | PACIFIC CENTURY FINANCIAL CORP | BANK HAWAII CORP | NA | NA | NA |
11 | 206030 | 54615109 | 054615109 | 054615 | 9 | AXSYS TECHNOLOGIES INC | AXSYS TECHNOLOGIES INC | NA | NA | NA |
12 | 206030 | 54615109 | 054615109 | 054615 | 9 | VERNITRON CORP | AXSYS TECHNOLOGIES INC | NA | NA | NA |
13 | 718976 | 55442107 | 055442107 | 055442 | 7 | BGS SYSTEMS INC | BGS SYS INC | NA | NA | NA |
14 | 724024 | 28882108 | 028882108 | 028882 | 8 | AMERICAN PHYSICIANS SERVICE GROUP INC | AMERICAN PHYSICIANS SVC GROUP | INC | NA | NA |
15 | 750556 | 69149730 | 069149730 | 069149 | 0 | SUNTRUST BANKS INC | BARTELL MEDIA CORP | NA | NA | MERGED INTO DOWNE |
16 | 771497 | 00957100 | 000957100 | 000957 | 0 | ABM INDUSTRIES INC /DE/ | ABM INDS INC | NA | NA | NA |
17 | 771497 | 00957100 | 000957100 | 000957 | 0 | ABM INDUSTRIES INC | ABM INDS INC | NA | NA | NA |
18 | 771497 | 00957100 | 000957100 | 000957 | 0 | AMERICAN BUILDING MAINTENANCE INDUSTRIES INC | ABM INDS INC | NA | NA | NA |
19 | 796486 | 06848105 | 006848105 | 006848 | 5 | ADELPHIA COMMUNICATIONS CORP | ADELPHIA COMMUNICATIONS CORP | NA | NA | NA |
20 | 799233 | 42234104 | 042234104 | 042234 | 4 | HEARTLAND EXPRESS INC | ARMISTICE RES CORP | NA | NA | NAME CHANGED TO KERR MINES INC |
21 | 814580 | 0753T105 | 00753T105 | 00753T | 5 | ADVANCED MARKETING SERVICES INC | ADVANCED MARKETING SVCS INC | NA | NA | NA |
22 | 818813 | 91736108 | 091736108 | 091736 | 8 | BITSTREAM INC. | BITSTREAM INC | NA | NA | NA |
23 | 818813 | 91736108 | 091736108 | 091736 | 8 | BITSTREAM INC | BITSTREAM INC | NA | NA | NA |
24 | 830736 | 90649104 | 090649104 | 090649 | 4 | ALPHA INNOTECH CORP | BIOPOOL INTL INC | NA | NA | NAME CHANGED TO XTRANA INC |
25 | 830736 | 90649104 | 090649104 | 090649 | 4 | BIOPOOL INTERNATIONAL INC | BIOPOOL INTL INC | NA | NA | NAME CHANGED TO XTRANA INC |
26 | 830736 | 90649104 | 090649104 | 090649 | 4 | XTRANA INC | BIOPOOL INTL INC | NA | NA | NAME CHANGED TO XTRANA INC |
27 | 834306 | 90945106 | 090945106 | 090945 | 6 | ALERE SAN DIEGO, INC. | BIOSITE INC | NA | NA | FORMERLY BIOSITE DIAGNOSTICS |
28 | 834306 | 90945106 | 090945106 | 090945 | 6 | BIOSITE DIAGNOSTICS INC | BIOSITE INC | NA | NA | FORMERLY BIOSITE DIAGNOSTICS |
29 | 834306 | 90945106 | 090945106 | 090945 | 6 | BIOSITE INC | BIOSITE INC | NA | NA | FORMERLY BIOSITE DIAGNOSTICS |
30 | 840401 | 04380911 | 004380911 | 004380 | 1 | ACCUHEALTH INC | ACCUHEALTH INC | NA | NA | NA |
31 | 847935 | 19615103 | 019615103 | 019615 | 3 | ALLION HEALTHCARE INC | ALLION HEALTHCARE INC | NA | NA | NA |
32 | 847935 | 19615103 | 019615103 | 019615 | 3 | CARE GROUP INC | ALLION HEALTHCARE INC | NA | NA | NA |
33 | 849547 | 91826107 | 091826107 | 091826 | 7 | BLACK BOX CORP | BLACK BOX CORP DEL | NA | NA | NA |
34 | 849547 | 91826107 | 091826107 | 091826 | 7 | MB HOLDINGS INC | BLACK BOX CORP DEL | NA | NA | NA |
35 | 850693 | 18490102 | 018490102 | 018490 | 2 | ALLERGAN INC | ALLERGAN INC | NA | NA | NA |
36 | 882291 | 00808Y09 | 000808Y09 | 000808 | 9 | AETHLON MEDICAL INC | ACA ABS 2004-1 LTD | NA | NA | NA |
37 | 882291 | 00808Y09 | 000808Y09 | 000808 | 9 | BISHOP EQUITIES INC | ACA ABS 2004-1 LTD | NA | NA | NA |
38 | 888711 | 06698104 | 006698104 | 006698 | 4 | BANYAN SYSTEMS INC | ADDMASTER CORP | NA | NA | NA |
39 | 888711 | 06698104 | 006698104 | 006698 | 4 | EPRESENCE INC | ADDMASTER CORP | NA | NA | NA |
40 | 895385 | 40228108 | 040228108 | 040228 | 8 | ARGOSY GAMING CO | ARGOSY GAMING CO | NA | NA | NA |
41 | 898437 | 35255108 | 035255108 | 035255 | 8 | ANIKA RESEARCH INC | ANIKA THERAPEUTICS INC | NA | NA | FORMERLY ANIKA RESH INC TO |
42 | 898437 | 35255108 | 035255108 | 035255 | 8 | ANIKA THERAPEUTICS INC | ANIKA THERAPEUTICS INC | NA | NA | FORMERLY ANIKA RESH INC TO |
43 | 898437 | 35255108 | 035255108 | 035255 | 8 | ANIKA THERAPEUTICS, INC. | ANIKA THERAPEUTICS INC | NA | NA | FORMERLY ANIKA RESH INC TO |
44 | 900708 | 78137106 | 078137106 | 078137 | 6 | BELL MICROPRODUCTS INC | BELL MICROPRODUCTS INC | NA | NA | NA |
45 | 902600 | 10196103 | 010196103 | 010196 | 3 | AKSYS LTD | AKSYS LTD | NA | NA | NA |
46 | 902791 | 68463108 | 068463108 | 068463 | 8 | BARRETT BUSINESS SERVICES INC | BARRETT BUSINESS SVCS INC | NA | NA | NA |
47 | 935226 | 82047200 | 082047200 | 082047 | 0 | BENIHANA INC | BENIHANA INC | NA | NA | NA |
48 | 940332 | 32515108 | 032515108 | 032515 | 8 | ANADIGICS INC | ANADIGICS INC | NA | NA | NA |
49 | 1002658 | 23437N10 | 023437N10 | 023437 | 0 | DALEEN TECHNOLOGIES INC | AMEE INC | NA | NA | NAME CHANGED TO AQUAPLAN INC |
50 | 1013238 | 38505103 | 038505103 | 038505 | 3 | ARADIGM CORP | ARADIGM CORP | NA | NA | NA |
51 | 1028087 | 25353103 | 025353103 | 025353 | 3 | AMERICAN DENTAL PARTNERS INC | AMERICAN DENTAL PARTNERS INC | NA | NA | NA |
52 | 1032462 | 90297110 | 090297110 | 090297 | 0 | FF-TSY HOLDING CO II, LLC | BILTMORE FDS | NA | NA | NAME CHANGED TO WACHOVIA FDS |
53 | 1032462 | 90297110 | 090297110 | 090297 | 0 | TRUSTREET PROPERTIES INC | BILTMORE FDS | NA | NA | NAME CHANGED TO WACHOVIA FDS |
54 | 1032462 | 90297110 | 090297110 | 090297 | 0 | U S RESTAURANT PROPERTIES INC | BILTMORE FDS | NA | NA | NAME CHANGED TO WACHOVIA FDS |
55 | 1061069 | 0534P109 | 00534P109 | 00534P | 9 | AVALON HOLDINGS CORP | ADAGIO III CLO PLC | NA | NA | NA |
56 | 1085734 | 93679108 | 093679108 | 093679 | 8 | BLOCKBUSTER INC | BLOCKBUSTER INC | NA | NA | NAME CHANGED TO BB LIQUIDATING |
57 | 1089531 | 0504X207 | 00504X207 | 00504X | 7 | ACTIVEWORLDS COM INC | ACTIVEWORLDS CORP | NA | NA | NAME CHANGED TO KINGOLD |
58 | 1089531 | 0504X207 | 00504X207 | 00504X | 7 | ACTIVEWORLDS CORP | ACTIVEWORLDS CORP | NA | NA | NAME CHANGED TO KINGOLD |
59 | 1089531 | 0504X207 | 00504X207 | 00504X | 7 | KINGOLD JEWELRY, INC. | ACTIVEWORLDS CORP | NA | NA | NAME CHANGED TO KINGOLD |
60 | 1116094 | 54923107 | 054923107 | 054923 | 7 | AXCAN PHARMA INC | AXCAN PHARMA INC | NA | NA | NA |
61 | 1117057 | 27297100 | 027297100 | 027297 | 0 | AMERICAN LORAIN CORP | AMERICAN LORAIN CORP | NA | NA | NA |
62 | 1117057 | 27297100 | 027297100 | 027297 | 0 | MILLENNIUM QUEST INC | AMERICAN LORAIN CORP | NA | NA | NA |
63 | 1117057 | 27297100 | 027297100 | 027297 | 0 | PLANET GREEN HOLDINGS CORP. | AMERICAN LORAIN CORP | NA | NA | NA |
64 | 1162315 | 23177108 | 023177108 | 023177 | 8 | AMBASSADORS GROUP INC | AMBASSADORS GROUP INC | NA | NA | NA |
65 | 1324105 | 8883T200 | 08883T200 | 08883T | 0 | BIDZ.COM, INC. | BIDZ COM INC | NA | NA | NA |
66 | 1344413 | 15384100 | 015384100 | 015384 | 0 | ALEXZA PHARMACEUTICALS INC. | ALEXZA PHARMACEUTICALS INC | NA | NA | NA |
67 | 1518749 | 5501E201 | 05501E201 | 05501E | 1 | AZTECA ACQUISITION CORP | AZTECA ACQUISITION CORP | NA | NA | REORGANIZED AS HEMISPHERE |
The matches from these are pretty good. The only clear non-matches are these ones
not_matches <- c(750556, 799233, 882291, 888711, 1002658, 1032462, 1061069)
View(cases_8 %>% filter(cik %in% not_matches) %>%
inner_join(issuers) %>%
left_join(ciks, by = 'cik') %>%
filter(substr(cusip_padded, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip, cusip_padded, cusip6, check_digit, company_name,
issuer_name_1, issuer_name_2, issuer_name_3, issuer_adl_1)
cik | cusip | cusip_padded | cusip6 | check_digit | company_name | issuer_name_1 | issuer_name_2 | issuer_name_3 | issuer_adl_1 | |
---|---|---|---|---|---|---|---|---|---|---|
1 | 750556 | 69149730 | 069149730 | 069149 | 0 | SUNTRUST BANKS INC | BARTELL MEDIA CORP | NA | NA | MERGED INTO DOWNE |
2 | 799233 | 42234104 | 042234104 | 042234 | 4 | HEARTLAND EXPRESS INC | ARMISTICE RES CORP | NA | NA | NAME CHANGED TO KERR MINES INC |
3 | 882291 | 00808Y09 | 000808Y09 | 000808 | 9 | AETHLON MEDICAL INC | ACA ABS 2004-1 LTD | NA | NA | NA |
4 | 882291 | 00808Y09 | 000808Y09 | 000808 | 9 | BISHOP EQUITIES INC | ACA ABS 2004-1 LTD | NA | NA | NA |
5 | 888711 | 06698104 | 006698104 | 006698 | 4 | BANYAN SYSTEMS INC | ADDMASTER CORP | NA | NA | NA |
6 | 888711 | 06698104 | 006698104 | 006698 | 4 | EPRESENCE INC | ADDMASTER CORP | NA | NA | NA |
7 | 1002658 | 23437N10 | 023437N10 | 023437 | 0 | DALEEN TECHNOLOGIES INC | AMEE INC | NA | NA | NAME CHANGED TO AQUAPLAN INC |
8 | 1032462 | 90297110 | 090297110 | 090297 | 0 | FF-TSY HOLDING CO II, LLC | BILTMORE FDS | NA | NA | NAME CHANGED TO WACHOVIA FDS |
9 | 1032462 | 90297110 | 090297110 | 090297 | 0 | TRUSTREET PROPERTIES INC | BILTMORE FDS | NA | NA | NAME CHANGED TO WACHOVIA FDS |
10 | 1032462 | 90297110 | 090297110 | 090297 | 0 | U S RESTAURANT PROPERTIES INC | BILTMORE FDS | NA | NA | NAME CHANGED TO WACHOVIA FDS |
11 | 1061069 | 0534P109 | 00534P109 | 00534P | 9 | AVALON HOLDINGS CORP | ADAGIO III CLO PLC | NA | NA | NA |
So it seems we have 43 - 7 = 36 good matches from this analysis with 8-digit cusips padded with zeroes from the left.
> cases_8 %>% filter(!cik %in% not_matches) %>%
inner_join(issuers) %>%
left_join(ciks, by = 'cik') %>%
filter(substr(cusip_padded, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip, cusip_padded, cusip6, check_digit, company_name,
issuer_name_1, issuer_name_2, issuer_name_3, issuer_adl_1,
issuer_adl_2, issuer_adl_3, issuer_status) %>%
inner_join(match_issuers %>% inner_join(issuers), by = c('cik', 'cusip')) %>%
select(cik, cusip, cusip_padded, check_digit, company_name,
issuer_name_1.x, issuer_name_1.y) %>%
rename(issuer_name_1_padded = issuer_name_1.x,
issuer_name_1_unpadded = issuer_name_1.y)
Joining, by = "cusip6" Joining, by = "cusip6"
# A tibble: 4 x 7
cik cusip cusip_padded check_digit company_name issuer_name_1_padded issuer_name_1_unpadded
<int> <chr> <chr> <dbl> <chr> <chr> <chr>
1 724024 28882108 028882108 8 AMERICAN PHYSICIANS SERVICE GROUP INC AMERICAN PHYSICIANS SVC GROUP ELLIS GABLES APT BLDG CORP
2 849547 91826107 091826107 7 BLACK BOX CORP BLACK BOX CORP DEL VLPS LTG SVCS INTL INC
3 849547 91826107 091826107 7 MB HOLDINGS INC BLACK BOX CORP DEL VLPS LTG SVCS INTL INC
4 850693 18490102 018490102 2 ALLERGAN INC ALLERGAN INC CLEARLY CDN BEVERAGE CORP
Padding fixes 3 matches from the bad matches in match_issuers
.
There is no overlap with the initial cases that matched stocknames
> cases_8 %>% filter(!cik %in% not_matches) %>%
inner_join(issuers) %>%
left_join(ciks, by = 'cik') %>%
filter(substr(cusip_padded, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip, cusip_padded, cusip6, check_digit, company_name,
issuer_name_1, issuer_name_2, issuer_name_3, issuer_adl_1,
issuer_adl_2, issuer_adl_3, issuer_status) %>%
inner_join(match_stocknames)
Joining, by = "cusip6"
Joining, by = c("cik", "cusip")
# A tibble: 0 x 13
# … with 13 variables: cik <int>, cusip <chr>, cusip_padded <chr>, cusip6 <chr>, check_digit <dbl>, company_name <chr>, issuer_name_1 <chr>, issuer_name_2 <chr>,
# issuer_name_3 <chr>, issuer_adl_1 <chr>, issuer_adl_2 <chr>, issuer_adl_3 <chr>, issuer_status <chr>
Here, I'm looking at cases where a record with an 8-digit cusip also has a valid 9-digit one, and subsequently, in how many cases can the 8-digit cusips push the 9-digit cases over the threshold of 10 for the multiplicity (ie. multiplicity(cusip8) + multiplicity(cusip9) >= 10). One can see that there are 465 cases in total, with 206 satisfying the constraint.
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip.x == substr(cusip.y, 1, 8), substr(cusip.y, 9, 9) == as.character(check_digit.y)) %>% distinct(cik, cusip.x, multiplicity.x, cusip.y, multiplicity.y) %>% filter(multiplicity.y < 10) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 465
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip.x == substr(cusip.y, 1, 8), substr(cusip.y, 9, 9) == as.character(check_digit.y)) %>% distinct(cik, cusip.x, multiplicity.x, cusip.y, multiplicity.y) %>% filter(multiplicity.y < 10) %>% filter(multiplicity.x + multiplicity.y >= 10) %>% distinct(cik, cusip.x) %>% count()
# A tibble: 1 x 1
n
<int>
1 206
I then looked to see if these cases matched issuers and stocknames. Below, you can see that 201 of the 206 cases matched issuers.
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip.x == substr(cusip.y, 1, 8), substr(cusip.y, 9, 9) == as.character(check_digit.y)) %>% distinct(cik, cusip.x, multiplicity.x, cusip.y, multiplicity.y) %>% filter(multiplicity.y < 10) %>% filter(multiplicity.x + multiplicity.y >= 10) %>% distinct(cik, cusip.x) %>% rename(cusip = cusip.x) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>% inner_join(issuers) %>% left_join(ciks) %>% distinct(cik, cusip) %>% count()
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 1 x 1
n
<int>
1 201
Then calling that matched dataframe m89_issuers
, I looked at what matched stocknames but not issuers. It turns out that there are 4 such cases, and by comparing company_name
and comnam
, you can see that all 4 of them are good matches
m89_issuers <- cusip8s %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip.x == substr(cusip.y, 1, 8), substr(cusip.y, 9, 9) == as.character(check_digit.y)) %>% distinct(cik, cusip.x, multiplicity.x, cusip.y, multiplicity.y) %>% filter(multiplicity.y < 10) %>% filter(multiplicity.x + multiplicity.y >= 10) %>% distinct(cik, cusip.x) %>% rename(cusip = cusip.x) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>% inner_join(issuers) %>% left_join(ciks) %>% distinct(cik, cusip)
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip.x == substr(cusip.y, 1, 8), substr(cusip.y, 9, 9) == as.character(check_digit.y)) %>% distinct(cik, cusip.x, multiplicity.x, cusip.y, multiplicity.y) %>% filter(multiplicity.y < 10) %>% filter(multiplicity.x + multiplicity.y >= 10) %>% anti_join(m89_issuers)
Joining, by = "cik"
# A tibble: 5 x 5
cik cusip.x multiplicity.x cusip.y multiplicity.y
<int> <chr> <int> <chr> <int>
1 874516 M2051410 18 M20514101 7
2 1078276 G6116R10 24 G6116R101 4
3 1294649 93166110 6 931661102 6
4 1609550 45773010 2 457730109 8
5 1710340 29772L10 2 29772L108 8
> cusip8s %>% inner_join(cusip_cik, by = 'cik') %>% filter(cusip_length.y == 9) %>% filter(cusip.x == substr(cusip.y, 1, 8), substr(cusip.y, 9, 9) == as.character(check_digit.y)) %>% distinct(cik, cusip.x, multiplicity.x, cusip.y, multiplicity.y) %>% filter(multiplicity.y < 10) %>% filter(multiplicity.x + multiplicity.y >= 10) %>% distinct(cik, cusip.x) %>% rename(cusip = cusip.x) %>% mutate(ncusip = cusip) %>% inner_join(stocknames) %>% left_join(ciks) %>% anti_join(m89_issuers) %>% distinct(cik, cusip, company_name, comnam)
Joining, by = c("cusip", "ncusip")
Joining, by = "cik"
Joining, by = c("cik", "cusip")
# A tibble: 4 x 4
cik cusip company_name comnam
<int> <chr> <chr> <chr>
1 874516 M2051410 BVR TECHNOLOGIES LTD B V R TECHNOLOGIES LTD
2 1078276 G6116R10 MIH LTD M I H LTD TORTOLA
3 1609550 45773010 INSPIRE MEDICAL SYSTEMS, INC. INSPIRE MEDICAL SYSTEMS INC
4 1710340 29772L10 ETON PHARMACEUTICALS, INC. ETON PHARMACEUTICALS INC
So 205 of the 206 cases of interest match either issuers or stocknames, with cik = 1294649, cusip = 93166110 the only case not match either. I then looked at the matches in issuers to see how the company_name
compared with issuer_name_1
(or issuer_adl_1
in some cases, as that often contains old/new names). These were the only bad matches I found, corresponding to 7 distinct (cik, cusip) pairs
> m89_issuers %>% mutate(cusip6 = substr(cusip, 1, 6)) %>% inner_join(issuers) %>% left_join(ciks) %>% distinct(cik, cusip, company_name, issuer_name_1, issuer_name_2, issuer_name_3, issuer_adl_1) %>% filter(cik %in% c(882304, 882800, 909109, 915781, 946842, 1011632, 1363589))
Joining, by = "cusip6"
Joining, by = "cik"
# A tibble: 9 x 7
cik cusip company_name issuer_name_1 issuer_name_2 issuer_name_3 issuer_adl_1
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 882304 48625010 ACCESS HEALTH INC KAUFMANN FD INC NA NA NA
2 882800 46985840 AMERICAN EAGLE GROUP INC JACOR COMMUNICATIONS INC NA NA NA
3 882800 46985840 PACIFIC VENTURES GROUP, INC. JACOR COMMUNICATIONS INC NA NA NA
4 909109 48625010 PHILADELPHIA CONSOLIDATED HOLDING CORP KAUFMANN FD INC NA NA NA
5 915781 48625010 RTW INC /MN/ KAUFMANN FD INC NA NA NA
6 946842 48625010 AMBASSADORS INTERNATIONAL INC KAUFMANN FD INC NA NA NA
7 1011632 46985840 PERITUS SOFTWARE SERVICES INC JACOR COMMUNICATIONS INC NA NA NA
8 1363589 55035110 CARILION CLINIC LUNA INNOVATIONS INC NA NA NA
9 1363589 55035110 CARILION HEALTH SYSTEM LUNA INNOVATIONS INC NA NA NA
So it seems that the 8-digit cusips can also provide substantial value by augmenting the data amongst 9-digit cusips (which have the correct check digit), though there are a small number of wrong matches to look out for.
@bdcallen I simply can't process this much information. Can you please pull it together into one or two comments so that we can decide what to do here.
I think indeed we want to know how many CUSIP-CIKs combinations get pushed over the threshold (e.g., 10 filings) by adding 8-digit CUSIPs to 9-digit CUSIPs. We may also want to consider how many cases then create issues along the lines considered in #90.
@bdcallen As I mentioned in May, it's hard for me to process this much detail. As far as I can tell we have about 180 firms (now that I'm using cusip_cik_test
as the starting point rather than cusip_cik
). I think you have done some analysis of potential matches and identified ones that look good (and ones that don't). Rather than trying to "brute force" this will clever coding, the easiest approach is probably to create a spreadsheet with the candidate matches and a classification into "good" and "bad". We could then use that table to override data about eight-digit matches found in the "raw" cusip_cik
when making cusip_cik_test
(as I am calling it for the moment).
I earlier made a Google Sheets document that could be used to that end. You could populate that in a number of ways (often I have just exported using readr::write_csv
and then copy-pasted). It might make sense to do something similar with seven-digit CUSIPs.
@iangow Like I did with my previous post in #89, I'm going to discuss some detail of what I did, and then just state where you can find my result. Further down in my notebook handle_cusip_cik_exceptions.ipynb
, you can find some cells defining dataframes
added_value_8s_above_10_w_issuers
, added_value_8s_above_10_stocknames_only
, added_value_8s_above_10_padded_w_issuers
and added_value_8s_above_10_padded_stocknames
. The dataframes containing "issuers" in the name are those that join onto cusipm.issuer
, and those containing "stocknames" are those that join onto crsp.stocknames
but not cusipm.issuer
. The latter two dataframes with "padded" in the name are where we set cusip
to be the cusip_raw
but padded with a zero on the left; ie. if the cusip_raw
is '95825910', then cusip
is '095825910'. I didn't bother with approximate string matching for all these cases, as the number of cases to consider was low (so sim_index_norm and sim_index_max are null for these entries).
You can find the result for the unpadded cusips using
SELECT * FROM edgar.cusip_cik_exceptions
WHERE LENGTH(cusip_raw) = 8 AND cusip = cusip_raw
and the result for the padded cusips using
SELECT * FROM edgar.cusip_cik_exceptions
WHERE LENGTH(cusip_raw) = 8 AND cusip != cusip_raw
I believe we can close this issue now.
@iangow As discussed in #84, this issue is for analysing cusips with 8 characters.