Closed iangow closed 4 years ago
@iangow Looks like there are a bunch of repeat offenders, which file their own cusips, not those of the security in which they're claiming beneficial ownership, with Wachovia Corp, the example I gave in another issue, right at the top
View(cusip_cik %>% filter(cusip_length == 9) %>%
filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip) %>%
group_by(cusip) %>% summarise(num_ciks = n()) %>%
arrange(desc(num_ciks)) %>% mutate(ncusip = substr(cusip, 1, 8)) %>%
inner_join(stocknames, by = 'ncusip') %>%
distinct(cusip.x, num_ciks, comnam) %>% rename(cusip = cusip.x) %>% slice(1:20)
cusip | num_ciks | comnam | |
---|---|---|---|
1 | 929903102 | 251 | WACHOVIA CORP 2ND NEW |
2 | 337358105 | 151 | FIRST UNION CORP |
3 | 769667106 | 118 | RIVUS BOND FUND |
4 | 053469102 | 82 | AVALON PROPERTIES INC |
5 | 94856P102 | 44 | WEEKS CORP |
6 | 253798102 | 39 | DIGI INTERNATIONAL INC |
7 | 232109108 | 26 | CUTERA INC |
8 | 871607107 | 23 | SYNOPSYS INC |
9 | 320228109 | 22 | FIRST FINANCIAL FUND INC |
10 | 693315103 | 22 | P H C INC MA |
11 | 358433100 | 21 | FRIEDMAN BILLINGS RAMSEY GRP IN |
12 | 521863308 | 20 | LEAP WIRELESS INTL INC |
13 | 03232P405 | 18 | AMSURG CORP |
14 | 75966V105 | 17 | RENAISSANCE CAP GR & INC FD III |
15 | 957070105 | 17 | WESTAFF INC |
16 | 749651105 | 16 | R O C TAIWAN FUND |
17 | 358434108 | 15 | FRIEDMAN BILLINGS RAMSEY GRP NE |
18 | 02043Q107 | 13 | ALNYLAM PHARMACEUTICALS INC |
19 | 435763107 | 13 | HOLLY ENERGY PARTNERS L P |
20 | 775371107 | 13 | ROHM & HAAS CO |
@iangow
cusip_cik %>% filter(cusip6 == '73935X' & company_name == 'POWERSHARES EXCHANGE TRADED FUND TRUST') %>%
distinct(file_name) %>% inner_join(cusip_cik) %>%
select(file_name, cik, cusip, cusip6, multiplicity, check_digit, company_name, formats) %>% View()
Looking at the first few from this view we get
file_name | cik | cusip | cusip6 | multiplicity | check_digit | company_name | formats | |
---|---|---|---|---|---|---|---|---|
1 | edgar/data/1209466/0000072971-09-000028.txt | 1209466 | 73935X146 | 73935X | 6 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | ABC |
2 | edgar/data/1209466/0000072971-09-000575.txt | 1209466 | 73935X146 | 73935X | 6 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | ABC |
3 | edgar/data/1209466/0000072971-10-000010.txt | 1209466 | 73935X146 | 73935X | 6 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | ABC |
4 | edgar/data/72971/0000072971-09-000028.txt | 1209466 | 73935X146 | 73935X | 6 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | ABC |
5 | edgar/data/72971/0000072971-10-000010.txt | 1209466 | 73935X146 | 73935X | 6 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | ABC |
6 | edgar/data/72971/0000072971-09-000575.txt | 1209466 | 73935X146 | 73935X | 6 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | ABC |
7 | edgar/data/1209466/0001074683-08-000118.txt | 1209466 | 73935X153 | 73935X | 8 | 3 | POWERSHARES EXCHANGE TRADED FUND TRUST | AB |
8 | edgar/data/1209466/0001074683-08-000118.txt | 1209466 | 769667106 | 769667 | 256 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | C |
9 | edgar/data/1209466/0001074683-08-000118.txt | 1209466 | 929903102 | 929903 | 324 | 2 | POWERSHARES EXCHANGE TRADED FUND TRUST | D |
10 | edgar/data/36995/0001074683-08-000118.txt | 1209466 | 73935X153 | 73935X | 8 | 3 | POWERSHARES EXCHANGE TRADED FUND TRUST | AB |
11 | edgar/data/36995/0001074683-08-000118.txt | 1209466 | 769667106 | 769667 | 256 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | C |
12 | edgar/data/36995/0001074683-08-000118.txt | 1209466 | 929903102 | 929903 | 324 | 2 | POWERSHARES EXCHANGE TRADED FUND TRUST | D |
13 | edgar/data/1209466/0001074683-08-000447.txt | 1209466 | 73935X153 | 73935X | 8 | 3 | POWERSHARES EXCHANGE TRADED FUND TRUST | AB |
14 | edgar/data/1209466/0001074683-08-000447.txt | 1209466 | 769667106 | 769667 | 256 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | C |
15 | edgar/data/1209466/0001074683-08-000447.txt | 1209466 | 929903102 | 929903 | 324 | 2 | POWERSHARES EXCHANGE TRADED FUND TRUST | D |
16 | edgar/data/36995/0001074683-08-000447.txt | 1209466 | 73935X153 | 73935X | 8 | 3 | POWERSHARES EXCHANGE TRADED FUND TRUST | AB |
17 | edgar/data/36995/0001074683-08-000447.txt | 1209466 | 769667106 | 769667 | 256 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | C |
18 | edgar/data/36995/0001074683-08-000447.txt | 1209466 | 929903102 | 929903 | 324 | 2 | POWERSHARES EXCHANGE TRADED FUND TRUST | D |
19 | edgar/data/36995/0001074683-08-000556.txt | 1209466 | 73935X153 | 73935X | 8 | 3 | POWERSHARES EXCHANGE TRADED FUND TRUST | AB |
20 | edgar/data/36995/0001074683-08-000556.txt | 1209466 | 769667106 | 769667 | 256 | 6 | POWERSHARES EXCHANGE TRADED FUND TRUST | C |
21 | edgar/data/36995/0001074683-08-000556.txt | 1209466 | 929903102 | 929903 | 324 | 2 | POWERSHARES EXCHANGE TRADED FUND TRUST | D |
We can see one of the filings, edgar/data/1209466/0001074683-08-000118.txt has three cusips, 73935X153
, 769667106
and the cusip for Wachovia Corp, 929903102
. 73935X153
is the correct one
> stocknames %>% filter(ncusip == '73935X15')
# A tibble: 3 x 16
permno permco namedt nameenddt cusip ncusip ticker comnam hexcd exchcd siccd shrcd shrcls st_date end_date namedum
<int> <int> <date> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <date> <date> <dbl>
1 91876 44072 2007-03-01 2007-12-30 46137V83 73935X15 PDP POWERSHARES E T F TRUST 3 1 6726 73 NA 2007-03-30 2019-12-31 2
2 91876 44072 2007-12-31 2016-12-06 46137V83 73935X15 PDP POWERSHARES E T F TRUST 3 4 6726 73 NA 2007-03-30 2019-12-31 2
3 91876 44072 2016-12-07 2018-06-03 46137V83 73935X15 PDP POWERSHARES E T F TRUST 3 3 6726 73 NA 2007-03-30 2019-12-31 2
> issuers %>% filter(cusip6 == '73935X') %>% select(cusip6, issuer_name_1, issuer_name_2)
# A tibble: 1 x 3
cusip6 issuer_name_1 issuer_name_2
<chr> <chr> <chr>
1 73935X POWERSHARES EXCHANGE-TRADED FD TR
and the other cusip, 769667106
, is for a different company, RIVUS BOND FUND
.
> stocknames %>% filter(ncusip == '76966710')
# A tibble: 1 x 16
permno permco namedt nameenddt cusip ncusip ticker comnam hexcd exchcd siccd shrcd shrcls st_date end_date namedum
<int> <int> <date> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <date> <date> <dbl>
1 54228 20601 2006-10-10 2011-12-08 45781W10 76966710 BDF RIVUS BOND FUND 1 1 6726 14 NA 1972-02-29 2019-12-31 2
The reason why the cusip for Rivus Bond Fund would appear is still somewhat mysterious to me. It clearly doesn't share the same cik as for Powershare though
> ciks %>% filter(company_name == 'RIVUS BOND FUND')
# A tibble: 1 x 2
cik company_name
<int> <chr>
1 30125 RIVUS BOND FUND
> ciks %>% filter(company_name == 'POWERSHARES EXCHANGE TRADED FUND TRUST')
# A tibble: 1 x 2
cik company_name
<int> <chr>
1 1209466 POWERSHARES EXCHANGE TRADED FUND TRUST
@iangow It's seem multiplicity doesn't help with with Rivus, either by looking at ciks for the one cusip,
cusip_cik %>% distinct(cusip, cik, multiplicity) %>% filter(cusip == '769667106') %>% left_join(ciks) %>% arrange(desc(multiplicity)) %>% View()
cusip | cik | multiplicity | company_name | |
---|---|---|---|---|
1 | 769667106 | 1209466 | 256 | INVESCO EXCHANGE-TRADED FUND TRUST |
2 | 769667106 | 1209466 | 256 | POWERSHARES EXCHANGE TRADED FUND TRUST |
3 | 769667106 | 1100663 | 168 | ISHARES TRUST |
4 | 769667106 | 1064642 | 92 | INDEX EXCHANGE LISTED SECURITIES TRUST |
5 | 769667106 | 1064642 | 92 | SPDR SERIES TRUST |
6 | 769667106 | 1064642 | 92 | STREETTRACKS SERIES TRUST |
7 | 769667106 | 52848 | 76 | VANGUARD WORLD FUND INC |
8 | 769667106 | 52848 | 76 | VANGUARD WORLD FUND |
9 | 769667106 | 52848 | 76 | VANGUARD WORLD FUNDS |
10 | 769667106 | 1350487 | 70 | WISDOMTREE TRUST |
11 | 769667106 | 1064641 | 46 | SECTOR SPDR TRUST |
12 | 769667106 | 1064641 | 46 | SELECT SECTOR SPDR TRUST |
13 | 769667106 | 930667 | 42 | FOREIGN FUND INC |
14 | 769667106 | 930667 | 42 | ISHARES INC |
15 | 769667106 | 930667 | 42 | ISHARES, INC. |
16 | 769667106 | 930667 | 42 | WEBS INDEX FUND INC |
17 | 769667106 | 930667 | 42 | WFBS INDEX FUND INC |
18 | 769667106 | 1378872 | 40 | INVESCO EXCHANGE-TRADED FUND TRUST II |
19 | 769667106 | 1378872 | 40 | POWERSHARES EXCHANGE-TRADED FUND TRUST II |
20 | 769667106 | 1378872 | 40 | POWERSHARES GLOBAL EXCHANGE-TRADED FUND TRUST |
21 | 769667106 | 1125816 | 38 | FIRST TRUST ADVISORS LP |
22 | 769667106 | 893818 | 34 | BARCLAYS GLOBAL INVESTORS FUNDS INC |
23 | 769667106 | 893818 | 34 | BARCLAYS GLOBAL INVESTORS FUNDS |
24 | 769667106 | 893818 | 34 | BLACKROCK FUNDS III |
25 | 769667106 | 893818 | 34 | MASTERWORKS FUNDS INC |
26 | 769667106 | 893818 | 34 | STAGECOACH INC |
27 | 769667106 | 30125 | 30 | 1838 BOND DEBENTURE TRADING FUND |
28 | 769667106 | 30125 | 30 | CUTWATER SELECT INCOME FUND |
29 | 769667106 | 30125 | 30 | INSIGHT SELECT INCOME FUND |
30 | 769667106 | 30125 | 30 | RIVUS BOND FUND |
or cusips for the one cik
> cusip_cik %>% distinct(cusip, cik, multiplicity) %>% filter(cik == 30125) %>% arrange(desc(multiplicity))
# A tibble: 5 x 3
cusip cik multiplicity
<chr> <int> <int>
1 232229104 30125 34
2 769667106 30125 30
3 45781W109 30125 16
4 282520105 30125 14
5 929903102 30125 10
@iangow Out of the 5 cusips in the latter part of the previous post, 4 join onto issuers, and three of them are related to Rivus by name changes. The remaining one, 45781W109
, joins onto stocknames, and seems to be an older cusip
> cusip_cik %>% distinct(cusip, cik, multiplicity) %>%
filter(cik == 30125) %>% arrange(desc(multiplicity)) %>%
mutate(cusip6 = substr(cusip, 1, 6)) %>% inner_join(issuers) %>%
select(cusip, cik, multiplicity, issuer_name_1, issuer_adl_1, issuer_adl_2, issuer_adl_3)
Joining, by = "cusip6"
# A tibble: 4 x 7
cusip cik multiplicity issuer_name_1 issuer_adl_1 issuer_adl_2 issuer_adl_3
<chr> <int> <int> <chr> <chr> <chr> <chr>
1 232229104 30125 34 CUTWATER SELECT INCOME FD NA NA NA
2 769667106 30125 30 RIVUS BD FD NAME CHANGED TO CUTWATER SELECT INCOME FD 12/09/2011 SEE 232229
3 282520105 30125 14 1838 BD-DEB TRADING FD NAME CHANGED TO RIVUS BD FD 07/07/2006 SEE 769667 NA
4 929903102 30125 10 WACHOVIA CORP NEW MERGED INTO WELLS FARGO & CO NEW 01/01/2009 NA
> stocknames %>% filter(ncusip == '45781W10')
# A tibble: 1 x 16
permno permco namedt nameenddt cusip ncusip ticker comnam hexcd exchcd siccd shrcd shrcls st_date end_date namedum
<int> <int> <date> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <date> <date> <dbl>
1 54228 20601 2016-12-29 2019-12-31 45781W10 45781W10 INSI INSIGHT SELECT INCOME FD 1 1 6726 14 NA 1972-02-29 2019-12-31 2
> stocknames %>% filter(ncusip == '45781W10' | cusip == '45781W10')
# A tibble: 5 x 16
permno permco namedt nameenddt cusip ncusip ticker comnam hexcd exchcd siccd shrcd shrcls st_date end_date namedum
<int> <int> <date> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <date> <date> <dbl>
1 54228 20601 1972-02-22 1988-10-06 45781W10 26175510 DBF DREXEL BOND DEBENTURE TRADING F 1 1 6723 14 NA 1972-02-29 2019-12-31 2
2 54228 20601 1988-10-07 2006-10-09 45781W10 28252010 BDF 1838 BOND DEBENTURE TRADING FUN 1 1 6726 14 NA 1972-02-29 2019-12-31 2
3 54228 20601 2006-10-10 2011-12-08 45781W10 76966710 BDF RIVUS BOND FUND 1 1 6726 14 NA 1972-02-29 2019-12-31 2
4 54228 20601 2011-12-09 2016-12-28 45781W10 23222910 CSI CUTWATER SELECT INCOME FD 1 1 6726 14 NA 1972-02-29 2019-12-31 2
5 54228 20601 2016-12-29 2019-12-31 45781W10 45781W10 INSI INSIGHT SELECT INCOME FD 1 1 6726 14 NA 1972-02-29 2019-12-31 2
So there is a part to this issue that some companies change their cusip6 identifier over time.
Here's further confirmation that the other four names are valid historical names for the given cik
> ciks %>% filter(cik == 30125)
# A tibble: 4 x 2
cik company_name
<int> <chr>
1 30125 1838 BOND DEBENTURE TRADING FUND
2 30125 CUTWATER SELECT INCOME FUND
3 30125 INSIGHT SELECT INCOME FUND
4 30125 RIVUS BOND FUND
Noting that the sum of the multiplicities for the valid cusips associated with the valid names is 34 + 30 + 16 + 14 = 94
, versus 10 for Wachovia Corp, perhaps multiplicity should help us for less complicated cases.
Looks like with simpler cases (in this example, num_ciks = 2
), multiplicity is handy. There does seem to be the odd cases where the order of magnitude of the difference in multiplicity between ciks is not that great (eg. cusip = 000573105
)
> cusip_cik %>% filter(cusip_length == 9) %>%
filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip) %>%
group_by(cusip) %>% summarise(num_ciks = n()) %>%
arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
inner_join(issuers) %>% filter(num_ciks == 2) %>%
inner_join(cusip_cik) %>% left_join(ciks) %>%
distinct(cusip, cik, num_ciks, multiplicity, company_name,
issuer_name_1, issuer_name_2, issuer_name_3) %>%
arrange(cusip) %>% head(20)
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 20 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 000361105 1750 2 320 AAR CORP AAR CORP NA NA
2 000361105 1141107 2 2 ARRIS GROUP INC AAR CORP NA NA
3 000573105 704051 2 3 LEGG MASON INC ABC DISPENSING TECHNOLOGIES INC NA
4 000573105 748103 2 6 ABC DISPENSING TECHNOLOGIES INC ABC DISPENSING TECHNOLOGIES INC NA
5 000752105 913364 2 68 ABC NACO INC ABC RAIL PRODS CORP NA NA
6 000752105 913364 2 68 ABC RAIL PRODUCTS CORP ABC RAIL PRODS CORP NA NA
7 000752105 923364 2 2 H&N MANAGEMENT CO INC ABC RAIL PRODS CORP NA NA
8 000781104 313368 2 10 ABS INDUSTRIES INC /DE/ ABS INDS INC NA NA
9 000781104 806063 2 2 ABS LIFESCIENCES INC ABS INDS INC NA NA
10 00080S101 845289 2 2 HARVEST NATURAL RESOURCES, INC. ABX HLDGS INC NA NA
11 00080S101 894081 2 56 ABX AIR INC ABX HLDGS INC NA NA
12 00080S101 894081 2 56 ABX Holdings, Inc. ABX HLDGS INC NA NA
13 000881102 1040792 2 56 A CONSULTING TEAM INC A CONSULTING TEAM INC NA NA
14 000881102 1363448 2 2 Helios & Matheson Information Technology Ltd. A CONSULTING TEAM INC NA NA
15 000955104 215155 2 2 ASTROCOM CORP ACT TELECONFERENCING INC NA NA
16 000955104 918709 2 81 ACT TELECONFERENCING INC ACT TELECONFERENCING INC NA NA
17 000957100 356037 2 2 CSP INC /MA/ ABM INDS INC NA NA
18 000957100 771497 2 226 ABM INDUSTRIES INC /DE/ ABM INDS INC NA NA
19 000975102 316236 2 2 GRAND UNION CO /DE/ ACT NETWORKS INC NA NA
20 000975102 942132 2 32 ACT NETWORKS INC ACT NETWORKS INC NA NA
Here's the same analysis for num_ciks = 3
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 3) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% head(20)
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 20 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 000360206 105634 3 2 EMCOR GROUP INC AAON INC NA NA
2 000360206 706688 3 2 AARON'S INC AAON INC NA NA
3 000360206 824142 3 142 AAON INC AAON INC NA NA
4 00086T103 787030 3 2 VISTACARE, INC. A C MOORE ARTS & CRAFTS INC NA NA
5 00086T103 1042809 3 196 A C MOORE ARTS & CRAFTS INC A C MOORE ARTS & CRAFTS INC NA NA
6 00086T103 1042809 3 196 A.C. Moore Arts & Crafts, Inc. A C MOORE ARTS & CRAFTS INC NA NA
7 00086T103 1385534 3 1 TIGER CONSUMER MANAGEMENT, LLC A C MOORE ARTS & CRAFTS INC NA NA
8 00104Q107 1024441 3 2 FACTORY CARD OUTLET CORP AFC ENTERPRISES INC NA NA
9 00104Q107 1041379 3 226 AFC ENTERPRISES INC AFC ENTERPRISES INC NA NA
10 00104Q107 1041379 3 226 POPEYES LOUISIANA KITCHEN, INC. AFC ENTERPRISES INC NA NA
11 00104Q107 1491501 3 2 TMS International Corp. AFC ENTERPRISES INC NA NA
12 001250109 875354 3 24 AG SERVICES OF AMERICA INC AG SVCS AMER INC NA NA
13 001250109 1077552 3 2 GENTEK INC AG SVCS AMER INC NA NA
14 001250109 1099517 3 2 PORT FINANCIAL CORP AG SVCS AMER INC NA NA
15 001282102 356080 3 16 BELO CORP A H BELO CORP NA NA
16 001282102 1413898 3 162 A. H. Belo CORP A H BELO CORP NA NA
17 001282102 1413898 3 162 A. H. Belo Corp A H BELO CORP NA NA
18 001282102 1423296 3 2 Zuckerman Investment Group, LLC A H BELO CORP NA NA
19 001296102 812900 3 2 SAKS INC AHL SVCS INC NA NA
20 001296102 878720 3 4 FAO INC AHL SVCS INC NA NA
and num_ciks = 4
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 4) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% head(20)
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 20 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 001947100 49899 4 4 IMPERIAL BANCORP AT PLASTICS INC NA NA
2 001947100 225628 4 4 MEGADATA CORP AT PLASTICS INC NA NA
3 001947100 1011694 4 16 AT PLASTICS INC AT PLASTICS INC NA NA
4 001947100 1012616 4 2 BANK PLUS CORP AT PLASTICS INC NA NA
5 003881307 93456 4 2 STANDARD REGISTER CO ACACIA RESH CORP NA NA
6 003881307 93459 4 2 STANDARD RESOURCES CORP ACACIA RESH CORP NA NA
7 003881307 934549 4 164 ACACIA RESEARCH CORP ACACIA RESH CORP NA NA
8 003881307 1500866 4 2 InterXion Holding N.V. ACACIA RESH CORP NA NA
9 004816104 2098 4 130 ACME UNITED CORP ACME UTD CORP NA NA
10 004816104 732847 4 2 FIRST WILSHIRE SECURITIES MANAGEMENT INC ACME UTD CORP NA NA
11 004816104 1071625 4 4 DOVER SADDLERY INC ACME UTD CORP NA NA
12 004816104 1342857 4 1 North Star Investment Management Corp. ACME UTD CORP NA NA
13 004931101 350846 4 2 SUPREME INDUSTRIES INC ACTRADE INTL LTD NA NA
14 004931101 704508 4 2 MOTO PHOTO INC ACTRADE INTL LTD NA NA
15 004931101 819255 4 23 ACTRADE INTERNATIONAL LTD ACTRADE INTL LTD NA NA
16 004931101 819255 4 23 ACTRADE FINANCIAL TECHNOLOGIES LTD ACTRADE INTL LTD NA NA
17 004931101 938206 4 1 DRIEHAUS CAPITAL MANAGEMENT INC ACTRADE INTL LTD NA NA
18 00506J107 1047188 4 2 PENWEST PHARMACEUTICALS CO ACTIVCARD CORP NA NA
19 00506J107 1183941 4 14 ACTIVCARD CORP ACTIVCARD CORP NA NA
20 00506J107 1202157 4 2 BROOKFIELD HOMES CORP ACTIVCARD CORP NA NA
So choosing only the rows for num_ciks = 5
where the multiplicity is equal to the maximum grouped by cusip, I got
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 5) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% head(20)
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 20 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 001084102 880266 5 238 AGCO CORP /DE AGCO CORP DEL NA NA
2 00724F101 796343 5 212 ADOBE SYSTEMS INC ADOBE SYS INC NA NA
3 00724F101 796343 5 212 ADOBE INC. ADOBE SYS INC NA NA
4 00763M108 1168335 5 132 ADVANCED MEDICAL OPTICS INC ADVANCED MED OPTICS INC NA NA
5 00763M108 1168335 5 132 ABBOTT MEDICAL OPTICS INC ADVANCED MED OPTICS INC NA NA
6 034918102 1096481 5 94 ANGIOTECH PHARMACEUTICALS INC ANGIOTECH PHARMACEUTICALS INC NA NA
7 039381108 915390 5 54 ARCH COMMUNICATIONS GROUP INC /DE/ ARCH COMMUNICATIONS GROUP INC NA NA
8 039381108 915390 5 54 ARCH WIRELESS INC ARCH COMMUNICATIONS GROUP INC NA NA
9 048426100 1404296 5 92 Atlantic Coast Financial CORP ATLANTIC COAST FINL CORP NA NA
10 048798102 948975 5 104 ATLANTIC REALTY TRUST ATLANTIC RLTY TR NA NA
11 065908501 921768 5 62 BANKATLANTIC BANCORP INC BANKATLANTIC BANCORP INC NA NA
12 105368104 790816 5 30 BRANDYWINE REALTY TRUST BRANDYWINE RLTY TR NA NA
13 109043109 14195 5 234 BRIGGS & STRATTON CORP BRIGGS & STRATTON CORP NA NA
14 120113105 1046356 5 144 BUILDING MATERIALS HOLDING CORP BUILDING MATLS HLDG CORP NA NA
15 120113105 1046356 5 144 BUILDING MATERIALS HOLDING Corp BUILDING MATLS HLDG CORP NA NA
16 125919308 889348 5 152 CPI AEROSTRUCTURES INC CPI AEROSTRUCTURES INC NA NA
17 127537207 727273 5 220 CADIZ INC CADIZ INC NA NA
18 13123X102 928022 5 230 CALLON PETROLEUM CO CALLON PETE CO DEL NA NA
19 13123X102 928022 5 230 Callon Petroleum Co CALLON PETE CO DEL NA NA
20 141665109 1046568 5 332 CAREER EDUCATION CORP CAREER ED CORP NA NA
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 5) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% distinct(cusip, cik) %>% count()
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 1 x 1
n
<int>
1 76
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 5) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% distinct(cik) %>% count()
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 1 x 1
n
<int>
1 76
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 5) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% distinct(cusip) %>% count()
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 1 x 1
n
<int>
1 75
The numbers at the end tell us that there is a single cusip in this case which still has two ciks mapped to it. Viewing the entire dataframe, I found it to be 68274T102
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 5) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% filter(cusip == '68274T102')
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 2 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 68274T102 1158694 5 4 ONLINE HOLDINGS INC ONLINE HLDGS INC NA NA
2 68274T102 1388320 5 4 Cactus Ventures, Inc. ONLINE HLDGS INC NA NA
> ciks %>% filter(cik %in% c(1158694, 1388320))
# A tibble: 5 x 2
cik company_name
<int> <chr>
1 1388320 ACTINIUM PHARMACEUTICALS, INC.
2 1388320 CACTUS VENTURES, INC.
3 1158694 EFACTOR GROUP CORP.
4 1158694 ONLINE HOLDINGS INC
5 1158694 STANDARD DRILLING, INC.
Note that the multiplicity for this case for both rows is very low (imposing a multiplicity of 10 would wipe these out). One of them still looks to be the correct one, however, looking at the names.
I think this case suggests that if the multiplicity is low (in which case choosing multiplicity >= 10 helps alleviate), or if the order of magnitude of multiplicities is similar (in which case choosing multiplicity >= 10 may not necessarily help, as seemed to be the case with Rivus Bond Fund), that choosing the maximum multiplicity row or rows may not always single out the correct cik.
Repeating for num_ciks = 2
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 2) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% head(20)
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 20 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 000361105 1750 2 320 AAR CORP AAR CORP NA NA
2 000573105 748103 2 6 ABC DISPENSING TECHNOLOGIES INC ABC DISPENSING TECHNOLOGIES INC NA
3 000752105 913364 2 68 ABC NACO INC ABC RAIL PRODS CORP NA NA
4 000752105 913364 2 68 ABC RAIL PRODUCTS CORP ABC RAIL PRODS CORP NA NA
5 000781104 313368 2 10 ABS INDUSTRIES INC /DE/ ABS INDS INC NA NA
6 00080S101 894081 2 56 ABX AIR INC ABX HLDGS INC NA NA
7 00080S101 894081 2 56 ABX Holdings, Inc. ABX HLDGS INC NA NA
8 000881102 1040792 2 56 A CONSULTING TEAM INC A CONSULTING TEAM INC NA NA
9 000955104 918709 2 81 ACT TELECONFERENCING INC ACT TELECONFERENCING INC NA NA
10 000957100 771497 2 226 ABM INDUSTRIES INC /DE/ ABM INDS INC NA NA
11 000975102 942132 2 32 ACT NETWORKS INC ACT NETWORKS INC NA NA
12 001077106 842289 2 34 AG BAG INTERNATIONAL LTD AG-BAG INTL LTD NA NA
13 001077106 842289 2 34 AB HOLDING GROUP, INC. AG-BAG INTL LTD NA NA
14 001228105 1514281 2 56 AG Mortgage Investment Trust, Inc. AG MTG INVT TR INC NA NA
15 001303106 837038 2 6 AHPC Holdings, Inc. AHPC HLDGS INC NA NA
16 00164V103 1514991 2 150 AMC Networks Inc. AMC NETWORKS INC NA NA
17 001669100 722077 2 112 AMC ENTERTAINMENT INC AMC ENTMT INC NA NA
18 001669100 722077 2 112 AMC ENTERTAINMENT INC ET AL AMC ENTMT INC NA NA
19 001693100 894972 2 18 AMFM INC AMFM INC NA NA
20 001765106 6201 2 244 AMR CORP AMR CORP DEL NA NA
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 2) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% distinct(cusip, cik) %>% count()
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 1 x 1
n
<int>
1 4418
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 2) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% distinct(cik) %>% count()
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 1 x 1
n
<int>
1 4046
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 2) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% distinct(cusip) %>% count()
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 1 x 1
n
<int>
1 4269
> cusip_cik %>% filter(cusip_length == 9) %>%
+ filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
+ distinct(cik, cusip) %>%
+ group_by(cusip) %>% summarise(num_ciks = n()) %>%
+ arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
+ inner_join(issuers) %>% filter(num_ciks == 2) %>% inner_join(cusip_cik) %>% left_join(ciks) %>% distinct(cusip, cik, num_ciks, multiplicity, company_name, issuer_name_1, issuer_name_2, issuer_name_3) %>% arrange(cusip) %>% group_by(cusip) %>% filter(multiplicity == max(multiplicity)) %>% ungroup() %>% group_by(cusip) %>% filter(length(unique(cik)) > 1) %>% ungroup() %>% group_by(multiplicity) %>% summarise(num = n()) %>% arrange(desc(num))
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 10 x 2
multiplicity num
<int> <int>
1 2 198
2 4 55
3 6 20
4 8 7
5 10 7
6 16 7
7 12 5
8 1 2
9 20 2
10 30 2
Sometimes companies change ciks over time as well, as this example demonstrates. That can also cause multiple ciks per cusip
> cusip_cik %>% filter(cusip_length == 9) %>%
filter(substr(cusip, 9, 9) == as.character(check_digit)) %>%
distinct(cik, cusip) %>%
group_by(cusip) %>% summarise(num_ciks = n()) %>%
arrange(desc(num_ciks)) %>% mutate(cusip6 = substr(cusip, 1, 6)) %>%
inner_join(issuers) %>% filter(num_ciks == 2) %>%
inner_join(cusip_cik) %>% left_join(ciks) %>%
distinct(cusip, cik, num_ciks, multiplicity, company_name,
issuer_name_1, issuer_name_2, issuer_name_3) %>%
arrange(cusip) %>% group_by(cusip) %>%
filter(multiplicity == max(multiplicity)) %>% ungroup() %>%
group_by(cusip) %>% filter(length(unique(cik)) > 1) %>%
ungroup() %>% filter(multiplicity == 30)
Joining, by = "cusip6"
Joining, by = c("cusip", "cusip6")
Joining, by = c("cik", "company_name")
# A tibble: 2 x 8
cusip cik num_ciks multiplicity company_name issuer_name_1 issuer_name_2 issuer_name_3
<chr> <int> <int> <int> <chr> <chr> <chr> <chr>
1 31410H101 34945 2 30 FEDERATED DEPARTMENT STORES INC FEDERATED DEPT STORES INC DEL NA NA
2 31410H101 794367 2 30 FEDERATED DEPARTMENT STORES INC /DE/ FEDERATED DEPT STORES INC DEL NA NA
> ciks %>% filter(cik == 34945)
# A tibble: 1 x 2
cik company_name
<int> <chr>
1 34945 FEDERATED DEPARTMENT STORES INC
> ciks %>% filter(cik == 794367)
# A tibble: 2 x 2
cik company_name
<int> <chr>
1 794367 FEDERATED DEPARTMENT STORES INC /DE/
2 794367 MACY'S, INC.
@bdcallen
Two issues:
reprex
). This is OK here and there, but if you can switch to the other form, it would make these code snippets much more useful.Anyway, my synthesis is that there are (at least) two cases here:
I think we may want to handle Case 1 first. I think the simplest approach would be to simply make a table that records valid CUSIP-CIK matches for these CUSIPs. I assume that there aren't too many of these, so even manually looking up CIKs should be quite feasible. I have created a Google Sheets document here for you to enter valid CIKs. (Note that this issue is definitely not one to be addressed in parsing. The Python code is correctly parsing these filings, but they contain bad information.) Google Sheets are easy to maintain and it's easy to incorporate the data from them.
If this approach to Case 1 is feasible, then we might not have to worry about Case 2, as these are valid matches.
Based on my analysis from 14 May, I think neither issue is one of parsing (second one is not even) an issue. So I'm closing this one with "no action" for the code.
@bdcallen This builds on one aspect of #89. My question is: Are any of the matches in
problems
valid? (Note that I ran this code on a local copy of RStudio usingreprex::reprex()
so that I can paste code that can be run. I also ran it on the Z640 machine by first copying from here, then runningreprex::reprex_clean
to clean it up, then pasting in an RStudio Server source window, highlighting, then selectingReprex selection
from the Addins menu in RStudio Server.)Created on 2020-05-13 by the reprex package (v0.3.0)