iangow / streetevents_private

2 stars 1 forks source link

Examine cases where there are two earnings calls close together for given CUSIP #30

Closed iangow closed 6 years ago

iangow commented 7 years ago

Clearly, we have some bad matches here:

Sys.setenv(PGHOST="aaz.chicagobooth.edu", PGDATABASE="postgres")
library(dplyr, warn.conflicts = FALSE)
library(RPostgreSQL)
pg <- dbConnect(PostgreSQL())

calls <- tbl(pg, sql("SELECT * FROM streetevents.calls"))
crsp_link <- tbl(pg, sql("SELECT * FROM streetevents.crsp_link"))
stocknames <- tbl(pg, sql("SELECT * FROM crsp.stocknames"))

call_dates <- 
    calls %>% 
    filter(event_type==1L, cusip !='') %>%
    select(file_name, cusip, start_date) %>%
    mutate(start_date = sql("start_date::date")) %>%
    compute()

suspicious_calls <-
    call_dates %>%
    inner_join(call_dates, by="cusip", suffix=c("_x", "_y")) %>%
    filter(file_name_x != file_name_y,
           between(start_date_x, start_date_y, sql("start_date_y + interval '4 weeks'"))) %>%
    compute()

suspicious_calls %>%
    count()

call_dates_link <- 
    call_dates %>% 
    inner_join(crsp_link)
    filter(event_type==1L, cusip !='') %>%
    select(file_name, cusip, start_date) %>%
    mutate(start_date = sql("start_date::date")) %>%
    compute()

suspicious_calls_alt <-
    call_dates_link %>%
    inner_join(call_dates_link, by="permno", suffix=c("_x", "_y")) %>%
    filter(file_name_x != file_name_y,
           between(start_date_x, start_date_y, sql("start_date_y + interval '4 weeks'"))) %>%
    compute()

suspicious_calls_alt %>% 
    select(matches("file_name.*")) %>% 
    inner_join(
        calls %>% 
            select(file_name, event_title) %>% 
            rename(event_title_x = event_title,
                   file_name_x=file_name)) %>%
    inner_join(
        calls %>% 
            select(file_name, event_title) %>% 
            rename(event_title_y = event_title,
                   file_name_y = file_name))
iangow commented 7 years ago

Here is some output demonstrating that there are issues here:

> suspicious_calls_alt %>% 
+     select(matches("file_name.*")) %>% 
+     inner_join(
+         calls %>% 
+             select(file_name, event_title) %>% 
+             rename(event_title_x = event_title,
+                    file_name_x=file_name)) %>%
+     inner_join(
+         calls %>% 
+             select(file_name, event_title) %>% 
+             rename(event_title_y = event_title,
+                    file_name_y = file_name))
Joining, by = "file_name_x"
Joining, by = "file_name_y"
# Source:   lazy query [?? x 4]
# Database: postgres 9.4.12 [igow@aaz.chicagobooth.edu:5432/postgres]
   file_name_x file_name_y                                                 event_title_x                                                       event_title_y
         <chr>       <chr>                                                         <chr>                                                               <chr>
 1   1000393_T    997230_T          Q4 2004 Komag, Incorporated Earnings Conference Call              Q2 2005 Western Digital Corp. Earnings Conference Call
 2   1000707_T    986570_T Q4 2004 ImClone Systems Incorporated Earnings Conference Call                          Q4 2004 Eli Lilly Earnings Conference Call
 3   1001059_T   1011391_T                       Q4 2004 Ventas Earnings Conference Call Q4 2004 Nationwide Health Properties, Inc. Earnings Conference Call
 4   1002358_T    998267_T        Q4 2004 Caesars Entertainment Earnings Conference Call             Q4 2004 Harrah's Entertainment Earnings Conference Call
 5   1003474_T    989056_T           Q3 2005 INTERMIX MEDIA INC Earnings Conference Call           Q4 2004 Dow Jones & Company Inc. Earnings Conference Call
 6   1003474_T   1005214_T           Q3 2005 INTERMIX MEDIA INC Earnings Conference Call            Q2 2005 Fox Entertainment Group Earnings Conference Call
 7   1003474_T   1005213_T           Q3 2005 INTERMIX MEDIA INC Earnings Conference Call                   Q2 2005 News Corporation Earnings Conference Call
 8   1003477_T   1008415_T                    Q4 2004 Raindance Earnings Conference Call                            Q4 2004 Intrado Earnings Conference Call
 9   1003477_T   1000173_T                    Q4 2004 Raindance Earnings Conference Call                   Q4 2004 West Corporation Earnings Conference Call
10   1005213_T    989056_T             Q2 2005 News Corporation Earnings Conference Call           Q4 2004 Dow Jones & Company Inc. Earnings Conference Call
# ... with more rows
iangow commented 6 years ago

Not sure if this has been subsumed by other issues. But perhaps worth looking at more closely.

jingyujzhang commented 6 years ago

This problem has been addressed. Right now, the output only captures those files belong to the same company, same conference call, like the QA session and presentation session. This is what I'm working on now -- firms with more than 5 calls in one fiscal year.

> suspicious_calls_alt %>% 
+     select(matches("file_name.*")) %>% 
+     inner_join(
+         calls %>% 
+             select(file_name, event_title) %>% 
+             rename(event_title_x = event_title,
+                    file_name_x=file_name)) %>%
+     inner_join(
+         calls %>% 
+             select(file_name, event_title) %>% 
+             rename(event_title_y = event_title,
+                    file_name_y = file_name)) %>% 
+     distinct() %>% 
+     count()
Joining, by = "file_name_x"
Joining, by = "file_name_y"
# Source:   lazy query [?? x 1]
# Database: postgres 9.4.12 [jingyuzhang@aaz.chicagobooth.edu:5432/postgres]
      n
  <dbl>
1   574

Examples of these cases:
# Source:   lazy query [?? x 4]
# Database: postgres 9.4.12 [jingyuzhang@aaz.chicagobooth.edu:5432/postgres]
   file_name_x file_name_y event_title_x                                                  event_title_y                                                       
   <chr>       <chr>       <chr>                                                          <chr>                                                               
 1 1696448_T   1676913_T   Q4 2007 Toll Brothers Earnings Conference Call                 Toll Brothers Fourth Quarter 2007 Outlook Conference Call           
 2 5975718_T   5974315_T   Q1 2016 Fortinet Inc Earnings Call Q&A Session                 Q1 2016 Fortinet Inc Earnings Call                                  
 3 5325592_T   5279415_T   Q4 2013 SolarCity Earnings Conference Call                     Q4 2013 SolarCity Earnings Conference Call                          
 4 3037464_T   3037418_T   Ford Motor Company First Quarter Fixed Income Conference Call  Q1 2010 Ford Motor Company Earnings Conference Call                 
 5 1907838_T   1898325_T   Preferred Bank Rescheduled Earnings Conference Call            Q2 2008 Preferred Bank Earnings Conference Call                     
 6 802534_T    801731_T    Q3 2003 Willbros Group, Inc. Earnings Conference Call          Preliminary Q3 2003 Willbros Group, Inc. Earnings Conference Call   
 7 4216924_T   4216923_T   Q4 2011 Franklin Resources Inc Earnings Q&A Conference Call    Q4 2011 Franklin Resources Inc Pre Recorded Earnings Conference Call
 8 3228344_T   3228332_T   Ford Motor Company Second Quarter Fixed Income Conference Call Q2 2010 Ford Motor Company Earnings Conference Call                 
 9 2493910_T   2493909_T   Q4 2009 Franklin Resources Earnings Q&A Conference Call        Q4 2009 Franklin Resources Pre Recorded Earnings Conference Call    
10 670839_T    665957_T    Q2 2003 Tidewater Earnings Conference Call Q&A Portion Only    Q2 2003 Tidewater Earnings Conference Call with Q&A        
iangow commented 6 years ago

Look at #40. This should handle any issues with more than 5 calls in a year. Many cases are due to bad PERMNO matches.