Resolve loss of data with daily data

iangow commented 1 year ago

We were not sure whether the counter-argument on dividends and splits is due to the small number of observations we have. We were unsure whether we did something wrong that made the sample so small: the tutorial had 239 FALSE observations and 693 TRUE observations, while we have only 17 and 10, respectively. Would it be because we used announcement dates rather than effective dates?

We just wanted to update you before you start reviewing our code. We tried a few different things, and it turns out that when we used monthly data on announcement dates with more recent data, we were able to get a lot more data points for the dividend followed by splits analysis.

You should have all the splits in either place (monthly or daily data). I think it would be better to use daily data on returns. Perhaps you need to compare just the splits you get from daily data with those you get from monthly data. Once you have worked out the best way to construct a data set of splits then work out how to merge with daily returns.

You may want to write code focused on just this issue. Once you have sorted it out, you could use it in your main analysis.
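A minimal sketch of the comparison suggested above, assuming the splits from each source have been collected into two tables keyed by firm and date. The table names (`splits_monthly`, `splits_daily`), the column names, and the toy values are hypothetical and not taken from the repository; `anti_join()` is used here simply to list the splits that appear in one set but not the other.

```r
library(dplyr)

# Hypothetical toy tables standing in for the splits identified from
# monthly data and from daily data (names and values are assumptions).
splits_monthly <- tibble(
  permno = c(10001L, 10002L, 10003L),
  ex_date = as.Date(c("1950-03-15", "1951-06-01", "1952-09-10"))
)
splits_daily <- tibble(
  permno = c(10001L, 10002L),
  ex_date = as.Date(c("1950-03-15", "1951-06-01"))
)

# Splits found in the monthly-based set but not in the daily-based set
only_monthly <- anti_join(splits_monthly, splits_daily,
                          by = c("permno", "ex_date"))

# Splits found in the daily-based set but not in the monthly-based set
only_daily <- anti_join(splits_daily, splits_monthly,
                        by = c("permno", "ex_date"))
```

If both `only_monthly` and `only_daily` come back empty, the two sources agree on the population of splits and any loss must arise in a later step.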

iangow commented 1 year ago

@vickyyin1493 @shizhe75 See comment above.

vickyyin1493 commented 1 year ago

You should have all the splits in either place (monthly or daily data). I think it would be better to use daily data on returns. Perhaps you need to compare just the splits you get from daily data with those you get from monthly data. Once you have worked out the best way to construct a data set of splits then work out how to merge with daily returns.

We tried to left_join day_indexes, which resulted in no loss of splits data (attached screenshot). However, day_index is missing, which in turn leaves ex_day_index missing.

We tried to use full_join but it still gives NA day_index.

We suspect that day_index is missing because we used msf and msi, so we proceeded to use dsf and dsi in our analysis. This worked for the main replication, but only if we removed the filter for n_obs.

However, there are still missing splits when we use inner_join.

When we use left_join, there are no missing splits, but we are not able to calculate abnormal returns.
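The trade-off described above can be reproduced on toy data. The sketch below assumes, purely for illustration, that a split is unmatched because its announcement date is not a trading day; the thread does not confirm that this is the cause. The table and column names (`splits`, `trading_days`, `anc_date`) are hypothetical.

```r
library(dplyr)

# Hypothetical split events keyed by announcement date (toy data)
splits <- tibble(
  permno = c(1L, 2L, 3L),
  anc_date = as.Date(c("2020-01-02", "2020-01-04", "2020-01-06"))
)

# Hypothetical trading calendar with a sequential day_index;
# 2020-01-04 (a Saturday) is deliberately absent
trading_days <- tibble(
  date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06")),
  day_index = 1:3
)

# left_join() keeps every split; the unmatched one gets an NA day_index
joined_left <- left_join(splits, trading_days,
                         by = c("anc_date" = "date"))

# inner_join() drops the unmatched split without any message
joined_inner <- inner_join(splits, trading_days,
                           by = c("anc_date" = "date"))

# The rows worth investigating are those with no matching trading day
unmatched <- filter(joined_left, is.na(day_index))
```

Inspecting `unmatched` shows exactly which splits inner_join would lose, which is usually a quicker route to the cause than comparing row counts.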

Additionally, in the dividends section, there are not many matched observations when we use daily data. We suspect that this is because it is pretty rare for the dividends to be announced on the same day as the splits. We wonder if it's fine to keep using monthly data for our dividends analysis, which is similar to the original replication in the textbook. @iangow

iangow commented 1 year ago

@vickyyin1493 @shizhe75 You seem to be mixing up all kinds of issues here.

a. You shouldn't need to use screenshots. Screenshots mean I can't copy your code.
b. as.Date(date_trunc('day', date)) seems unnecessary; date should work.
c. What are you trying to do with date_index?
d. "it is pretty rare for the dividends to be announced on the same day as the splits." Why would this be a requirement?
e. "We wonder if it's fine to keep using monthly for our dividends analysis which is similar to the original replication in the textbook." You really want to use daily data here. I think it would be better to figure out the issue in your data steps. FFJR starts with the population of splits then merges with monthly data and then does event returns around effective dates. You want to start with the population of splits then merge with daily data and then do event returns around announcement dates. The "population of splits" part should not be affected by the subsequent steps.
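The three steps in (e) can be sketched as follows. This is an illustration under stated assumptions, not the code from the repository or the textbook: all table names, column names, and values are hypothetical, and the window is shrunk to one trading day either side of the announcement so that the toy data stay readable.

```r
library(dplyr)

# Step 0: hypothetical inputs (toy data; all names are assumptions)
splits <- tibble(
  permno = c(1L, 2L),
  anc_date = as.Date(c("2020-01-03", "2020-01-07"))
)
trading_days <- tibble(
  date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06",
                   "2020-01-07", "2020-01-08")),
  day_index = 1:5
)
daily_rets <- tibble(
  permno = rep(c(1L, 2L), each = 5),
  date = rep(trading_days$date, times = 2),
  ret = c(0.01, 0.02, -0.01, 0.00, 0.01,
          0.00, 0.01, 0.03, -0.02, 0.01)
)

# Step 1: population of splits, each tagged with its event-day index.
# The row count is fixed here and should not depend on later steps.
split_events <- splits %>%
  inner_join(trading_days, by = c("anc_date" = "date")) %>%
  rename(anc_day_index = day_index)

# Step 2: merge with daily returns and index each return date
# Step 3: keep returns within the window around the announcement date
half_window <- 1L  # hypothetical; a real analysis would use a wider window
event_rets <- split_events %>%
  inner_join(daily_rets, by = "permno") %>%
  inner_join(trading_days, by = "date") %>%
  mutate(day_rel = day_index - anc_day_index) %>%
  filter(abs(day_rel) <= half_window)
```

Checking `nrow(split_events)` against `nrow(splits)` right after step 1 confirms that the population of splits is settled before any returns are merged in.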

iangow commented 1 year ago

@vickyyin1493 @shizhe75 Have you had a chance to look at this? Would it make sense to chat some time this week?

vickyyin1493 commented 1 year ago

a. We will attach the code in the main repository, named "left_join_map()_not_work.qmd" and "inner_join compromised splits.qmd" respectively. https://github.com/vickyyin1493/Replication-FFJR/blob/main/inner_join%20compromised%20splits.qmd https://github.com/vickyyin1493/Replication-FFJR/blob/main/left_join_map()_not_work.qmd
b. Thank you, we changed that.
c. We were trying to do something similar to month_index but using daily data, so we named it day_index. Our understanding is that we are trying to match returns with splits in a certain window using day_rel_ex (365 days before + the event day + 365 days after = 731 days).
d. Thank you. It shouldn't be. Our current interpretation is that we are trying to distinguish the 365 days before the splits from the 365 days after and calculate the difference (div ratio) in dividend payout.
e. We ran into a few problems when we adjusted our code:

We are able to get all the splits by using left_join (for nyse_splits) with the dsi and dsf data. However, we had to remove the filter for n_obs for the split_sample code to run. Additionally, when we used left_join, the map() function used in calculating abnormal_returns did not work. We then tested our other code using inner_join first.
When we used inner_join (for nyse_splits), we lost 12 splits, and some subsequent code does not work.

f. It'll be great if we could have a chat this week, anytime (with exceptions listed below) between AEST 8 am-11 pm would work for us. Exceptions: Thursday: 2 PM - 5 PM Friday: 2 - 2:30 PM and 4-7 PM; Ideally not 11 AM - 12:30 PM, but if no other time suits you, we could do this time slot as well.

@iangow

iangow commented 1 year ago

What is the filter() with n_obs needed for? I think filtering for a fixed number of days is always going to be more problematic than doing the same with months. If you are trying to have a (somewhat) balanced panel, then it might make sense to allow for a range of values for n_obs.
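A minimal sketch of the range-based alternative, assuming one row per split and relative day. The identifiers `split_id`, `full_window`, and `min_obs`, and the tolerance chosen, are hypothetical and only illustrate the idea of accepting a range of values for n_obs rather than one exact count.

```r
library(dplyr)

# Hypothetical event-window data: split 1 has a complete 5-day window,
# split 2 is missing one day, and split 3 has only two days
event_rets <- tibble(
  split_id = c(rep(1L, 5), rep(2L, 4), rep(3L, 2)),
  day_rel = c(-2:2, -2:1, 0:1)
)

full_window <- 5L  # number of days in a complete window (assumed)
min_obs <- 4L      # hypothetical tolerance for missing days

# Keep splits whose observation count falls within the accepted range,
# rather than requiring n_obs to equal full_window exactly
split_sample <- event_rets %>%
  group_by(split_id) %>%
  mutate(n_obs = n()) %>%
  ungroup() %>%
  filter(between(n_obs, min_obs, full_window))
```

With an exact filter (`n_obs == full_window`) only split 1 would survive; the range keeps split 2 as well while still excluding the sparsely covered split 3.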

Anything you can do to break your code into smaller components will make it easier to locate the source of problems.

"Additionally, if we used left_join, map() function used in calculating abnormal returns wouldn't work." Does it work with inner_join()? Do you get an error message?

Can you do Friday before 11am?

vickyyin1493 commented 1 year ago

It works with inner_join, but we lose some splits (nyse_splits has 2293 splits compared with 2305 in nyse_splits_raw). We do not get error messages in the replication of figure 2b, but we are not able to replicate table 3 and the data on dividends.
EDIT: We can now replicate the figures for dividend increases and decreases, but we are not able to count (with a reasonable number of data points) whether splits have been followed by dividend increases.
EDIT: We now have data points after removing the filter for day_gap, but we are not sure whether that's appropriate. Updated link for qmd file.

We put all codes we think are not causing problems into a chunk and other ones in separate chunks here: https://github.com/vickyyin1493/Replication-FFJR/blob/main/Identify_problems_updated.qmd

Friday before 11 would work for us. Would 10 work for you? We can send an invite with a zoom link over if that works for you. @iangow @shizhe75

shizhe75 commented 1 year ago

Can confirm that, with the code at that link, we can now obtain plots for all parts of the replication. We can obtain figures for the daily dividend replication section and have more data points for dividends, but definitely not as many as in the original dividend replication using monthly data and effective dates. We were wondering whether this still has to do with the filtering mentioned above, or whether the issue still lies in using the wrong join (inner_join), which causes us to lose a few splits.

@iangow @vickyyin1493

iangow commented 1 year ago

The above issue may be solved with the other issues. Check those first then confirm that there's nothing left here to address.

vickyyin1493 commented 1 year ago

Thank you @iangow, the above issue is resolved now.

vickyyin1493 / Replication-FFJR

Resolve loss of data with daily data #5