shokru / mlfactor.github.io

Website dedicated to a book on machine learning for factor investing

Chp03.Rmd line 124: error with the return lag #58

Closed: ronnieqt closed this issue 3 years ago.

ronnieqt commented 3 years ago
data_FM <- left_join(data_ml %>%                                    # Join the 2 datasets
                         dplyr::select(date, stock_id, R1M_Usd) %>% # (with returns...
                         filter(stock_id %in% stock_ids_short),     # ... over some stocks)
                     FF_factors, 
                     by = "date") %>% 
    mutate(R1M_Usd = lag(R1M_Usd)) %>%                              # Lag returns
    na.omit() %>%                                                   # Remove missing points
    spread(key = stock_id, value = R1M_Usd)

In the code block above (Chp03.Rmd), we should group the data by stock_id before lagging the returns.

With the current code, stock 1's return for 2018-12 is shifted into stock 3's return for 2000-01 (this can be observed in the data_FM data frame).
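A minimal sketch of the issue and its fix, assuming dplyr and a hypothetical toy panel (two stocks, three months; the values are made up, not taken from data_ml). It contrasts the ungrouped lag with a lag computed within each stock_id. In the original pipeline, the fix amounts to inserting group_by(stock_id) before the mutate() and ungroup() after it.

```r
library(dplyr)

# Hypothetical toy panel: two stocks (ids 1 and 3), three month-ends each,
# in long format and sorted by stock_id, then date.
toy <- tibble(
  date     = rep(as.Date(c("2000-01-31", "2000-02-29", "2000-03-31")), times = 2),
  stock_id = rep(c(1, 3), each = 3),
  R1M_Usd  = c(0.01, 0.02, 0.03, -0.01, -0.02, -0.03)
)

# Ungrouped lag (as in the quoted code): row 4, stock 3's first date,
# inherits stock 1's last return.
naive <- toy %>%
  mutate(R1M_lag = lag(R1M_Usd))

# Grouped lag: each stock is lagged within its own history, so every
# stock's first date becomes NA (and is later removed by na.omit()).
grouped <- toy %>%
  group_by(stock_id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(R1M_lag = lag(R1M_Usd)) %>%
  ungroup()

naive$R1M_lag     # NA 0.01 0.02 0.03 -0.01 -0.02
grouped$R1M_lag   # NA 0.01 0.02   NA -0.01 -0.02
```

With the grouped version, the first date of each stock is left empty and dropped by na.omit(), instead of being filled with another stock's return.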

ronnieqt commented 3 years ago

The betas look somewhat different.

Regression result with group_by(stock_id): [image: betas_w_groupby]

Regression result without group_by(stock_id): [image: betas_wo_groupby]

ronnieqt commented 3 years ago

A related question: why do we lag the returns?

[screenshot: df]

In the screenshot above, -0.036 is stock 1's return in Jan. 2000. If we lag the returns, then when running the regressions we are essentially using Feb. 2000's factors to predict Jan. 2000's return.

Shouldn't we build a model that can use factors available at t to predict stock returns for the next month (t+1)?

Thanks!

shokru commented 3 years ago

You are correct about the first remark: the data should indeed be grouped before lagging. Luckily, it only affects a small fraction of the returns; this does shift the betas, but only marginally. I will correct this in the next version, which I will release as soon as possible (maybe this weekend).

As for your last point, it is actually an open question. It depends on what you want to do. In the original 1973 paper, the regressions are not predictive (returns and loadings are synchronous), so the purpose is to explain. But you could very well use the forward returns instead, in which case you would be predicting. Fama-MacBeth is used to compute so-called "market prices of risk" (or risk premia) associated with factors. Personally, I don't use it as a forecasting tool...
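For readers unfamiliar with the procedure, here is a base-R sketch of the textbook two-pass Fama-MacBeth estimation. It is not the book's code: the function name fama_macbeth and the simulated data are hypothetical, and the first pass uses full-sample betas for simplicity. Returns and factors are synchronous, as in the lagged setup discussed above.

```r
# Two-pass Fama-MacBeth estimation (sketch, full-sample betas).
# returns: T x N matrix (rows = dates, columns = stocks)
# factors: T x K matrix of factor values, synchronous with returns
fama_macbeth <- function(returns, factors) {
  factors <- as.matrix(factors)
  K <- ncol(factors)
  # Pass 1: one time-series regression per stock; keep slopes, drop intercept.
  # apply() returns a K x N matrix (or a vector when K = 1), reshaped to N x K.
  slopes <- apply(returns, 2, function(r) coef(lm(r ~ factors))[-1])
  betas <- matrix(slopes, ncol = K, byrow = TRUE,
                  dimnames = list(colnames(returns), colnames(factors)))
  # Pass 2: one cross-sectional regression of returns on betas per date.
  gammas <- t(apply(returns, 1, function(r) coef(lm(r ~ betas))))
  # Risk premia: time-series means of the gammas, with Fama-MacBeth t-stats.
  premia <- colMeans(gammas)
  t_stats <- premia / (apply(gammas, 2, sd) / sqrt(nrow(gammas)))
  list(betas = betas, gammas = gammas, premia = premia, t_stats = t_stats)
}

# Usage on simulated (hypothetical) data: 120 months, 30 stocks, 3 factors.
set.seed(42)
n_dates <- 120; n_stocks <- 30; n_factors <- 3
factors <- matrix(rnorm(n_dates * n_factors, mean = 0.005, sd = 0.04),
                  nrow = n_dates, dimnames = list(NULL, c("F1", "F2", "F3")))
true_betas <- matrix(runif(n_stocks * n_factors, 0.5, 1.5), nrow = n_stocks)
noise <- matrix(rnorm(n_dates * n_stocks, sd = 0.02), nrow = n_dates)
returns <- factors %*% t(true_betas) + noise   # T x N simulated returns
fm <- fama_macbeth(returns, factors)
fm$premia    # intercept plus one estimated premium per factor
```

Replacing returns with forward returns (while keeping date-t factors) turns the same machinery into the predictive variant mentioned above.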

Thanks for the correction!

ronnieqt commented 3 years ago

Got it. Thank you very much for the quick reply.

So, just to clarify: we lag the returns because we want to use the factors at t+1 to explain the returns at t.

Do I understand the interpretation correctly?

Thanks!

shokru commented 3 years ago

In this case, since R1M_Usd holds the 1-month future returns, lagging shifts them into the past, so they become synchronous (and no longer predictive). We are implementing the original version of the estimation. But many variations have blossomed since then, and there is no single dominant paradigm (I think).
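To make the timing concrete, here is a sketch with a hypothetical single stock and made-up prices (assumes dplyr). R1M_fwd mimics a 1-month forward return such as R1M_Usd; lagging it by one period yields exactly the return realised over the month ending at each date, which is what makes its pairing with date-t factors synchronous rather than predictive.

```r
library(dplyr)

# Hypothetical single stock: four month-end prices and a made-up factor value per date.
toy <- tibble(
  date   = as.Date(c("2000-01-31", "2000-02-29", "2000-03-31", "2000-04-28")),
  price  = c(100, 110, 99, 104),
  factor = c(0.2, -0.1, 0.4, 0.3)
)

toy <- toy %>%
  mutate(
    R1M_fwd  = lead(price) / price - 1,  # forward return from t to t+1 (like R1M_Usd)
    R1M_past = price / lag(price) - 1,   # return realised from t-1 to t
    R1M_lag  = lag(R1M_fwd)              # lagged forward return
  )

# Lagging the forward return reproduces the past return (first row is NA in both).
all.equal(toy$R1M_lag[-1], toy$R1M_past[-1])   # TRUE

# Synchronous (explanatory) pairing: date-t factor with the return over the month ending at t.
synchronous <- toy %>% select(date, factor, R1M_lag) %>% na.omit()
# Predictive pairing: date-t factor with the return over the following month.
predictive <- toy %>% select(date, factor, R1M_fwd) %>% na.omit()
```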

ronnieqt commented 3 years ago

Got it. Thank you and thanks for the great book.