Question of choosing lag numbers in mls function

shelmonlu commented 5 years ago

Hello, I am trying to replicate the Ghysels, Santa-Clara, and Valkanov (2005) results using your midasr package, and I am wondering if you could give me some suggestions or advice.

I noticed that the authors use 252 days as the maximum lag length in the paper. However, I cannot use this number in the mls function because the resulting mls matrix consists entirely of NAs. I use the following code instead, but I get weird results:

eq <- midas_r(month ~ mls(day, 0:21, 22, nealmon), start = list(day = c(1, -0.006, -0.0002)))

Could you please tell me if I was using the wrong code to replicate the results? Thank you so much for your time!

vzemlys commented 5 years ago

Yes, the code is correct, i.e. it should run. Note that using 252 days is not a problem in midasr; the question is whether the data supports it. Are you replicating with the data used in the article?

shelmonlu commented 5 years ago

Thank you for your reply! As indicated in the article, I collected the data from both CRSP and Schwert's website (I can roughly replicate Table 1). After reprocessing the data, I have 876 monthly observations and 19272 daily observations (I keep 22 daily observations per month; if a month has fewer than 22 observations, I pad the beginning of that month with NAs). Then I get the following result:

eqr <- midas_r(month ~ mls(day, 0:251, 22, nealmon), start = list(day = c(1, -0.006, -0.0002)))

MIDAS regression model
 model: month ~ mls(day, 0:251, 22, nealmon)
(Intercept)        day1        day2        day3
 -0.0004250   0.9999963  -0.0059980  -0.0007948

My understanding is that day1 is the coefficient on the mls term (the gamma), and that day2 and day3 are k1 and k2 in the weight function. Please correct me if I have misread the results. My day1 coefficient is very close to 1, which is the starting value I provided. However, I should get something close to 2.6. Is this problem due to my data or to the initial values I provided?
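
For readers following along, the reading of day1, day2 and day3 above can be checked directly against the weight function. The sketch below is only an illustration using the starting values from this thread; it assumes midasr is installed and that nealmon takes the parameter vector p and the number of lags d. It shows that the first parameter scales the whole weight vector, while the remaining two only shape it.

```r
library(midasr)

# Starting values from the thread: gamma = 1, k1 = -0.006, k2 = -0.0002
p <- c(1, -0.006, -0.0002)
w <- nealmon(p = p, d = 252)    # one weight per daily lag

length(w)                       # number of daily lags covered
sum(w)                          # equals p[1]: the exponential part is normalised

# Changing only the first parameter rescales the weights without reshaping them
w_big <- nealmon(p = c(2.6, p[-1]), d = 252)
all.equal(w_big / 2.6, w)
```

Because the shape parameters leave the sum of the weights unchanged, day1 can be read as the overall slope on the weighted daily series.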

And when I check the mls(day, 0:251, 22, nealmon) matrix, the first 11 rows are NAs. Is this acceptable?

vzemlys commented 5 years ago

Can you post a screenshot of the model in the article? Your understanding is correct, so the reason for the result you do not like is the alignment of the data. Look at eqr$model to check that months and days are aligned properly. NAs in the first rows are perfectly normal, since you always lose data at the beginning of the sample due to lags.
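
The count of 11 leading NA rows follows from simple arithmetic: 252 daily lags span more than 11 months of 22 days, so the first complete row is month 12. The base R sketch below illustrates this. The helper rows_lost and the hand-rolled alignment are hypothetical and are not part of midasr; they only mimic the idea of stacking lagged daily values against each month.

```r
# Hypothetical helper (not part of midasr): number of low-frequency rows lost
# when the largest high-frequency lag is max_lag and each period has m observations.
rows_lost <- function(max_lag, m) ceiling((max_lag + 1) / m) - 1

rows_lost(251, 22)   # lags 0:251 with 22 days per month
rows_lost(21, 22)    # lags 0:21 fit inside a single month

# Hand-rolled alignment on a toy series: 6 "months" of 4 "days", lags 0:5
m <- 4
day <- 1:24
lags <- 0:5
n_month <- length(day) / m
X <- t(sapply(seq_len(n_month), function(t) {
  idx <- t * m - lags   # positions of the lagged daily values for month t
  if (min(idx) < 1) rep(NA_real_, length(lags)) else day[idx]
}))
X

# The number of incomplete rows matches the helper
sum(!complete.cases(X)) == rows_lost(max(lags), m)
```

In each complete row the first column holds the last day of that month, which is the kind of alignment worth confirming in eqr$model.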

If you are sure that the data is aligned properly and you are using the same sample, then you can always check the value of the NLS objective function at the parameters reported in the article. The value of the objective function at the fitted coefficients can be calculated with the following code:

eqr$fn0(coef(eqr))

Supply the parameters from the article (the order is important!) and then compare the values. If yours is smaller, your values are correct and the optimisation method used in the article did not converge properly. If yours is larger, you need to try different optimisation methods. The JSS article contains examples of how to do that. Note that this step only makes sense once you have ensured that the nealmon specification and the data are the same as those used in the article.
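
A minimal sketch of that comparison is given below. It uses simulated data as a stand-in for the CRSP series, and the vector true_par is a hypothetical placeholder for the parameters reported in the article (the actual values are not given in this thread). It assumes midasr is installed and that update accepts Ofunction and method arguments in the way the JSS article describes.

```r
library(midasr)

set.seed(1)
n_month <- 200
m <- 22

# Simulated stand-ins for the thread's `day` and `month` series (not the real data)
day <- rnorm(n_month * m)^2
true_par <- c(0.01, 2, -0.05, -0.002)   # (Intercept), gamma, k1, k2: hypothetical values
X <- mls(day, 0:21, m)
month <- as.vector(true_par[1] + X %*% nealmon(true_par[-1], d = 22) +
                   rnorm(n_month, sd = 0.1))

eqr <- midas_r(month ~ mls(day, 0:21, 22, nealmon),
               start = list(day = c(1, -0.006, -0.0002)))

ssr_fit <- eqr$fn0(coef(eqr))   # objective at the fitted coefficients
ssr_ref <- eqr$fn0(true_par)    # objective at the reference values, same order as coef(eqr)
c(fitted = ssr_fit, reference = ssr_ref)

# If the reference values give the smaller objective, retry with another optimiser
eqr_nm <- update(eqr, Ofunction = "optim", method = "Nelder-Mead")
eqr_nm$fn0(coef(eqr_nm))
```

The reference vector has to follow the order of coef(eqr), that is intercept first, then the three nealmon parameters.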

shelmonlu commented 5 years ago

In the paper, they use:

They multiply the weighted average of r^2 by 22 to aggregate the volatility to the monthly level. My question is about the 22 in the model. Should I also multiply my daily observations by 22 before I run the regression, or should I incorporate the 22 into one of the initial parameters provided?

vzemlys commented 5 years ago

Multiply r by 22; this is purely a normalizing factor. Do not model it as a parameter.
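
To see why the 22 is only a normalization, the base R sketch below compares a regression on the weighted average of squared daily returns with one on 22 times that average. Everything here is simulated and hypothetical (the weights, the data, and the variable names), and the weights are held fixed so that ordinary lm can be used. It is an illustration of the scaling argument, not a replication of the paper.

```r
set.seed(42)
n_month <- 120
n_lag <- 22

# Hypothetical normalised weights over the daily lags (they sum to one)
d <- seq_len(n_lag)
w <- exp(-0.05 * d)
w <- w / sum(w)

# Simulated squared daily returns: one row per month, one column per lag
r2 <- matrix(rnorm(n_month * n_lag, sd = 0.01)^2, nrow = n_month)
v_avg <- as.vector(r2 %*% w)    # weighted average of daily r^2
v_month <- 22 * v_avg           # the same quantity on a monthly scale

y <- 0.005 + 2.6 * v_month + rnorm(n_month, sd = 0.001)

fit_scaled <- lm(y ~ v_month)   # regressor multiplied by 22 beforehand
fit_raw <- lm(y ~ v_avg)        # regressor left on the daily scale

slope_ratio <- unname(coef(fit_raw)[2] / coef(fit_scaled)[2])
slope_ratio                                      # the slopes differ by the factor 22
all.equal(fitted(fit_raw), fitted(fit_scaled))   # the fit itself is unchanged
```

Scaling the data beforehand keeps the slope comparable with the value reported in the paper, whereas leaving the data unscaled would inflate the estimated slope by the same factor.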

shelmonlu commented 5 years ago

Thank you, Dr. Zemlys! I really appreciate your help.

mpiktas / midasr

Question of choosing lag numbers in mls function #69