Add data parameter to shorten formulas

mpiktas / midasr

R package for mixed frequency time series data analysis.

http://mpiktas.github.io/midasr/

Add data parameter to shorten formulas #38

Closed MaximilianJHuber closed 10 years ago

MaximilianJHuber commented 10 years ago

In lm you can enter y ~ x + b with all variables listed in a data.frame, which you pass to lm via the data parameter. I would like to prepare a list of "mls" matrices and do the same as in lm.

E.g., instead of

 eq <- midas_r( midasdata$rGDPg2 ~ midasdata$MonMulti, start=list(MonMulti=c(1,-0.1)) )

where midasdata contains mls matrices produced with

 midasdata[[i]] <- fmls(MonMulti,ratio-1, ratio, nealmon)

I would like to write

 eq <- midas_r( rGDPg2 ~ MonMulti, start=list(MonMulti=c(1,-0.1)), data=midasdata )

vzemlys commented 10 years ago

The current behaviour is exactly like lm. In the data argument you pass the data, where named columns correspond to variables, and the construction of the model is governed by the formula interface. So for your example the specification would be

 midas_r(rGDPg2~fmls(MonMulti,ratio-1, ratio, nealmon),data=midasdata,start=list(MonMulti=c(1,-0.1)))

where

  midasdata <- list(rGDPg2=y,MonMulti=x)

If rGDPg2 and MonMulti are in the global environment, then there is no need to specify the data argument, which is consistent with the behaviour of lm.
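A minimal sketch of this specification on simulated data may make it concrete. The series, the frequency ratio of 4, the 8 lags and the starting values are all made up for illustration; only the variable names come from the example above.

```r
library(midasr)

set.seed(1001)
n <- 250                               # low-frequency observations (assumed)
x <- rnorm(4 * n)                      # simulated high-frequency regressor, ratio 4
w <- nealmon(p = c(1, -0.5), d = 8)    # lag weights used only to simulate y
y <- as.numeric(2 + mls(x, 0:7, 4) %*% w + rnorm(n))

# The data list holds the raw series, not their lag matrices
midasdata <- list(rGDPg2 = y, MonMulti = x)

# The lag structure and the nealmon restriction are stated in the formula
eq <- midas_r(rGDPg2 ~ fmls(MonMulti, 7, 4, nealmon),
              data = midasdata,
              start = list(MonMulti = c(1, -0.5)))
coef(eq)
```

The first value of y is NA because the earliest lags are not available; the formula interface drops such rows when it builds the model.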

I do not see the benefits of your proposal, and I immediately see several problems with implementing it. For example, how would I know that the nealmon restriction should be used? Maybe I misunderstand something; could you elaborate on your proposal, or specifically on what problems you see with the current implementation?

MaximilianJHuber commented 10 years ago

My point is that the formula gets excessively long when there are more variables, so I think it would improve readability and flexibility (i.e. when a formula is constructed on the fly) to prepare a list of mls matrices first. Then the formula's only job is to specify my model, not to prepare data.
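For the on-the-fly case, one possible sketch in base R is to assemble the long formula from a vector of variable names. The second regressor name, MonOther, is hypothetical, and ratio and nealmon are only referenced by name here, not evaluated.

```r
# Hypothetical regressor names; only MonMulti appears in this thread
vars <- c("MonMulti", "MonOther")

# One fmls() term per regressor, with the restriction still named explicitly
term_labels <- sprintf("fmls(%s, ratio - 1, ratio, nealmon)", vars)

# reformulate() joins the terms with "+" and attaches the response
f <- reformulate(term_labels, response = "rGDPg2")
print(f)
```

The resulting object is an ordinary formula, so it can be passed wherever a hand-written formula would be.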

vzemlys commented 10 years ago

There is still the question of how the restriction would be defined in such a case. All of the potential restrictions use the same mls matrix. It would be possible to implement this using attributes, but I do not see how this would improve clarity. Yes, the formula would be shorter, but

- a user would not see which restriction is used in the model definition;
- a user who wants to change the restriction would need to update the matrix instead of the formula;
- a user would be required to track the alignment of the matrices, i.e. that their dimensions match. This includes taking care of NA values, etc. Currently this is handled automatically by the formula interface; you only need to make sure that your mixed frequency data is aligned, which is much more natural to do for the original data than for its transformations;
- the formula interface allows transformations of the variables to be specified, for example taking the logarithm of a variable. To make this work with your proposed format, I would need to reimplement the way R formulas work.
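The last two points can be illustrated with plain lm, which relies on the same formula machinery. This is only a sketch on simulated data with made-up names: the transformation is written in the formula, and a row with a missing value is dropped without any manual realignment.

```r
set.seed(1)
df <- data.frame(x = exp(rnorm(20)))
df$y <- exp(1 + 0.5 * log(df$x) + rnorm(20, sd = 0.1))
df$x[3] <- NA    # a missing value in the raw data

# log() is applied inside the formula; the incomplete row is omitted automatically
fit <- lm(log(y) ~ log(x), data = df)
nobs(fit)        # 19 of the 20 rows are used
```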

MaximilianJHuber commented 10 years ago

OK, your last point is the most striking one. The mls is just a transformation of the data, as log etc. are, and so it belongs in the formula. Thank you for your time, you are doing a great job with midasr!

vzemlys commented 10 years ago

You are welcome.

vzemlys commented 10 years ago

Now you can also use the function midas_r_simple. This function does not use the formula interface and expects matrices prepared with mls. It is currently available in the development version of midasr, which you can install using install_("midasr","mpiktas","np"). See the example in ?midas_r_simple.
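As a rough sketch of what such a call could look like, the following uses the USunempr and USrealgdp data sets shipped with midasr. The argument names (weight, startx) are assumed from the documentation of the released package and may differ in the development branch mentioned above, so ?midas_r_simple remains the authoritative reference.

```r
library(midasr)

data("USunempr")     # monthly US unemployment rate
data("USrealgdp")    # annual US real GDP

y <- diff(log(USrealgdp))                   # annual log-growth of real GDP
x <- window(diff(USunempr), start = 1949)   # monthly changes, aligned with y
trend <- 1:length(y)                        # additional low-frequency regressor

# The lag matrix is prepared up front: lags 0 to 11, 12 months per year
X <- fmls(x, 11, 12)

# Assumed call form: restriction passed as a function, starting values in startx
fit <- midas_r_simple(y, X, trend, weight = nealmon, startx = c(0, 0, 0))
```

Because no formula is involved, the rows of X have to line up with y before the call; here both cover the same years.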