r-quantities / quantities

Quantity Calculus for R Vectors
https://r-quantities.github.io/quantities
Other
26 stars 3 forks source link

Wrapping lm() function #10

Open jamarav opened 4 years ago

jamarav commented 4 years ago

Hi everybody,

I open this issue in order to review any problems that I observed with lm function and the package quantiles. I have reviewed the documentation. Specifically at the following link (https://www.r-spatial.org/r/2018/08/31/quantities-final.html#fitting-linear-models-with-quantities), it is very well specified how to include specific methods to use lm with the package quantities. First of all, note that this is the first time I have tried to include experimental error in linear regression models. The first question that arises is whether to consider a regression weighing with the uncertainty of the response variable. It is a correct approach? I have tried to include the weight parameter to the qlm function and it is not able to get the model. The error obtained is the following:

Model_Wc<-qlm(formula = Wcomp ~  Pevap + Pcond + Pevap:Pcond, weights = rep(1, 91),  data = df)
Error in eval(extras, data, env) : 
  ..1 usado en un contexto incorreto, ningún ... para examinar

I have reviewed the function definition and this supports additional parameters.

The other problem that I have noticed is the presence of a warning in the coef.qlm function. When the formula to be applied in the function is defined with interaction variables of type x1: x2 (the special nomenclature of th lm function), the following warning is obtained:

> df<-l_scroll$AHRI_66$HPR2A$SH_11
> Model_Wc<-qlm(formula = Wcomp ~  Pevap + Pcond + Pevap:Pcond,  data = df)
> coef(Model_Wc)
$`(Intercept)`
0.5(2) [kW]

$Pevap
0.06(2) [kW/bar]

$Pcond
0.34(1) [kW/bar]

$`Pevap:Pcond`
9(10)e-4 [kW]

Warning message:
In mapply(set_quantities, NextMethod(), coef.units, sqrt(diag(vcov(object))),  :
  el argumento más largo no es múltiplo de la longitud del más corto

On the other hand, If I applied the following definition for the formula parameter the warning disappears:

> Model_Wc<-qlm(formula = Wcomp ~  Pevap + Pcond + I(Pevap*Pcond),  data = df)
> coef(Model_Wc)
$`(Intercept)`
0.5(2) [kW]

$Pevap
0.06(2) [kW/bar]

$Pcond
0.34(1) [kW/bar]

$`I(Pevap * Pcond)`
9(10)e-4 [kW/bar^2]

The results seem corrects but a warning is present in the first method.

An other question is about the summary obtained for this qlm objects. If I print the summary the Std. Error of the coefficients is not the same as te previous one (coef(Model_Wc)).

Other problem is how to define the uncertainty for the variables involved in the regression model? Is necessary to define as standard uncertainty or is it possible to define as an expanded uncertainty and to obtaint the correct error for the coefficients when I apply coef function?

Finally, the last question is about the predict function for qlm objects. In the linked reported, it defines specific methods for qlm objects too. I have reviewed the function and it only applies the set_quantities function and it activates the native parameter se.fit. Is the obtained error correct? The se.fit parameter only activates the native confidenze intervals for the lm function. Are not necessary to take into account the experimental uncertainties for the calculation of the response variable error?

This problem is discussed in https://stats.stackexchange.com/questions/235693/linear-model-where-the-data-has-uncertainty-using-r but the final solution involves using the root python packe.

Sorry for the many doubts and thanks in advance

Enchufa2 commented 4 years ago

About the errors you see, these functions were a quick proof of concept tailored to... what you see in that post, basically, and weren't tested with other arguments or other ways to specify the formula. There is a lot to experiment to make those wrappers reliable enough, and that's why they are not part of the package already. Also, wrapping lm is not easy, because there's a lot of playing with the parent frame inside, and that's probably why you get an error with the weights. We would very much like to enhance the package with these tools, but currently lack the time to do it, so any contribution would be appreciated. :)

Other problem is how to define the uncertainty for the variables involved in the regression model? Is necessary to define as standard uncertainty or is it possible to define as an expanded uncertainty and to obtaint the correct error for the coefficients when I apply coef function?

Not sure if I understand correctly. The errors package currently supports standard uncertainties and Taylor-based propagation, so there's no way of defining an "expanded uncertainty", whatever it means.

Finally, the last question is about the predict function for qlm objects. In the linked reported, it defines specific methods for qlm objects too. I have reviewed the function and it only applies the set_quantities function and it activates the native parameter se.fit. Is the obtained error correct? The se.fit parameter only activates the native confidenze intervals for the lm function. Are not necessary to take into account the experimental uncertainties for the calculation of the response variable error?

This problem is discussed in https://stats.stackexchange.com/questions/235693/linear-model-where-the-data-has-uncertainty-using-r but the final solution involves using the root python packe.

As discussed in that link, the loss function for a linear regression doesn't take into account the uncertainty of individual y values. There is a prediction interval and a mean prediction interval. Both of them involve the variance in the x-axis, but not in the y-axis. If you want to consider the latter, then you are looking for another kind of model.

jamarav commented 4 years ago

I just modified the qlm function by adding an extra parameter for weights. In this way, it is possible to carry out a weighted regression. I don't understand why it should be included since the argument ... would include any additional parameters. Regarding the expanded uncertainty, it is simply the uncertainty multiplied by the tstudent factor. I have been able to verify that using the quantities package we can operate with both uncertainties and expanded uncertainty (including Taylor error propagation as long as the tstudent factor is the same). But I'm not sure if this assumption is correcto for the error coefficients in the regression model. I don't have a high knowlege about object-oriented programming as shown in the report, but if it is possible I'll try to replicate a similar wrapper for models in the nls.multstar package. It is a function similar to nls models with the advantage of being able to include ranges of values ​​for the coefficients.

Enchufa2 commented 4 years ago

I don't understand why it should be included since the argument ... would include any additional parameters.

It's because lm evaluates stuff in the parent frame.

Regarding the expanded uncertainty, it is simply the uncertainty multiplied by the tstudent factor.

Ok, you mean the confidence interval. I don't think that the CI should be used as the uncertainty in the result. It has a different meaning than the standard error, and depends on the significance level you choose. The standard error does not depend on that.

However, there are two confidence intervals: the mean response interval and the prediction interval. And now that you mention it, I believe that the se.fit corresponds to the standard error of the mean response, which is obviously smaller than the standard error associated to the prediction interval. So I think that the predict.qlm function should set interval="prediction" instead and then scale down the CIs to get the standard errors.

I don't have a high knowlege about object-oriented programming as shown in the report, but if it is possible I'll try to replicate a similar wrapper for models in the nls.multstar package. It is a function similar to nls models with the advantage of being able to include ranges of values ​​for the coefficients.

I have this tutorial that may be of interest.