Closed PaulSoderlind closed 3 years ago
Hi @PaulSoderlind ,
I am not against defining models in this way, since the terms in linear regressions are always summed in the same form. My primary motivation was to use a familiar interface for those coming from R, and for Julia users who already use GLM.
Thanks to multiple dispatch, new methods with the same name can be implemented. For example, the method
bch(setting::RegressionSetting; alpha=0.05, maxiter=1000, epsilon=0.000001)
uses the standard @formula-style definition. Another bch method would be
bch(data::Tuple{Matrix, Array{Float64, 1}}; alpha=0.05, maxiter=1000, epsilon=0.000001)
or something similar.
It is important to have the first method because, despite being linear, some models have complex design matrices that are difficult to construct by hand. Including or excluding an intercept, dummy variables, letting dummies shift the intercept, the slope, or both, etc. separates the concept of 'summation of independent variables' from that of 'the design matrix'.
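To illustrate the point about formula-built design matrices, here is a minimal sketch assuming StatsModels.jl, DataFrames.jl, and CategoricalArrays.jl are available (this is not LinRegOutliers code): a categorical variable in the formula expands into dummy columns automatically, which would be tedious to build by hand.

```julia
using DataFrames, StatsModels, CategoricalArrays

df = DataFrame(y = [1.0, 2.0, 3.0, 4.0],
               x = [0.5, 1.0, 1.5, 2.0],
               g = categorical(["a", "b", "a", "b"]))

# The formula expands to intercept, x, and a dummy column for g == "b"
mf = ModelFrame(@formula(y ~ 1 + x + g), df)
X  = modelmatrix(mf)   # 4×3 design matrix
y  = response(mf)      # response vector
```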
Do you agree with this? If yes, we can discuss implementation details later.
Yes, this sounds good. Thanks
Okay @PaulSoderlind, it would be good to see you as a contributor. It does not have to be complete; you can make pull requests that include partial changes when you have time. Thank you in advance. Welcome :)
Hi, I would be happy to contribute, but it will take some time due to my teaching.
Still, that does not have to stop us from thinking a bit about how to do it. To my mind, the best would be to use dispatch in such a way that there is no need to duplicate code. To illustrate what I mean, consider this refactoring of lad.jl:
function lad(data::Tuple{Vector,Matrix}; starting_betas=nothing)
    (y, X) = data
    # ... all the current lad code ...
    return result
end
function lad(setting::RegressionSetting; starting_betas=nothing)
    X = designMatrix(setting)
    y = responseVector(setting)
    result = lad((y, X), starting_betas=starting_betas)
    return result
end
This would be convenient, since the second version (with setting) can easily call the first version (with (y, X)). The other way around looks more complicated, but maybe you know how to do it.
Yes, but some other algorithms use the RegressionSetting object in more than one place, so that will be more complicated. Converters from Tuple{Vector,Matrix} to RegressionSetting and vice versa should help. What do you think about convert(RegressionSetting, (y, X))?
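One possible shape for such a converter is sketched below. The RegressionSetting struct here is a placeholder mirroring the description later in the thread (a formula plus a dataset); the real type lives in LinRegOutliers and may differ, and the generated column names x1, x2, ... are my assumption.

```julia
using DataFrames, StatsModels

# Placeholder mirroring the package's type (assumption, not the real code)
struct RegressionSetting
    formula::FormulaTerm
    data::DataFrame
end

# Wrap (y, X) in a DataFrame with generated column names and build the
# formula programmatically: y ~ x1 + x2 + ...
function Base.convert(::Type{RegressionSetting}, data::Tuple{Vector,Matrix})
    y, X = data
    cols = [Symbol("x", i) for i in 1:size(X, 2)]
    df = DataFrame(X, cols)
    df[!, :y] = y
    f = term(:y) ~ sum(term.(cols))
    return RegressionSetting(f, df)
end
```

The reverse direction is already covered by the existing designMatrix and responseVector helpers.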
So, does RegressionSetting contain any information that cannot be extracted from (y, X)?
No. RegressionSetting includes a formula and a dataset, and theoretically a design matrix X and a response vector y perfectly define a linear model.
(X, y)-style multiple dispatch is implemented for all algorithms except ransac().
Dear @tantei3, please read the implementations of the other algorithms carefully and implement the method ransac(X::Array{Float64, 2}, y::Array{Float64, 1}, ...) as in hs93, py95, or ks89.
A new data structure OLS is introduced in /src/ols.jl, with helper methods residuals(), predict(), coef(), etc. The ols() and wls() methods are for linear regression and weighted linear regression, respectively; with these implementations we will no longer need lm() from the GLM package. After the adaptation of ransac I will rearrange the requirements in LinRegOutliers.jl. Fyi.
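For reference, a minimal sketch of what such an OLS type and its helpers could look like; the field names and signatures here are my assumptions, not necessarily the actual code in /src/ols.jl.

```julia
using LinearAlgebra

# Sketch of an OLS result type (field names are assumptions)
struct OLS
    X::Matrix{Float64}
    y::Vector{Float64}
    betas::Vector{Float64}
end

# Ordinary least squares via the backslash solver
ols(X, y) = OLS(X, y, X \ y)

# Weighted least squares by rescaling rows with the square roots of the weights
function wls(X, y, w)
    s = sqrt.(w)
    OLS(X, y, (s .* X) \ (s .* y))
end

coef(o::OLS)      = o.betas
predict(o::OLS)   = o.X * o.betas
residuals(o::OLS) = o.y - predict(o)
```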
All of the methods now have (X, y)-type dispatch, so I am closing this issue. @PaulSoderlind, your other contributions are always welcome; thank you for this feature request.
thanks
Hi,
following up on the discussion on Discourse, I would kindly ask for methods for (y, X).
Motivation: while GLM and friends are often useful, it is sometimes easier to just do b = X\y etc.
Feasibility: looking at your code, it sometimes (like in lad.jl) starts with X = designMatrix(setting) and y = responseVector(setting). In these cases, it should be straightforward to add methods. (I am busy with teaching right now, but might be able to submit PRs later this autumn.)