st-- / LinearRegression.jl

Simple & fast linear regression in Julia
MIT License
22 stars, 0 forks

Predicting: Usage of Package in Readme still up to date? #17

Open TheFibonacciEffect opened 2 weeks ago

TheFibonacciEffect commented 2 weeks ago

Hi, I would like to use a fitted model to predict on new data. Here is what I tried; I thought it was supposed to work according to the README:

x = rand(10)
y = 3*x .+1
ln = linregress(x,y)
LinearRegression.LinearRegressor{Vector{Float64}}(true, [2.9999999999999996, 0.9999999999999991])

Now I try to evaluate the fitted model on a set of points:

ln(1:0.1:10)
ERROR: DimensionMismatch: first array has length 91 which does not match the length of the second, 1.
Stacktrace:
 [1] dot(x::StepRangeLen{…}, y::SubArray{…})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:889
 [2] *
   @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/adjtrans.jl:478 [inlined]
 [3] (::LinearRegression.LinearRegressor{…})(x::StepRangeLen{…})
   @ LinearRegression ~/.julia/packages/LinearRegression/hggkt/src/linreg.jl:56
 [4] top-level scope
   @ REPL[44]:1
Some type information was truncated. Use `show(err)` to see complete types.

Is this how it is supposed to be done according to the readme or did I misunderstand it?

st-- commented 2 weeks ago

Hi, I have to admit I might not have thought through well enough how to use it in the scalar case (1-dimensional input). The code generally assumes higher-dimensional input: a matrix is interpreted as multiple input points with multiple covariates each, whereas a vector is interpreted as the covariates of a single input point, not as a single covariate for multiple input points. To predict for multiple points, you need to pass in a matrix; in your case, an $N \times 1$ matrix, e.g. ln(reshape(1:0.1:10, :, 1))
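To make the vector-versus-matrix convention concrete, here is a minimal, self-contained sketch. `ToyRegressor` is a hypothetical stand-in written for illustration only; it is not the package's actual implementation. It assumes the coefficient layout suggested by the printed output above (slopes first, intercept last).

```julia
using LinearAlgebra

# Hypothetical stand-in for a fitted model; NOT the code of LinearRegression.jl.
# coeffs = [slopes..., intercept], mirroring the printed output in the first post.
struct ToyRegressor
    coeffs::Vector{Float64}
end

# Vector input: ONE point whose entries are its covariates.
(m::ToyRegressor)(x::AbstractVector) = dot(x, m.coeffs[1:end-1]) + m.coeffs[end]

# Matrix input: N points (rows), each with D covariates (columns).
(m::ToyRegressor)(X::AbstractMatrix) = X * m.coeffs[1:end-1] .+ m.coeffs[end]

m = ToyRegressor([3.0, 1.0])   # represents y = 3x + 1
xs = 1:0.1:10                  # 91 values of a single covariate

# m(xs) would throw a DimensionMismatch: xs is read as one point with
# 91 covariates, but the model has only one slope.

X = reshape(xs, :, 1)          # 91×1 matrix: 91 points, 1 covariate each
ys = m(X)                      # vector of 91 predictions
```

The reshape does not copy the data; it only tells the model that each entry of `xs` is a separate input point.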

TheFibonacciEffect commented 2 weeks ago

Okay, thanks a lot. Do you think it would make sense to support this use case? It would only take a line of code overloading the function for vectors, and I think it's quite common to do linear regressions on 1D data.

st-- commented 1 week ago

The challenge is how to distinguish between "vector = a single data point with multiple covariates" and "vector = multiple data points with a single covariate each"... (e.g. KernelFunctions.jl exports ColVecs and RowVecs, but that seems too heavy a dependency for this package.) How would you suggest supporting both?
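One possible direction, sketched here purely as an assumption and not as anything the package provides, is a small dependency-free wrapper type that lets the caller state the intent explicitly. The names `Points1D`, `as_matrix`, and `predict` are hypothetical, and `toy` is a placeholder for any callable model that accepts a matrix of points.

```julia
# Hypothetical wrapper: marks a vector as "N data points, one covariate each".
struct Points1D{V<:AbstractVector}
    x::V
end

# Convert inputs to the N×D matrix layout; matrices pass through unchanged.
as_matrix(p::Points1D) = reshape(p.x, :, 1)
as_matrix(X::AbstractMatrix) = X

# Generic helper: works with any callable model that accepts a matrix.
predict(model, inputs) = model(as_matrix(inputs))

# Placeholder model for demonstration: y = 3x + 1 for one covariate.
toy(X::AbstractMatrix) = X * [3.0] .+ 1.0

ys = predict(toy, Points1D(1:0.1:10))   # 91 predictions, one per point
```

A bare vector would keep its current meaning (one point, many covariates), so existing behaviour would not change; only wrapped vectors would be treated as many 1D points.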