sylvaticus / BetaML.jl

Beta Machine Learning Toolkit
MIT License
92 stars 14 forks source link

MLJ model `BetaMLGMMRegressor` predicting row vectors instead of column vectors #40

Closed ablaom closed 1 year ago

ablaom commented 2 years ago
using MLJBase
using MLJModels
model = (@iload BetaMLGMMRegressor)()
X, y = make_regression();
mach = machine(model, X, y) |> fit!
yhat = predict(mach, X);

julia> l2(yhat, y)
ERROR: DimensionMismatch: Encountered two objects with sizes (100, 1) and (100,) which needed to match but don't. 
Stacktrace:
 [1] check_dimensions
   @ ~/.julia/packages/MLJBase/CtxrQ/src/utilities.jl:145 [inlined]
 [2] _check(measure::LPLoss{Int64}, yhat::Matrix{Float64}, y::Vector{Float64})
   @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/measures/measures.jl:60
 [3] (::LPLoss{Int64})(::Matrix{Float64}, ::Vararg{Any})
   @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/measures/measures.jl:126
 [4] top-level scope
   @ REPL[36]:1
sylvaticus commented 2 years ago

Thanks, going check it...

sylvaticus commented 2 years ago

Hello, BetaMLGMMRegressor (that I have now renamed GaussianMixtureRegressor) is not predicting a row vector, but a matrix with a single column, as it can predict multiple targets, so currently always return matrices. What do you suggest I change:

ablaom commented 1 year ago

For better or worse, the MLJ standard for returning multi-target predictions is to return a table. (This may not be actually documented but it is adopted in several places) . So there is a case distinction here. This means we generally have separate model types for regular and multitarget predictors. Which is not too bad, as it is not the main use-case.

I suppose you could have a single model that always predicts a table, but I think this will be totally unexpected for users. So, while I realize this is a pain, I think the best plan is a separate MultitargetGaussianMixtureRegressor for the MLJ interface.

a row vector, but a matrix with a single column

Same thing, in my vernacular 😉

sylvaticus commented 1 year ago

Should have been fixed in master:

using Pkg
pkg"activate --temp"
pkg"add MLJBase, MLJModels, BetaML#master"
using MLJBase, MLJModels
import BetaML: GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor

model = GaussianMixtureRegressor()
X, y = make_regression();
mach = machine(model, X, y) |> fit!
yhat = predict(mach, X);
l2(yhat, y) # ok

y2=hcat(y,y)
model2 = MultitargetGaussianMixtureRegressor()
mach2 = machine(model2, X, y2) |> fit!
yhat2 = predict(mach2, X);
l2(yhat2, y2) # ok
ablaom commented 1 year ago

And if y2 is a table instead of a matrix, is that allowed? Elswhere, y2 and yhat2 would both be tables. See eg https://github.com/JuliaAI/NearestNeighborModels.jl/blob/f4362b3967dc1e4874cd9875f428c4d13316b761/src/models.jl#L312

Sorry this isn't documented anywhere.