sylvaticus / BetaML.jl

Beta Machine Learning Toolkit
MIT License

Scaling with cache a matrix with non-numerical values results in an error #73

Closed sylvaticus closed 3 months ago

sylvaticus commented 3 months ago

The problem is that the _fit functions of both StandardScaler and MinMaxScaler start by defining the outer container as

X_scaled = cache ? float.(X) : nothing

But if X has non-numerical values this results in an error:

using BetaML
x       = [[4000,1000,2000,3000] ["a", "categorical", "variable", "not to scale"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]
mod     = Scaler(MinMaxScaler(outputRange=(0,10)),skip=[2])
xscaled = fit!(mod,x)
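
The root cause can be checked without BetaML, since the error comes from broadcasting float over the string column. A minimal sketch in plain Julia (the names ok and failed are only for illustration):

```julia
# hcat of Int, String and Float64 columns gives a Matrix{Any}
x = [[4000,1000,2000,3000] ["a", "categorical", "variable", "not to scale"] [0.4, 0.1, 0.2, 0.3]]

# Converting only the numeric columns works
ok = float.(x[:, [1, 3]])

# Broadcasting float over the whole matrix reaches the strings and throws
failed = try
    float.(x)
    false
catch
    true
end
```

So the skip option never gets a chance to protect column 2: the conversion happens on the whole matrix before any per-column logic runs.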

What should be done instead: if the matrix element type is a Union{Int64,OtherT} and we are scaling an Int column, that column should be converted to float so it can host the scaling result.
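
A rough sketch of that idea, not the actual BetaML code: float_container is a hypothetical helper name, and it assumes that every column not listed in skip is numeric.

```julia
# Hypothetical helper (not part of BetaML): build the container that will
# receive the scaled values, converting only the columns that get scaled.
function float_container(X::AbstractMatrix; skip=Int[])
    # Widen the element type so a column can hold Float64 next to other types
    T = Union{eltype(X), Float64}
    out = Matrix{T}(undef, size(X)...)
    for c in axes(X, 2)
        col = X[:, c]
        if c in skip
            out[:, c] = col          # skipped column, copied unchanged
        else
            out[:, c] = float.(col)  # e.g. Int -> Float64, ready to host scaled values
        end
    end
    return out
end

x  = [[4000,1000,2000,3000] ["a", "categorical", "variable", "not to scale"] [4,1,2,3] [0.4, 0.1, 0.2, 0.3]]
xc = float_container(x, skip=[2])
```

Here the string column is carried over as is, while the Int columns become Float64 and can receive the scaled values without an InexactError or a type mismatch.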