noob-data-analaysis / data-analysis


Scientific types of the data and model incompatibility #5

Open jianlin666 opened 4 years ago

jianlin666 commented 4 years ago

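Hi, your project is really great.

While using MLJ for prediction, I ran into a problem where the model is incompatible with the scientific types of my data.

The data comes from Kaggle.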

```julia
using MLJ, Queryverse, PrettyPrinting, Random, StatsKit, LossFunctions, Plots

train_data = Queryverse.load("D://juliacode//archive//train.csv") |> DataFrame
id = train_data.id                          # keep the user id column separately
train_data = select(train_data, Not(:id))   # drop the user id column

StatsKit.countmap(train_data.Response)      # check whether positive and negative samples are balanced

y, X = unpack(train_data, ==(:Response), colname -> true)  # split the target from the features

# Convert scientific types
X = coerce(X, autotype(X))
y = coerce(y, autotype(y))
X |> MLJ.schema
```

    │ Gender               │ CategoricalValue{String,UInt32}  │ Multiclass{2}              │
    │ Age                  │ CategoricalValue{Int64,UInt32}   │ OrderedFactor{66}          │
    │ Driving_License      │ CategoricalValue{Int64,UInt32}   │ OrderedFactor{2}           │
    │ Region_Code          │ CategoricalValue{Float64,UInt32} │ OrderedFactor{53}          │
    │ Previously_Insured   │ CategoricalValue{Int64,UInt32}   │ OrderedFactor{2}           │
    │ Vehicle_Age          │ CategoricalValue{String,UInt32}  │ Multiclass{3}              │
    │ Vehicle_Damage       │ CategoricalValue{String,UInt32}  │ Multiclass{2}              │
    │ Annual_Premium       │ Float64                          │ ScientificTypes.Continuous │
    │ Policy_Sales_Channel │ Float64                          │ ScientificTypes.Continuous │
    │ Vintage              │ Int64                            │ Count                      │

    @load XGBoostClassifier
    xgb = XGBoostClassifier()
    mach = machine(xgb, X, y)
    fit!(mach)   # XGB fails here: incompatible scientific types

Here is the error message:

    MethodError: no method matching XGBoost.DMatrix(::Array{Any,2}; label=Bool[1, 0, 1, 0, 0, 0, 0, 1, 0, 0 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    Closest candidates are:
      XGBoost.DMatrix(!Matched::Ptr{Nothing}) at C:\Users\y84157557.julia\packages\XGBoost\fI0vs\src\xgboost_lib.jl:21 got unsupported keyword argument "label"
      XGBoost.DMatrix(!Matched::String; silent) at C:\Users\y84157557.julia\packages\XGBoost\fI0vs\src\xgboost_lib.jl:27 got unsupported keyword argument "label"
      XGBoost.DMatrix(!Matched::SparseArrays.SparseMatrixCSC{K,V}) where {K<:Real, V<:Integer} at C:\Users\y84157557.julia\packages\XGBoost\fI0vs\src\xgboost_lib.jl:34 got unsupported keyword argument "label"
      ...
    in include_string at base\loading.jl:1088
    in top-level scope at archive.jl:36
    in fit! at MLJBase\cJmIS\src\machines.jl:479
    in #fit!#87 at MLJBase\cJmIS\src\machines.jl:481
    in fit_only! at MLJBase\cJmIS\src\machines.jl:389
    in #fit_only!#84 at MLJBase\cJmIS\src\machines.jl:436

I have already converted the scientific types following MLJ's rules, so why is the data still incompatible with the XGB model? Thanks!

jianlin666 commented 4 years ago

Addendum:

    ┌ Warning: The scitype of X, in machine(model, X, ...) is incompatible with model=XGBoostClassifier @658:
    │ scitype(X) = Table{Union{AbstractArray{ScientificTypes.Continuous,1}, AbstractArray{Count,1}, AbstractArray{Multiclass{2},1}, AbstractArray{Multiclass{3},1}, AbstractArray{OrderedFactor{53},1}, AbstractArray{OrderedFactor{66},1}, AbstractArray{OrderedFactor{2},1}}}
    │ input_scitype(model) = Table{var"#s13"} where var"#s13"<:(AbstractArray{var"#s45",1} where var"#s45"<:ScientificTypes.Continuous).
    └ @ MLJBase C:\Users\y84157557.julia\packages\MLJBase\cJmIS\src\machines.jl:76

nesteiner commented 3 years ago

@jianlin666 Wow, I only just saw your issue. You can check which models support your data:

    models(matching(X, y))

I haven't touched this stuff in a long time.
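
For readers who hit the same warning: it reports that `input_scitype(model)` for `XGBoostClassifier` is a table whose columns are all `Continuous`, while `X` still contains `Multiclass`, `OrderedFactor` and `Count` columns after `autotype`. One possible way around this, sketched below, is to pick the scitypes explicitly and then pass the table through MLJ's `ContinuousEncoder`. The toy table is hypothetical (made-up values with a few of the column names from the issue), and treating `Age` and `Vintage` as `Continuous` is an assumption about the data, not something confirmed in the thread.

```julia
using MLJ

# Hypothetical toy table standing in for the Kaggle data (values are made up).
X = (Gender         = ["Male", "Female", "Male", "Female"],
     Age            = [44, 76, 47, 21],
     Annual_Premium = [40454.0, 33536.0, 38294.0, 28619.0],
     Vintage        = [217, 183, 27, 203])
y = coerce([1, 0, 1, 0], Multiclass)

# Choose scitypes explicitly instead of relying on autotype.
# Assumption: Age and Vintage are best treated as Continuous here.
X = coerce(X, :Gender => Multiclass, :Age => Continuous, :Vintage => Continuous)

# ContinuousEncoder one-hot encodes Multiclass columns and converts
# OrderedFactor and Count columns to Continuous.
encoder = machine(ContinuousEncoder(), X)
fit!(encoder, verbosity=0)
Xc = MLJ.transform(encoder, X)

schema(Xc)                         # inspect the encoded columns
scitype(Xc) <: Table(Continuous)   # the input scitype named in the warning

# List the models whose input and target scitypes match the encoded data.
models(matching(Xc, y))
```

With the real data, `Xc` would take the place of `X` in `machine(xgb, Xc, y)`. Whether one-hot encoding is the right treatment for high-cardinality columns such as `Region_Code` is a modelling decision this sketch does not settle.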