haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

smile.classification.RandomForest.fit throws unexpected java.lang.UnsupportedOperationException error #674

Closed by mkffl 3 years ago

mkffl commented 3 years ago

I need to fit a random forest model for classification. I cook up the formula and the DataFrame according to Smile's specifications, but an error comes up when I use the fit method.

I would expect the code to run as the documentation snippets do.

Is my input badly formatted? For reference, this question reports the same error message; however, that user seems to have been using the classification model for a regression problem.

Code and error stack trace:

import $ivy.`com.github.haifengl:smile-core:2.6.0`
import smile.classification._
import smile.data.DataFrame
import smile.data.formula.Formula

// starting point
val xtrain = Array(0.02, 0.1, 0.18, 0.2, 0.27, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9)
val ytrain = Array(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1)

// Smile prerequisites
val ytrainDouble = ytrain.map(_.toDouble)
val dataset = Array(xtrain, ytrainDouble).transpose
val SmileFrame = DataFrame.of(dataset, "x", "y")
val formula = Formula.lhs("y")

// model definition
val regressionModel = smile.regression.RandomForest.fit(formula, SmileFrame) // compiles but is not what I need
val classificationModel = smile.classification.RandomForest.fit(formula, SmileFrame) // java.lang.UnsupportedOperationException

The last line returns:

java.lang.UnsupportedOperationException
  smile.data.vector.BaseVector.toIntArray(BaseVector.java:90)
  smile.data.vector.BaseVector.toIntArray(BaseVector.java:82)
  smile.classification.ClassLabels.fit(ClassLabels.java:106)
  smile.classification.RandomForest.fit(RandomForest.java:300)
  smile.classification.RandomForest.fit(RandomForest.java:197)
  smile.classification.RandomForest.fit(RandomForest.java:179)
  ammonite.$sess.cmd46$.<clinit>(cmd46.sc:1)

Environment: Ammonite REPL 2.3.8, Scala 2.13.3, Java 15.0.1

mkffl commented 3 years ago

Please also note that I can't reproduce the examples that load files, e.g. the Weka weather file. That's because the import smile.io statement used in the documentation fails with the error "object io is not a member of package smile".

I think that running these examples would help me guess what format is expected. It would be great to know how to load the I/O and parsing functionality.

haifengl commented 3 years ago

To use smile.io, you should add the smile-io module to your project. Also, your example is wrong because x is a one-dimensional array. Machine learning is for multi-dimensional data analysis.
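
In the Ammonite session used in this thread, adding the module might look like the sketch below. It assumes the smile-io artifact is published under the same group and version as the smile-core import in the original post (com.github.haifengl, 2.6.0). The file path is a hypothetical placeholder, not a path taken from the documentation.

```scala
// Assumption: smile-io uses the same group ID and version as smile-core above.
import $ivy.`com.github.haifengl:smile-io:2.6.0`
import smile.io.Read

// Hypothetical path: point it at a local copy of the Weka weather ARFF file.
val weather = Read.arff("data/weka/weather.nominal.arff")
```

In an sbt or Maven project, the equivalent step would be declaring the same artifact as a dependency.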

mkffl commented 3 years ago

Thanks, I now have access to the smile-io package - that's great.

I will try adding another predictor, though I am not sure this will fix the issue. xtrain goes into the DataFrame alongside ytrain without any problem, and smile.regression.RandomForest runs fine, which suggests the input data is OK.

I believe the issue may be that ytrain is inserted into the DataFrame as a Double, which is not compatible with the toIntArray method that appears in the stack trace. I tried to insert different field types with a custom schema object, but no luck so far.

Thanks for your help.
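
One way to test the Double-label hypothesis is to build the DataFrame from typed column vectors, so that y is stored as an Int column and toIntArray is supported on it. The following is a minimal sketch, not a confirmed fix. It assumes DoubleVector.of, IntVector.of and the vector-based DataFrame.of overload behave in Smile 2.6.0 as used here. It does not address the separate point about using a single predictor, and the name frame is illustrative.

```scala
import $ivy.`com.github.haifengl:smile-core:2.6.0`
import smile.data.DataFrame
import smile.data.formula.Formula
import smile.data.vector.{DoubleVector, IntVector}

// Same data as in the original post.
val xtrain = Array(0.02, 0.1, 0.18, 0.2, 0.27, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9)
val ytrain = Array(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1)

// Build the frame column by column so the label keeps its Int type
// instead of being widened to Double by the Array[Array[Double]] route.
val frame = DataFrame.of(
  DoubleVector.of("x", xtrain),
  IntVector.of("y", ytrain)
)

val formula = Formula.lhs("y")

// With an Int response, the toIntArray call in ClassLabels.fit should no
// longer hit the unsupported default implementation (assumption, untested here).
val classificationModel = smile.classification.RandomForest.fit(formula, frame)
```

If the exception persists with an Int response column, the cause lies elsewhere and the Double-label explanation can be ruled out.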