haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

FR: Warn before trying to train where the label column has any nulls #753

Closed: salamanders closed this issue 10 months ago

salamanders commented 10 months ago

If you try to train a RandomForest to predict a label column and that column contains any nulls, you get:

Exception in thread "main" java.lang.UnsupportedOperationException: TheLabelColumn:Double
    at smile.data.vector.VectorImpl.toIntArray(VectorImpl.java:182)
    at smile.classification.ClassLabels.fit(ClassLabels.java:120)
    at smile.classification.RandomForest.fit(RandomForest.java:303)
    at smile.classification.RandomForest.fit(RandomForest.java:195)
    at smile.classification.RandomForest.fit(RandomForest.java:175)
    at MainKt.main(Main.kt:37)
    at MainKt.main(Main.kt)

Are there any algos that don't mind a few nulls in the label column? If not, could it throw an error when you first start training stating that your input data is wrong and you must first manually exclude rows with nulls?
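
To make the request concrete, here is a minimal sketch of the kind of up-front check being asked for. It is plain Java over a `List<Double>` and does not use Smile's API; the class name `LabelCheck` and the method `requireNoNullLabels` are hypothetical names chosen for illustration, not anything that exists in the library.

```java
import java.util.List;

public class LabelCheck {
    /**
     * Hypothetical helper: fails fast with a descriptive message if the
     * label column contains a null, instead of failing later inside training.
     */
    public static void requireNoNullLabels(String column, List<Double> labels) {
        for (int i = 0; i < labels.size(); i++) {
            if (labels.get(i) == null) {
                throw new IllegalArgumentException(
                    "Label column '" + column + "' has a null at row " + i
                    + "; remove rows with missing labels before training.");
            }
        }
    }
}
```

Calling something like this just before `fit` would turn the opaque `UnsupportedOperationException` into a message that names the column and the first offending row.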

haifengl commented 10 months ago

Samples without labels should be filtered out first. No algorithm supports training on them.
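
A rough sketch of that filtering step, for anyone landing here: collect the indices of the rows that have a label, then keep only those rows in both the features and the labels. This is plain Java over lists and deliberately avoids Smile's DataFrame API; `LabelFilter`, `labeledRows`, and `select` are hypothetical names, and the `List<Double>` label column is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class LabelFilter {
    /** Returns the indices of rows whose label is present (non-null). */
    public static List<Integer> labeledRows(List<Double> labels) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) {
            if (labels.get(i) != null) {
                keep.add(i);
            }
        }
        return keep;
    }

    /** Returns a new list containing only the rows at the given indices. */
    public static <T> List<T> select(List<T> rows, List<Integer> keep) {
        List<T> out = new ArrayList<>(keep.size());
        for (int i : keep) {
            out.add(rows.get(i));
        }
        return out;
    }
}
```

Applying `select` with the same index list to the feature rows and to the labels keeps the two aligned, so the filtered data can then be converted to whatever structure the training call expects.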