haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

FR: Warn before trying to train where the label column has any nulls #753

Closed by salamanders 1 year ago

salamanders commented 1 year ago

If you try to train a RandomForest to predict a label column, and that label column contains any nulls, training fails with:

    Exception in thread "main" java.lang.UnsupportedOperationException: TheLabelColumn:Double
        at smile.data.vector.VectorImpl.toIntArray(VectorImpl.java:182)
        at smile.classification.ClassLabels.fit(ClassLabels.java:120)
        at smile.classification.RandomForest.fit(RandomForest.java:303)
        at smile.classification.RandomForest.fit(RandomForest.java:195)
        at smile.classification.RandomForest.fit(RandomForest.java:175)
        at MainKt.main(Main.kt:37)
        at MainKt.main(Main.kt)

Are there any algorithms that tolerate a few nulls in the label column? If not, could training fail up front with an error stating that the label column contains nulls and that those rows must be excluded manually first?

haifengl commented 1 year ago

Samples without labels should be filtered out first. No algorithms support missing labels.
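
For anyone landing here, the advice above (filter unlabeled samples before training) and the up-front check requested in the issue can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use any Smile API, the names `LabelFilter`, `Row`, `hasLabel`, `dropUnlabeled`, and `requireLabels` are hypothetical, and treating NaN as a missing label is an assumption that may not match how your data encodes nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Smile: screens labels before training.
final class LabelFilter {

    /** One training example: feature values plus a label that may be missing (null). */
    static final class Row {
        final double[] features;
        final Double label;

        Row(double[] features, Double label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Assumption: a label counts as missing when it is null or NaN. */
    static boolean hasLabel(Row row) {
        return row.label != null && !row.label.isNaN();
    }

    /** Returns a new list containing only the rows that have a label. */
    static List<Row> dropUnlabeled(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (hasLabel(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    /** Fails fast with a descriptive message if any row lacks a label. */
    static void requireLabels(List<Row> rows, String labelName) {
        int missing = rows.size() - dropUnlabeled(rows).size();
        if (missing > 0) {
            throw new IllegalArgumentException(
                "Label column '" + labelName + "' has " + missing
                + " missing value(s); remove those rows before training.");
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(new double[] {1.0, 2.0}, 0.0),
            new Row(new double[] {3.0, 4.0}, null),        // missing label
            new Row(new double[] {5.0, 6.0}, Double.NaN)); // assumed missing

        List<Row> labeled = dropUnlabeled(rows);
        requireLabels(labeled, "TheLabelColumn"); // passes: no missing labels remain
        System.out.println("kept " + labeled.size() + " of " + rows.size() + " rows");
        // prints "kept 1 of 3 rows"
    }
}
```

Calling `requireLabels` on the unfiltered rows throws an `IllegalArgumentException` that names the column and the number of missing labels, which is the kind of early, descriptive failure the issue asks for. The filtered list is what you would then convert into whatever structure your training call expects.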