haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

"Invalid posteriori vector size" exception thrown when running K-fold cross validation #681

Status: Closed (adippold closed 3 years ago)

adippold commented 3 years ago

When attempting to do a K-fold cross validation of the Random Forest model, I get the following:

java.lang.IllegalArgumentException: Invalid posteriori vector size: 4, expected: 5
    at smile.classification.RandomForest.predict(RandomForest.java:602)
    at smile.classification.RandomForest.predict(RandomForest.java:78)
    at smile.validation.ClassificationValidation.of(ClassificationValidation.java:177)
    at smile.validation.ClassificationValidation.of(ClassificationValidation.java:206)
    at smile.validation.CrossValidation.classification(CrossValidation.java:293)
    at com.gmg.p7.app.text_alignment.TextAlignmentTrainer.train(TextAlignmentTrainer.java:374)
    at com.gmg.p7.app.text_alignment.TextAlignmentTrainer.main(TextAlignmentTrainer.java:464)

The problem seems to be in ClassificationValidation.java, line 166:

int k = MathEx.unique(y).length;

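The code derives the number of classes by counting the distinct labels that occur in the data, instead of reading it from the measure definition. When the data are missing one of the labels, the count comes out too small (4 instead of 5 here).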

Replacing the above with:

int k = ClassLabels.fit( formula.y(train) ).k;

solves the problem.

I see the same statement in line 106 as well.
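
To illustrate the mismatch, here is a minimal, self-contained sketch in plain Java. It does not use Smile and is not the library's code: `LabelCountSketch`, `countDistinct` and `predict` are hypothetical stand-ins, assuming only that the model checks the posteriori buffer length against its own class count, as the exception message suggests.

```java
import java.util.Arrays;

public class LabelCountSketch {
    /** Counts the distinct values that actually occur in a label array. */
    static int countDistinct(int[] labels) {
        return (int) Arrays.stream(labels).distinct().count();
    }

    /**
     * Hypothetical stand-in for a model's predict call: it rejects a
     * posteriori buffer whose length differs from the model's class count.
     */
    static void predict(int numClasses, double[] posteriori) {
        if (posteriori.length != numClasses) {
            throw new IllegalArgumentException(String.format(
                "Invalid posteriori vector size: %d, expected: %d",
                posteriori.length, numClasses));
        }
        // Fill with a uniform distribution just to have some output.
        Arrays.fill(posteriori, 1.0 / numClasses);
    }

    public static void main(String[] args) {
        int declaredClasses = 5;               // the label set defines classes 0..4
        int[] subset = {0, 1, 2, 3, 1, 0, 2};  // this subset never contains label 4

        // Counting observed labels undercounts the classes.
        int k = countDistinct(subset);
        try {
            predict(declaredClasses, new double[k]);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Sizing the buffer from the declared label set works.
        predict(declaredClasses, new double[declaredClasses]);
    }
}
```

Sizing the buffer from the declared label set, rather than from whatever labels happen to occur in a given subset, keeps it consistent with the model regardless of how the data are split.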

Please fix - thank you!

haifengl commented 3 years ago

Your training data doesn't have all labels?

haifengl commented 3 years ago

Fixed.

adippold commented 3 years ago

Yes, a certain subset of the data was missing one of the labels. Thank you for fixing the issue!