openml / EvaluationEngine

Sources of the Java Evaluation Engine

regression datasets handled as classification datasets #29

Closed joaquinvanschoren closed 5 years ago

joaquinvanschoren commented 5 years ago

Please check the following: https://github.com/openml/EvaluationEngine/commit/7be41b09992e570d7159e61ba95e6bcdaee42f2c

numValues used to be 0 for regression. Is it now 1?

This trips up AttributeStatistics, which computes the class distribution of a feature whenever numClasses > 0.

https://github.com/openml/EvaluationEngine/blob/master/src/main/java/org/openml/webapplication/models/AttributeStatistics.java#L56

The result is that regression datasets aren't parsed correctly anymore, e.g.:

[19-06-2019 23:30:14] [OK] [Process Dataset] Processing dataset 41936 - obtaining features.
java.lang.ArrayIndexOutOfBoundsException: Index 3600 out of bounds for length 1
    at org.openml.webapplication.models.AttributeStatistics.addValue(AttributeStatistics.java:72)
    at org.openml.webapplication.features.ExtractFeatures.getFeatures(ExtractFeatures.java:79)
    at org.openml.webapplication.ProcessDataset.process(ProcessDataset.java:55)
    at org.openml.webapplication.ProcessDataset.<init>(ProcessDataset.java:32)
    at org.openml.webapplication.Main.main(Main.java:115)
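
The trace above is consistent with a class-distribution array sized from a class count of 1 and then indexed with a numeric target value (3600 here). The following is a minimal sketch of that failure mode and of one possible guard. It is not the actual AttributeStatistics code: the class name ClassDistributionSketch and all of its methods are hypothetical and exist only for illustration.

```java
// Hypothetical illustration only; not the real AttributeStatistics class.
class ClassDistributionSketch {
    // One counter per class label; expected to be empty for a regression target.
    private final int[] classDistribution;

    // numClasses is assumed to be 0 for a numeric (regression) target.
    ClassDistributionSketch(int numClasses) {
        this.classDistribution = new int[Math.max(numClasses, 0)];
    }

    // Unguarded update: treats the target value as a class index.
    // With a numeric target of 3600 and an array of length 1 this throws.
    void addValueUnguarded(double classValue) {
        classDistribution[(int) classValue]++;
    }

    // Guarded update: counts only when the value is a valid class index,
    // and reports whether the value was counted.
    boolean addValueGuarded(double classValue) {
        int index = (int) classValue;
        if (index < 0 || index >= classDistribution.length) {
            return false;
        }
        classDistribution[index]++;
        return true;
    }

    // Defensive copy of the counters.
    int[] distribution() {
        return classDistribution.clone();
    }

    public static void main(String[] args) {
        // A class count of 1 mimics the reported behaviour for regression data.
        ClassDistributionSketch stats = new ClassDistributionSketch(1);
        try {
            stats.addValueUnguarded(3600.0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded update failed: " + e.getMessage());
        }
        System.out.println("guarded update counted: " + stats.addValueGuarded(3600.0));
    }
}
```

A bounds check like this only hides the symptom. If the class count for regression targets went back to 0, the distribution would not be tracked at all, which may be the cleaner fix.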

joaquinvanschoren commented 5 years ago

I submitted a PR.