openml / EvaluationEngine

Sources of the Java Evaluation Engine

Feature/lazy loading arff #39

Closed: josvandervelde closed this 3 weeks ago

josvandervelde commented 1 year ago

Instead of reading the ARFF files into Weka Instances, this PR switches to Weka's ArffReader. This way, the complete file is no longer read into memory; instead, it is processed row by row, which reduces the memory footprint.
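To illustrate why row-by-row reading lowers memory use, here is a minimal, self-contained sketch in plain Java. It is not the code in this PR and does not use Weka. For reference, Weka's `ArffReader` (in `weka.core.converters.ArffLoader`) supports incremental reading through `getStructure()` for the header followed by repeated `readInstance(...)` calls. The class `StreamingArffSketch` and its `countMissing` method are hypothetical. The sketch assumes dense ARFF without quoted or escaped commas and without sparse rows. It computes a per-attribute statistic (missing-value counts) while keeping only the header and the current row in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical illustration (not EvaluationEngine or Weka code): scans an ARFF
 * stream one data row at a time and counts missing values ("?") per attribute.
 * Assumes dense ARFF with no quoted/escaped commas and no sparse rows.
 */
final class StreamingArffSketch {

    /** Attribute names from the header plus one missing-value counter per attribute. */
    static final class Result {
        final List<String> attributeNames;
        final long[] missingCounts;
        final long rows;

        Result(List<String> attributeNames, long[] missingCounts, long rows) {
            this.attributeNames = attributeNames;
            this.missingCounts = missingCounts;
            this.rows = rows;
        }
    }

    static Result countMissing(Reader source) {
        List<String> names = new ArrayList<>();
        long[] missing = null;
        long rows = 0;
        boolean inData = false;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                // Skip blank lines and ARFF comments.
                if (trimmed.isEmpty() || trimmed.startsWith("%")) continue;
                if (!inData) {
                    String lower = trimmed.toLowerCase(Locale.ROOT);
                    if (lower.startsWith("@attribute")) {
                        // The second whitespace-separated token is the attribute name.
                        names.add(trimmed.split("\\s+", 3)[1]);
                    } else if (lower.startsWith("@data")) {
                        inData = true;
                        missing = new long[names.size()];
                    }
                    continue;
                }
                // Data section: update the counters for this row only; the row is not stored.
                String[] values = trimmed.split(",", -1);
                for (int i = 0; i < values.length && i < missing.length; i++) {
                    if (values[i].trim().equals("?")) missing[i]++;
                }
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (missing == null) missing = new long[names.size()];
        return new Result(names, missing, rows);
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
                "@relation toy",
                "@attribute a numeric",
                "@attribute b numeric",
                "@attribute class {yes,no}",
                "@data",
                "1,?,yes",
                "?,?,no",
                "3,4,yes");
        Result r = countMissing(new StringReader(arff));
        for (int i = 0; i < r.attributeNames.size(); i++) {
            System.out.println(r.attributeNames.get(i) + ": " + r.missingCounts[i] + " missing");
        }
        System.out.println(r.rows + " rows");
    }
}
```

Memory here grows with the number of attributes rather than the number of rows, which is the trade-off this kind of change targets. The cost is that any statistic needing several passes over the data would have to re-read the file.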

I changed only the behavior of ProcessDataset. Other places where ARFF files are read still load the complete file into memory (albeit more explicitly now, by creating the Instances inside the EvaluationEngine instead of inside openml-weka). My understanding is that ProcessDataset was the biggest memory bottleneck.

See https://github.com/openml/openml-weka/pull/27 for the corresponding openml-weka PR.

Any feedback is appreciated!