paulgoetze / weka-jruby

Machine Learning & Data Mining with JRuby
MIT License

Fix: unassigned training instance issues #26

Closed paulgoetze closed 6 years ago

paulgoetze commented 6 years ago

Fixes #10/#25.

This allows using the #classify/#cluster and #distribution_for methods on deserialized classifiers and clusterers.

When serializing classifiers and clusterers, we now additionally store the structure of the training data in a separate serialized file (named <filename>.structure).

For example, if you serialize your trained classifier to classifier.model, an additional classifier.model.structure file is written alongside it. This file holds the header of the training data (info about attributes, class attribute, etc.).

When the classifier/clusterer is deserialized, we use this structure file to set the deserialized object’s instances_structure attribute. That attribute is used internally so that an array of values can be passed to the #classify/#cluster and #distribution_for methods.
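
To illustrate the idea, here is a minimal plain-Ruby sketch of the sidecar-file pattern described above. It is not the code from this PR: `ToyClassifier`, `serialize`, and `deserialize` are hypothetical names, Ruby's `Marshal` stands in for the Java serialization used with Weka, and the "structure" is simplified to a list of attribute names.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a trained classifier (not a weka-jruby class).
class ToyClassifier
  attr_accessor :instances_structure
  attr_reader :threshold

  def initialize(threshold)
    @threshold = threshold
  end

  # Accepts a plain array of values. The stored structure is needed to
  # know which value belongs to which attribute.
  def classify(values)
    raise 'instances_structure is not set' if instances_structure.nil?

    row = instances_structure.zip(values).to_h
    row[:temperature] > threshold ? 'hot' : 'cold'
  end
end

# Writes the model to `path` and the training data structure to
# "<path>.structure" (assumption: both are stored with Marshal here).
def serialize(classifier, path)
  File.binwrite(path, Marshal.dump(classifier.threshold))
  File.binwrite("#{path}.structure", Marshal.dump(classifier.instances_structure))
end

# Restores the model and, if the sidecar file exists, its structure.
def deserialize(path)
  classifier = ToyClassifier.new(Marshal.load(File.binread(path)))
  structure_path = "#{path}.structure"

  if File.exist?(structure_path)
    classifier.instances_structure = Marshal.load(File.binread(structure_path))
  end

  classifier
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'classifier.model')

  trained = ToyClassifier.new(20)
  trained.instances_structure = %i[temperature humidity]
  serialize(trained, path)

  restored = deserialize(path)
  puts restored.classify([25, 60]) # prints "hot"
end
```

In this sketch, a model file without its `.structure` companion still loads, but `classify` raises until `instances_structure` is set, which mirrors the situation the sidecar file is meant to avoid.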