What steps will reproduce the problem?
Train a Mallet classifier with String outcomes that include white spaces (" ").
Then use the trained model to classify instances.
What is the expected output?
A classification result based on the set of outcomes used for training.
What do you see instead?
A classification result based on the last word of the outcomes used for
training.
What version of the product are you using? On what operating system?
ClearTk version 1.4.1.
OS: Mac OS X 10.8.5 and Ubuntu Linux
Comments:
My outcome labels are strings that contain spaces (" "). The ClearTk code that
writes the training instances to the training-data.mallet file puts the
outcome label in the last field of each line. When this file is parsed to
convert the instances into the Mallet format, the parser assumes the outcome
label is the substring after the last space in the line. As a result, only
the last word of my outcome label is used as the label for training.
See org.cleartk.classifier.mallet.InstanceListCreator.DataIterator.next().
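The behavior described above can be illustrated with a small standalone sketch. This is not the ClearTk source: the class LabelParsingSketch, its method names, and the sample line layout (feature tokens followed by the label) are hypothetical and only mimic the "take everything after the last space" logic reported here. The encode/decode pair shows one possible workaround, assuming labels could be escaped before being written and unescaped after parsing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class LabelParsingSketch {

    // Mimics the reported behavior: the label is taken to be whatever
    // follows the last space in the line (hypothetical re-creation).
    public static String labelAfterLastSpace(String line) {
        return line.substring(line.lastIndexOf(' ') + 1);
    }

    // Possible workaround (an assumption, not ClearTk's fix): escape the
    // label so that it contains no spaces before writing it to the file.
    public static String encodeLabel(String label) {
        try {
            return URLEncoder.encode(label, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverses encodeLabel after the line has been parsed.
    public static String decodeLabel(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String label = "very positive";

        // Unescaped label: the parser keeps only the last word.
        String rawLine = "featureA featureB " + label;
        System.out.println(labelAfterLastSpace(rawLine));

        // Escaped label: the full label survives the round trip.
        String safeLine = "featureA featureB " + encodeLabel(label);
        System.out.println(decodeLabel(labelAfterLastSpace(safeLine)));
    }
}
```

With the unescaped line, the sketch recovers only "positive"; with the escaped line, the full label "very positive" is recovered. Any real fix would need the writer and the parser to agree on the same escaping scheme.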
Original issue reported on code.google.com by MarceloT...@gmail.com on 2 May 2014 at 6:08