laito / cleartk

Automatically exported from code.google.com/p/cleartk

MalletCRFStringOutcomeDataWriter silently ignores non-String values #409

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This affects all ClearTk versions.

MalletCRFStringOutcomeDataWriter does not write the numeric or boolean values
of Features. I am referring to this piece of code in ClearTk's
MalletCRFStringOutcomeDataWriter:

  @Override
  public void writeEncoded(List<NameNumber> features, String outcome) {
    for (NameNumber nameNumber : features) {
      this.trainingDataWriter.print(nameNumber.name);
      this.trainingDataWriter.print(" ");
    }

    this.trainingDataWriter.print(outcome);
    this.trainingDataWriter.println();
  }

This is not obvious from the code above, but for Features with a String value
the nameNumber.name field contains the Feature name combined with the encoded
value, whereas for any other type (e.g. Boolean, Number) nameNumber.name
contains only the Feature name and not the value. Since only nameNumber.name
is printed, those values are silently dropped.
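
To make the effect concrete, here is a minimal, self-contained sketch. The
NameNumber class in it is a simplified stand-in for ClearTk's class (assumed
to carry a name and a numeric value), and the feature names and the
"name_value" form for String features are made up for illustration, so treat
it as an approximation rather than the actual encoder output.

```java
import java.util.Arrays;
import java.util.List;

class DroppedValueDemo {
  // Simplified stand-in for ClearTk's NameNumber (assumption: a name plus a number).
  static class NameNumber {
    final String name;
    final Number number;

    NameNumber(String name, Number number) {
      this.name = name;
      this.number = number;
    }
  }

  // Mirrors the loop in writeEncoded: only the name is emitted, the number is never read.
  static String formatLine(List<NameNumber> features, String outcome) {
    StringBuilder line = new StringBuilder();
    for (NameNumber nameNumber : features) {
      line.append(nameNumber.name).append(" ");
    }
    return line.append(outcome).toString();
  }

  public static void main(String[] args) {
    List<NameNumber> features = Arrays.asList(
        new NameNumber("word_Boston", 1),   // String feature: value folded into the name (assumed format)
        new NameNumber("length", 6),        // numeric feature: value held only in number
        new NameNumber("capitalized", 1));  // boolean feature: value held only in number
    System.out.println(formatLine(features, "B-LOC"));
  }
}
```

The printed line is "word_Boston length capitalized B-LOC": the 6 for the
hypothetical length feature never reaches the training data.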

I don't see a good reason not to encode integer and boolean values. At a
minimum, an exception should be thrown when a value of such a type is
encountered.
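
A rough sketch of what such a check could look like, assuming it runs on the
features before they are encoded. The Feature class here is a simplified
stand-in for ClearTk's (assumed: a name and an Object value), and
requireStringValues and the use of IllegalArgumentException are hypothetical
choices for illustration, not existing ClearTk API.

```java
import java.util.Arrays;
import java.util.List;

class StringOnlyGuard {
  // Simplified stand-in for a ClearTk Feature (assumption: a name and an Object value).
  static class Feature {
    final String name;
    final Object value;

    Feature(String name, Object value) {
      this.name = name;
      this.value = value;
    }
  }

  // Hypothetical helper: fail loudly on any feature whose value is not a String.
  static void requireStringValues(List<Feature> features) {
    for (Feature feature : features) {
      if (!(feature.value instanceof String)) {
        String type = feature.value == null ? "null" : feature.value.getClass().getName();
        throw new IllegalArgumentException(
            "Feature '" + feature.name + "' has unsupported value type " + type
                + "; only String values are written");
      }
    }
  }

  public static void main(String[] args) {
    // A String-valued feature passes the check.
    requireStringValues(Arrays.asList(new Feature("word", "Boston")));
    try {
      // An Integer-valued feature is rejected instead of being silently dropped.
      requireStringValues(Arrays.asList(new Feature("length", 6)));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```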

Original issue reported on code.google.com by dumais....@gmail.com on 10 Sep 2014 at 9:20

GoogleCodeExporter commented 9 years ago
If nothing else, MalletCRFStringOutcomeDataWriter should throw an exception to 
inform the user that non-String values aren't supported. An alternative would 
be to convert numbers into Strings and pass them on to Mallet, but I'm not
confident that would do something sensible for, say, doubles.
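
To illustrate the concern with doubles, here is a small sketch of a naive
conversion. The asToken helper and the "name_value" form are hypothetical, and
the sketch assumes each whitespace-separated token in the output is treated as
an opaque feature name, which is how this writer emits them.

```java
class DoubleAsTokenDemo {
  // Hypothetical naive conversion: fold the numeric value into the feature name.
  static String asToken(String featureName, double value) {
    return featureName + "_" + String.valueOf(value);
  }

  public static void main(String[] args) {
    // Two values that are numerically almost equal become unrelated symbols.
    System.out.println(asToken("score", 0.3));        // score_0.3
    System.out.println(asToken("score", 0.1 + 0.2));  // score_0.30000000000000004
  }
}
```

Under that assumption, nearly every distinct double would end up as its own
one-off feature, with no notion that 0.3 and 0.30000000000000004 are close.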

Original comment by steven.b...@gmail.com on 5 Nov 2014 at 1:08