laito / cleartk

Automatically exported from code.google.com/p/cleartk

Feature should require name as String and value as Numeric #337

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, we don't really have any guidelines for what we put in Feature 
names, and what we put in Feature values. For example:

* CoveredTextExtractor creates a Feature with *no name*, and just a value
* CleartkExtractor.Count creates a Feature with the concatenation of the 
original feature *name and value* as the name, and an integer as the value
* CleartkExtractor.Ngram creates a Feature with no value, and where the name is 
the concatenation of the original feature *values*
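
A minimal sketch of the three shapes listed above, to show why a generic 
consumer ends up branching on where the information lives. `LegacyFeature` is 
a hypothetical stand-in for the current name/value pair, not the real ClearTK 
class, and the feature names and values are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

class FeatureShapes {
  // Hypothetical stand-in for a feature whose name and value are unconstrained.
  static final class LegacyFeature {
    final String name;  // may be null
    final Object value; // may be null, a String, or a Number

    LegacyFeature(String name, Object value) {
      this.name = name;
      this.value = value;
    }
  }

  // A generic consumer (e.g. feature selection) has to guess the shape.
  static String describe(LegacyFeature f) {
    if (f.name == null) return "value only";
    if (f.value == null) return "name only";
    if (f.value instanceof Number) return "name + number";
    return "name + other";
  }

  public static void main(String[] args) {
    List<LegacyFeature> features = Arrays.asList(
        new LegacyFeature(null, "dog"),       // no name, just a value
        new LegacyFeature("Count_dog", 2),    // name and value folded into the name
        new LegacyFeature("the_dog", null));  // values folded into the name, no value
    for (LegacyFeature f : features) {
      System.out.println(describe(f));
    }
  }
}
```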

Because of this, it can be hard to generically process features, e.g. in 
feature selection:

https://groups.google.com/d/topic/cleartk-developers/3lFXA3sfZQo/discussion

To fix this, we should really give explicit guidelines (and follow them 
ourselves) of what goes into a Feature name, and what goes into a Feature 
value. My recommendation would be:

* The feature name should always be a non-null String
* The feature value should always be a Number (or null, for categorical features)

That would mean that categorical features always have a String name and a null 
value, and continuous features always have a String name and a Number for a 
value. Essentially, this would mean:

(1) Deprecating the constructors Feature(Object) and Feature(String, Object)
(2) Adding constructors Feature(String) and Feature(String, Number)
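
Roughly, the two new constructors could look like the sketch below. 
`ProposedFeature`, its accessors, and `isCategorical()` are hypothetical names 
used only for illustration (this is not the existing Feature class), and the 
feature names in `main` are made up:

```java
import java.util.Objects;

// Hypothetical sketch of the proposed shape, not the actual ClearTK Feature class.
final class ProposedFeature {
  private final String name;  // never null
  private final Number value; // null for categorical features

  // Categorical feature: all of the information is carried by the name.
  ProposedFeature(String name) {
    this(name, null);
  }

  // Continuous feature: the name identifies the feature, the value is numeric.
  ProposedFeature(String name, Number value) {
    this.name = Objects.requireNonNull(name, "feature name must not be null");
    this.value = value;
  }

  String getName() {
    return name;
  }

  Number getValue() {
    return value;
  }

  boolean isCategorical() {
    return value == null;
  }

  public static void main(String[] args) {
    ProposedFeature word = new ProposedFeature("CoveredText_dog");
    ProposedFeature count = new ProposedFeature("Count_dog", 2);
    System.out.println(word.getName() + " categorical=" + word.isCategorical());
    System.out.println(count.getName() + " value=" + count.getValue());
  }
}
```

With this shape, generic code such as feature selection can key on 
`getName()` alone and treat a null value as "present".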

Fixing the deprecated usages would be a bit complicated though, since if we 
change the names/values of ClearTK features, then everyone has to re-train all 
their models.

Original issue reported on code.google.com by steven.b...@gmail.com on 11 Oct 2012 at 8:00

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 11 Oct 2012 at 8:02

GoogleCodeExporter commented 9 years ago

Original comment by lee.becker on 17 Feb 2013 at 6:06

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 3 May 2013 at 8:44

GoogleCodeExporter commented 9 years ago

Original comment by phi...@ogren.info on 15 Mar 2014 at 5:41