mkabbasi / cleartk

Automatically exported from code.google.com/p/cleartk

Deprecate CapitalTypeProliferator, ContainsHyphenProliferator, NumericTypeProliferator #248

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I just introduced CharacterCategoryPatternExtractor in r2874, which generates 
character patterns based on the Unicode character categories. I think this is a 
better, more general solution than CapitalTypeProliferator, 
ContainsHyphenProliferator, NumericTypeProliferator, etc., and I think we should 
probably deprecate them.

The feature values aren't exactly equivalent, though, so I'd like some input. 
For CapitalTypeProliferator and ContainsHyphenProliferator, you should get 
everything you need from the Lu (uppercase letter), Ll (lowercase letter) and 
Pd (dash punctuation) categories in your patterns. For 
NumericTypeProliferator, you wouldn't get the regex we have for year digits 
(though you could match four digits as NdNdNdNd) and you wouldn't get the Roman 
numeral pattern, but you should get everything else.
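
To make the comparison concrete, here is a minimal sketch of a one-code-per-character category pattern built with the JDK's `Character.getType`. It is not the CharacterCategoryPatternExtractor code from r2874; the class name, the `pattern` method, and the `??` placeholder for unmapped categories are all made up for illustration, and the real extractor may format its patterns differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ClearTK.
final class CategoryPatternSketch {
  // Two-letter Unicode general category codes for the categories mentioned above.
  private static final Map<Integer, String> CODES = new HashMap<>();

  static {
    CODES.put((int) Character.UPPERCASE_LETTER, "Lu");
    CODES.put((int) Character.LOWERCASE_LETTER, "Ll");
    CODES.put((int) Character.DECIMAL_DIGIT_NUMBER, "Nd");
    CODES.put((int) Character.DASH_PUNCTUATION, "Pd");
  }

  // Emits one category code per code point; "??" marks categories this sketch does not map.
  static String pattern(String text) {
    StringBuilder sb = new StringBuilder();
    text.codePoints()
        .forEach(cp -> sb.append(CODES.getOrDefault(Character.getType(cp), "??")));
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(pattern("1999"));       // four decimal digits
    System.out.println(pattern("Well-known")); // capital, lowercase run, hyphen
    System.out.println(pattern("XIV"));        // Roman numeral
    System.out.println(pattern("ABC"));        // same pattern as "XIV"
  }
}
```

Under this scheme "XIV" and "ABC" produce the same pattern, which is why the Roman numeral feature would be lost, and any four-digit token matches NdNdNdNd whether or not it is a year.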

What I'd really like is for people who are using any of the above proliferators 
to try CharacterCategoryPatternExtractor instead and let me know if it works as 
well or not.

Original issue reported on code.google.com by steven.b...@gmail.com on 29 Apr 2011 at 10:07

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 24 Jul 2012 at 5:53

GoogleCodeExporter commented 9 years ago

Original comment by phi...@ogren.info on 4 Aug 2012 at 5:36

GoogleCodeExporter commented 9 years ago
The solution we've decided on is to deprecate all of the proliferators and copy 
them to a new 'function' package. The copied versions will implement 
com.google.common.base.Function<Feature, List<Feature>>. They will also be 
named *FeatureFunction, so e.g. CapitalTypeProliferator will be called 
CapitalTypeFeatureFunction. ProliferatingExtractor will be copied and called 
FeatureFunctionExtractor.
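
For readers unfamiliar with the shape of that contract, a rough sketch follows. To stay self-contained it uses `java.util.function.Function` in place of Guava's `com.google.common.base.Function`, and `HyphenSketchFeature` is a hypothetical stand-in for ClearTK's Feature class. The derived feature name and value are assumptions, not the values the real classes emit.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for ClearTK's Feature: a name plus a value.
final class HyphenSketchFeature {
  final String name;
  final Object value;

  HyphenSketchFeature(String name, Object value) {
    this.name = name;
    this.value = value;
  }
}

// Illustrative function: one input feature in, zero or one derived features out.
final class ContainsHyphenSketchFunction
    implements Function<HyphenSketchFeature, List<HyphenSketchFeature>> {

  @Override
  public List<HyphenSketchFeature> apply(HyphenSketchFeature input) {
    String text = String.valueOf(input.value);
    if (text.indexOf('-') < 0) {
      return Collections.emptyList(); // nothing to add for hyphen-free text
    }
    // Assumed naming scheme: prefix the input feature's name.
    return Collections.singletonList(
        new HyphenSketchFeature("ContainsHyphen_" + input.name, "CONTAINS_HYPHEN"));
  }
}
```

Returning a list rather than a single feature lets a function contribute nothing for inputs it does not apply to.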

Original comment by phi...@ogren.info on 4 Aug 2012 at 6:04

GoogleCodeExporter commented 9 years ago
I have committed the new feature.function package. Here are a few 
implementation details:
- Added a FeatureFunction interface to make it possible to have a varargs 
constructor in FeatureFunctionExtractor.
- Changed static fields to enums; feature values are the strings from the enum 
toString() methods.
- FeatureProliferator had a proliferate method that took a list of 
features. I have duplicated this functionality as a static method in 
FeatureFunctionExtractor, the one place where the method was used.
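
The three points above can be sketched roughly as follows. This is not the committed code: every name here (`Feat`, `SketchFeatureFunction`, `CapitalType` and its constants, `SketchFunctionExtractor`) is hypothetical, `java.util.function.Function` stands in for Guava's Function, and returning only the derived features from `extract` is an assumption. One likely motive for the extra interface is that a varargs parameter of a parameterized type such as `Function<Feature, List<Feature>>...` produces unchecked generic array warnings at call sites, while a non-generic sub-interface does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for ClearTK's Feature.
final class Feat {
  final String name;
  final Object value;

  Feat(String name, Object value) {
    this.name = name;
    this.value = value;
  }
}

// Non-generic sub-interface, so it can be used as a varargs parameter type without warnings.
interface SketchFeatureFunction extends Function<Feat, List<Feat>> {}

// Enum replacing static String constants; the constant names are illustrative.
enum CapitalType { ALL_UPPERCASE, ALL_LOWERCASE, INITIAL_UPPERCASE, MIXED_CASE }

final class CapitalTypeSketchFunction implements SketchFeatureFunction {
  @Override
  public List<Feat> apply(Feat input) {
    String text = String.valueOf(input.value);
    if (text.isEmpty()) {
      return Collections.emptyList();
    }
    CapitalType type;
    if (text.chars().allMatch(Character::isUpperCase)) {
      type = CapitalType.ALL_UPPERCASE;
    } else if (text.chars().allMatch(Character::isLowerCase)) {
      type = CapitalType.ALL_LOWERCASE;
    } else if (Character.isUpperCase(text.charAt(0))
        && text.substring(1).chars().allMatch(Character::isLowerCase)) {
      type = CapitalType.INITIAL_UPPERCASE;
    } else if (text.chars().anyMatch(Character::isUpperCase)) {
      type = CapitalType.MIXED_CASE;
    } else {
      return Collections.emptyList(); // no uppercase letters and not all lowercase
    }
    // The feature value is the enum's toString() string.
    return Collections.singletonList(new Feat("CapitalType_" + input.name, type.toString()));
  }
}

final class SketchFunctionExtractor {
  private final List<SketchFeatureFunction> functions;

  SketchFunctionExtractor(SketchFeatureFunction... functions) {
    this.functions = Arrays.asList(functions);
  }

  // Static counterpart of the old list-based proliferate method.
  static List<Feat> apply(SketchFeatureFunction function, List<Feat> features) {
    List<Feat> derived = new ArrayList<>();
    for (Feat feature : features) {
      derived.addAll(function.apply(feature));
    }
    return derived;
  }

  // Assumption: returns only the derived features, in function order.
  List<Feat> extract(List<Feat> base) {
    List<Feat> all = new ArrayList<>();
    for (SketchFeatureFunction function : functions) {
      all.addAll(apply(function, base));
    }
    return all;
  }
}
```

A caller would then write something like `new SketchFunctionExtractor(new CapitalTypeSketchFunction())` and pass further functions as additional constructor arguments.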

This issue can be closed when the related compiler warnings are taken care of.  

Original comment by phi...@ogren.info on 5 Aug 2012 at 3:26

GoogleCodeExporter commented 9 years ago
I've removed the compiler warnings.

Original comment by phi...@ogren.info on 5 Aug 2012 at 3:42