laito / cleartk

Automatically exported from code.google.com/p/cleartk

Improvements in CharacterCategoryPatternExtractor #357

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
For my task I got a slight but noticeable improvement in quality by adding a 
PatternType to CharacterCategoryPatternExtractor that merges only runs of more 
than two repeats. 

In addition, it may be useful to distinguish characters by more than just their 
Unicode category, e.g. between letters of different scripts.

I've attached a patch here.
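
To make the script-based distinction concrete, here is a minimal sketch. It is not the attached patch and does not use any ClearTK API. The class name `ScriptLabelSketch`, the method `labels`, and the label strings "DIGIT" and "OTHER" are hypothetical names chosen for illustration. Only `Character.UnicodeScript.of`, `Character.isLetter`, and `Character.isDigit` are standard JDK calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: labels each character by its Unicode
// script when it is a letter, instead of by its general category alone.
final class ScriptLabelSketch {

    // Returns one label per code point in the input text.
    static List<String> labels(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            if (Character.isLetter(codePoint)) {
                // Letters are distinguished by script, e.g. LATIN vs. CYRILLIC.
                result.add(Character.UnicodeScript.of(codePoint).name());
            } else if (Character.isDigit(codePoint)) {
                result.add("DIGIT"); // assumed label, not from the patch
            } else {
                result.add("OTHER"); // assumed catch-all label
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Latin 'A', Cyrillic small letter be (U+0431), digit '1'.
        System.out.println(labels("A\u04311"));
    }
}
```

With labels like these, a Latin letter and a Cyrillic letter no longer collapse into the same symbol, which a category-only pattern cannot express.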

Original issue reported on code.google.com by alexey.v...@gmail.com on 1 Apr 2013 at 6:41

GoogleCodeExporter commented 9 years ago
Alex,  Thank you for the patch!  It looks good.  For minor patches like this 
one we do not require a contributor's agreement, but you might consider 
becoming a contributor as explained 
[DeveloperFAQ#I_would_like_to_contribute_to_ClearTK.__What_now? here].  Any 
patch that is more substantial than this one will require a contributor's 
agreement.  

Original comment by phi...@ogren.info on 16 Apr 2013 at 3:42

GoogleCodeExporter commented 9 years ago
I guess my attempt at embedding wiki syntax here failed.  Here is the URL:

https://code.google.com/p/cleartk/wiki/DeveloperFAQ#I_would_like_to_contribute_to_ClearTK.__What_now?

Original comment by phi...@ogren.info on 16 Apr 2013 at 3:44

GoogleCodeExporter commented 9 years ago

Original comment by phi...@ogren.info on 16 Apr 2013 at 3:47

GoogleCodeExporter commented 9 years ago
This issue was closed by revision c73c5e9d5968.

Original comment by phi...@ogren.info on 16 Apr 2013 at 5:31

GoogleCodeExporter commented 9 years ago
Alex,

Steve and I debated several possibilities for the best way to accommodate your 
idea.  We ended up adopting your proposal and patch almost as you wrote them.  
We renamed the pattern type to REPEATS_AS_KLEENE_PLUS and changed the actual 
patterns to use a '+' rather than a repeated symbol.  The unit tests reflect 
this change, so take a look there.  

Thanks again for your help!
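
For readers without the revision at hand, the idea of marking repeats with a '+' can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed ClearTK code. The class `KleenePlusPatternSketch`, its methods, the placeholder label "Xx", and the choice to emit '+' for any run of two or more characters are all assumptions. The unit tests in the revision are the authority on the real behavior.

```java
// Hypothetical illustration only: builds a character-category pattern in
// which a run of the same category is written once, followed by '+'.
final class KleenePlusPatternSketch {

    // Maps a code point to a short Unicode general-category label.
    // Only a few categories are covered; the rest share a placeholder.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:
                return "Lu";
            case Character.LOWERCASE_LETTER:
                return "Ll";
            case Character.DECIMAL_DIGIT_NUMBER:
                return "Nd";
            case Character.DASH_PUNCTUATION:
                return "Pd";
            default:
                return "Xx"; // assumed placeholder for all other categories
        }
    }

    // Assumption: a single character yields its category label, and a run
    // of two or more characters yields the label followed by one '+'.
    static String pattern(String text) {
        StringBuilder out = new StringBuilder();
        String previous = null;
        boolean plusEmitted = false;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            String current = category(codePoint);
            if (current.equals(previous)) {
                // Same category as the previous character: mark the run once.
                if (!plusEmitted) {
                    out.append('+');
                    plusEmitted = true;
                }
            } else {
                out.append(current);
                previous = current;
                plusEmitted = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(pattern("Hello")); // prints "LuLl+"
    }
}
```

Compared with repeating the symbol, the '+' form keeps a lone character distinguishable from a run of the same category while still giving words of different lengths the same pattern.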

Original comment by phi...@ogren.info on 16 Apr 2013 at 6:02

GoogleCodeExporter commented 9 years ago
For some reason I am unable to join the cleartk-developers group and so can't 
ask for the email address to send the contribution agreement to.

Original comment by alexey.v...@gmail.com on 16 Apr 2013 at 7:21

GoogleCodeExporter commented 9 years ago
Looks like there was a problem in our group configuration. Could you try 
joining again?

Original comment by steven.b...@gmail.com on 17 Apr 2013 at 12:19