laito / cleartk

Automatically exported from code.google.com/p/cleartk

default behavior of Bag context for out-of-bounds annotations #395

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm a little surprised by the default behavior of the Bag context when the 
specified range of its context annotations goes "out-of-bounds", i.e. past the 
last annotation.  If the specified range goes past the last token in the JCas, 
then "out-of-bounds" features are generated.  Such features have names 
whose prefix is {{{OOB}}} followed by a number indicating how far out of 
range the feature is.  I think this is pretty confusing default behavior.  
Suppose you generate 50 features per bag.  When you get to the end of your 
token annotations, you end up with features with the values 
OOB1, OOB2, ... OOB49.  Yikes!  To me, it seems that the default 
behavior should be to filter out OOB features for the Bag context.  When those 
features are desired, then it seems like they should not be indexed.  
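
To make the complaint concrete, here is a minimal, self-contained sketch of
the behavior described above and of the filtering I would expect by default.
It is not ClearTK code: the class OobWindowSketch and the methods following
and withoutOob are hypothetical stand-ins, and the OOB naming is only assumed
to match what the real contexts emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OobWindowSketch {

    // Hypothetical stand-in for a "following N tokens" context (not ClearTK API).
    // Positions past the last token yield OOB1, OOB2, ... instead of token text.
    static List<String> following(List<String> tokens, int focus, int count) {
        List<String> values = new ArrayList<>();
        int outOfBounds = 0;
        for (int i = focus + 1; i <= focus + count; i++) {
            if (i < tokens.size()) {
                values.add(tokens.get(i));
            } else {
                outOfBounds++;
                values.add("OOB" + outOfBounds);
            }
        }
        return values;
    }

    // The filtering proposed as a default: drop anything that looks like OOB<n>.
    // Caveat: a real token whose text is literally "OOB1" would be dropped too.
    static List<String> withoutOob(List<String> values) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            if (!value.matches("OOB\\d+")) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat");
        // Focus on "cat" and ask for the next 4 tokens; only 1 exists.
        List<String> window = following(tokens, 1, 4);
        System.out.println(window);             // [sat, OOB1, OOB2, OOB3]
        System.out.println(withoutOob(window)); // [sat]
    }
}
```

With a window of 50 instead of 4, the same loop would produce up to 49 OOB
values for the last token, which is the case I ran into.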

[Steve]
The Bag context has no concept of in or out of bounds. All it does is
strip off the position information generated by other contexts. So if
you're seeing out-of-bounds stuff, it's from the other contexts, not
from Bag.

That said, Bag strips the position by taking the .feature field of a
ContextFeature, and that .feature field is a little bit strange for
out-of-bounds features. If you want to mess around with this, look at
the ContextFeature(String, int, int, String) constructor.
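
A rough sketch of the interaction, under stated assumptions: the class
Positioned below is a hypothetical simplification, not the real
ContextFeature, and the "Bag:" prefix is made up for illustration. The point
is only that stripping the context name and position does nothing to an
out-of-bounds marker that already lives in the value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagSketch {

    // Hypothetical positioned feature: context name, position, and a value.
    // This is an assumed shape, not ClearTK's ContextFeature.
    static final class Positioned {
        final String context;
        final int position;
        final String value;

        Positioned(String context, int position, String value) {
            this.context = context;
            this.position = position;
            this.value = value;
        }
    }

    // Drops context and position, keeping each value under one shared name.
    // There is no bounds check here: "OOB1" is treated like any other value.
    static List<String> bag(List<Positioned> features) {
        List<String> bagged = new ArrayList<>();
        for (Positioned feature : features) {
            bagged.add("Bag:" + feature.value);
        }
        return bagged;
    }

    public static void main(String[] args) {
        List<Positioned> following = Arrays.asList(
                new Positioned("Following", 0, "sat"),
                new Positioned("Following", 1, "OOB1"),
                new Positioned("Following", 2, "OOB2"));
        System.out.println(bag(following)); // [Bag:sat, Bag:OOB1, Bag:OOB2]
    }
}
```

In this sketch the distance digit survives the bagging, so OOB1 and OOB2
stay distinct features even though the position was supposedly stripped.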

I agree that the interaction of ContextFeature, Bag and other contexts
probably isn't what you would have expected. Where the fix belongs,
I'm not 100% sure. 

Original issue reported on code.google.com by phi...@ogren.info on 8 Dec 2013 at 3:15

GoogleCodeExporter commented 9 years ago
I was wondering if you get the same features when you use the extractWithin 
method.  You do.  

Original comment by phi...@ogren.info on 8 Dec 2013 at 3:18

GoogleCodeExporter commented 9 years ago

Original comment by phi...@ogren.info on 15 Mar 2014 at 5:41