laito / cleartk

Automatically exported from code.google.com/p/cleartk

Label for first subchunk when it starts later than the chunk #358

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, in this example

"The (quick brown FOO) (fox BAR) jumped over the( lazy dog FOO)"

BIOChunking labels "lazy" as I-FOO because the token doesn't start at the same 
offset as the chunk " lazy dog", which begins with a leading space. However, I 
would expect it to be labeled B-FOO, since it's still the first subchunk of 
that chunk. Consider this example:

"The (quick brown FOO) (fox BAR) jumped over (the FOO)( lazy dog FOO)"

With the current implementation, the last three words get the labels B-FOO 
I-FOO I-FOO, so restoring chunks from the labels merges the two adjacent FOO 
chunks into one:

"The (quick brown FOO) (fox BAR) jumped over (the lazy dog FOO)"

Of course, the same applies to BILOU.
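
To make the round-trip problem concrete, here is a minimal, self-contained sketch of decoding BIO labels back into chunks. It is not ClearTK code: the class `BioDecodeSketch` and its `decode` method are hypothetical names used only for illustration, and the decoding rule (B- opens a chunk, a matching I- extends it) is the usual BIO convention, assumed here rather than taken from the ClearTK source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of ClearTK.
class BioDecodeSketch {

    /** Groups words into "(words TYPE)" chunks: B- opens a chunk, a matching I- extends it. */
    static List<String> decode(String[] words, String[] labels) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        String currentType = null;
        for (int i = 0; i < words.length; i++) {
            String label = labels[i];
            boolean continues = current != null && label.equals("I-" + currentType);
            if (continues) {
                current.append(' ').append(words[i]);
                continue;
            }
            // Any other label closes the chunk that is currently open.
            if (current != null) {
                chunks.add("(" + current + " " + currentType + ")");
                current = null;
                currentType = null;
            }
            // B-X (or an I-X with no open chunk of that type) starts a new chunk.
            if (!label.equals("O")) {
                current = new StringBuilder(words[i]);
                currentType = label.substring(2);
            }
        }
        if (current != null) {
            chunks.add("(" + current + " " + currentType + ")");
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] words = {"the", "lazy", "dog"};
        // Labels as described in this report: the two FOO chunks collapse into one.
        System.out.println(decode(words, new String[] {"B-FOO", "I-FOO", "I-FOO"}));
        // Labels with B-FOO on "lazy": the two chunks survive the round trip.
        System.out.println(decode(words, new String[] {"B-FOO", "B-FOO", "I-FOO"}));
    }
}
```

Under these assumptions, B-FOO I-FOO I-FOO decodes to the single chunk "(the lazy dog FOO)", while B-FOO B-FOO I-FOO decodes to "(the FOO)" and "(lazy dog FOO)", which is why the label on "lazy" matters.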

Original issue reported on code.google.com by alexey.v...@gmail.com on 1 Apr 2013 at 11:39

GoogleCodeExporter commented 9 years ago
Thank you for the detailed bug report. I have committed a test to 
org.cleartk.classifier.chunking.ChunkingTest.testIssue358() which demonstrates 
exactly the behavior you describe. The test currently fails because the code 
does the wrong thing, so it is annotated with @Ignore.

Original comment by phi...@ogren.info on 16 Apr 2013 at 7:10

GoogleCodeExporter commented 9 years ago
This is an incompatible change. ChunkingTest.testBIOChunkingCreateOutcomes 
actually tests that we get I-FOO labels in exactly the kind of example you give.

That said, I think the described behavior makes much more sense, so I think we 
should make this change anyway. I've marked this issue as 
backwards-incompatible.
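
For readers following along, the requested behavior can be sketched as follows. This is not the code from the actual fix, and it does not use the ClearTK API: `BioEncodeSketch`, `Span`, and `encode` are hypothetical names, and spans are assumed to be half-open character offsets. The only point it illustrates is that B- is assigned to the first token inside a chunk, whether or not the token begins at the chunk's own start offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of ClearTK.
class BioEncodeSketch {

    /** Half-open character span [begin, end); type is null for plain tokens. */
    static final class Span {
        final int begin;
        final int end;
        final String type;

        Span(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }
    }

    /** Labels each token O, B-TYPE, or I-TYPE; the first token in a chunk always gets B-. */
    static List<String> encode(List<Span> tokens, List<Span> chunks) {
        List<String> labels = new ArrayList<>();
        Span previousChunk = null;
        for (Span token : tokens) {
            // Find the chunk that fully contains this token, if any.
            Span containing = null;
            for (Span chunk : chunks) {
                if (chunk.begin <= token.begin && token.end <= chunk.end) {
                    containing = chunk;
                    break;
                }
            }
            if (containing == null) {
                labels.add("O");
            } else if (containing == previousChunk) {
                labels.add("I-" + containing.type);
            } else {
                // First token seen in this chunk, even if it starts after chunk.begin.
                labels.add("B-" + containing.type);
            }
            previousChunk = containing;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Offsets into "The quick brown fox jumped over the lazy dog".
        List<Span> tokens = Arrays.asList(
            new Span(0, 3, null), new Span(4, 9, null), new Span(10, 15, null),
            new Span(16, 19, null), new Span(20, 26, null), new Span(27, 31, null),
            new Span(32, 35, null), new Span(36, 40, null), new Span(41, 44, null));
        // The last chunk starts at offset 35, the space before "lazy".
        List<Span> chunks = Arrays.asList(
            new Span(4, 15, "FOO"), new Span(16, 19, "BAR"),
            new Span(32, 35, "FOO"), new Span(35, 44, "FOO"));
        System.out.println(encode(tokens, chunks));
    }
}
```

For the second example in the report, this sketch yields O B-FOO I-FOO B-BAR O O B-FOO B-FOO I-FOO, so "lazy" gets B-FOO even though its chunk begins one character earlier. Comparing against the previously seen chunk, rather than against the chunk's start offset, is the design choice that makes this work.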

Original comment by steven.b...@gmail.com on 2 May 2013 at 6:49

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 260710f320d3.

Original comment by steven.b...@gmail.com on 2 May 2013 at 6:56