laito / cleartk

Automatically exported from code.google.com/p/cleartk

<OFFSET OUT OF BOUNDS> for SemanticArgument annotation in PropbankGoldAnnotator #385

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am running the following pipeline on the PropBank 1.0 data and on
cleartk-corpus/src/test/resources/data/propbank-1.0/prop.txt:
org.cleartk.corpus.propbank.PropbankGoldReader
org.cleartk.corpus.penntreebank.TreebankGoldAnnotator
org.cleartk.corpus.propbank.PropbankGoldAnnotator
(Using the wsj/00/ section as input for PropbankGoldReader is enough to reproduce the problem.)

A CAS dump shows <OFFSET OUT OF BOUNDS> for several SemanticArgument annotations
created from prop.txt (e.g., begin: 2147483647, end: 0). This is a problem for
downstream processing.

An affected line from prop.txt:
wsj/00/wsj_0015.mrg 1 12 gold order.01 p---p 11:1-ARGM-TMP 12:0-rel 10:0*13:0-ARG1 14:1-ARG0-by

The corresponding out-of-bounds SemanticArgument has propTxt "10:0*13:0-ARG1"
and is at line 762695 of the attached CAS dump.
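Each argument field in a prop.txt line packs one or more `terminal:height` tree pointers and a label into a single token. The Java sketch below splits such a field; the class and record names are hypothetical (this is not ClearTK's API), and reading `*` as a trace-chain link and `,` as a split-constituent join is an assumption about the PropBank 1.0 format rather than something stated in this issue. Under that reading, `10:0*13:0-ARG1` yields two pointers, 10:0 and 13:0, with the label `ARG1`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not part of ClearTK's API.
class PropArgParser {

    /** One terminal:height pointer into the Treebank tree. */
    record Pointer(int terminal, int height) {}

    /** The pointers of one argument, in file order, plus everything after the first '-'. */
    record Argument(List<Pointer> pointers, String label) {}

    static Argument parse(String propTxt) {
        int dash = propTxt.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("missing label: " + propTxt);
        }
        String location = propTxt.substring(0, dash);
        String label = propTxt.substring(dash + 1);
        List<Pointer> pointers = new ArrayList<>();
        // Assumed syntax: '*' chains a trace with its antecedent(s), ',' joins split constituents.
        for (String piece : location.split("[*,]")) {
            String[] parts = piece.split(":");
            pointers.add(new Pointer(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])));
        }
        return new Argument(pointers, label);
    }

    public static void main(String[] args) {
        // The four argument fields of the affected prop.txt line.
        for (String propTxt : new String[] {"11:1-ARGM-TMP", "12:0-rel", "10:0*13:0-ARG1", "14:1-ARG0-by"}) {
            Argument arg = parse(propTxt);
            System.out.println(arg.label() + " -> " + arg.pointers());
        }
    }
}
```

Splitting only at the first `-` keeps compound labels such as `ARGM-TMP` and `ARG0-by` intact.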

Cleartk Version: cleartk-corpus from cleartk 1.4.1
Propbank Version: 1.0
OS: Ubuntu

Original issue reported on code.google.com by Hartmann...@gmail.com on 25 Oct 2013 at 12:03

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 26 Oct 2013 at 12:17

GoogleCodeExporter commented 9 years ago
Looking at the tree corresponding to this prop.txt line:

( (S 
    (S 
      (NP-SBJ (DT The) (NN refund) )
      (VP (VBD was) 
        (ADJP-PRD 
          (ADJP 
            (NP-ADV 
              (QP (IN about) ($ $) (CD 55) (CD million) )
              (-NONE- *U*) )
            (JJR more) )
          (SBAR (IN than) 
            (S 
              (NP-SBJ-1 (-NONE- *) )
              (ADVP-TMP (RB previously) )
              (VP (VBN ordered) 
                (NP (-NONE- *-1) )
                (PP (IN by) 
                  (NP-LGS (DT the) (NNP Illinois) (NNP Commerce) (NNP Commission) ))))))))
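To check which leaves the pointers hit, one can number the terminals of this tree from 0, counting -NONE- leaves as terminals. The Java sketch below does that with a minimal bracket tokenizer; the class and method names are hypothetical, and the tree string is the one quoted above (truncated, so its brackets do not balance, which does not affect counting terminals from the left). With this numbering, 12 is "ordered" (rel), 11 is "previously" (its height-1 parent is ADVP-TMP), 14 is "by" (its height-1 parent is the PP), while 10 and 13 are the empty elements `*` and `*-1`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: numbers the terminals of a bracketed tree from 0, counting -NONE- leaves.
class TerminalLister {

    /** The (truncated) tree quoted in the comment. */
    static final String TREE = """
        ( (S
            (S
              (NP-SBJ (DT The) (NN refund) )
              (VP (VBD was)
                (ADJP-PRD
                  (ADJP
                    (NP-ADV
                      (QP (IN about) ($ $) (CD 55) (CD million) )
                      (-NONE- *U*) )
                    (JJR more) )
                  (SBAR (IN than)
                    (S
                      (NP-SBJ-1 (-NONE- *) )
                      (ADVP-TMP (RB previously) )
                      (VP (VBN ordered)
                        (NP (-NONE- *-1) )
                        (PP (IN by)
                          (NP-LGS (DT the) (NNP Illinois) (NNP Commerce) (NNP Commission) ))))))))
        """;

    /** A preterminal tag and its word; the tag -NONE- marks an empty element. */
    record Terminal(String tag, String word) {
        boolean isEmpty() {
            return tag.equals("-NONE-");
        }
    }

    static List<Terminal> terminals(String tree) {
        // Pad brackets with spaces so a whitespace split yields "(", ")" and atoms.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        List<Terminal> result = new ArrayList<>();
        for (int i = 0; i + 3 < tokens.length; i++) {
            // A terminal has the shape "( TAG word )" with no brackets inside.
            if (tokens[i].equals("(") && isAtom(tokens[i + 1]) && isAtom(tokens[i + 2])
                    && tokens[i + 3].equals(")")) {
                result.add(new Terminal(tokens[i + 1], tokens[i + 2]));
            }
        }
        return result;
    }

    private static boolean isAtom(String token) {
        return !token.equals("(") && !token.equals(")");
    }

    public static void main(String[] args) {
        List<Terminal> leaves = terminals(TREE);
        for (int i = 0; i < leaves.size(); i++) {
            Terminal t = leaves.get(i);
            System.out.println(i + "\t" + t.tag() + "\t" + t.word() + (t.isEmpty() ? "\t(empty)" : ""));
        }
    }
}
```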

I can see that the ARG1 refers first to the -NONE- just before "ordered", and 
then to the -NONE- just after "ordered", and to nothing else. This seems like a 
bug in the annotation to me, but I also agree that ClearTK shouldn't be 
producing offsets like [0, 2147483647] in such cases.

Original comment by steven.b...@gmail.com on 30 Oct 2013 at 11:24
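The reported pair is telling: 2147483647 is Java's Integer.MAX_VALUE, and begin = Integer.MAX_VALUE with end = 0 is what a min/max fold over token offsets leaves behind when it sees no tokens at all. That fits this ARG1, whose two pointers both land on -NONE- elements that presumably contribute no text to the document. This is an inference from the values, not a reading of the ClearTK source or of the fixing revision. A minimal sketch of a guard, with hypothetical type and method names:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; ClearTK's actual PropbankGoldAnnotator code may differ.
class ArgumentSpan {

    /** Character offsets of one token that has surface text (hypothetical type). */
    record TokenOffsets(int begin, int end) {}

    record Span(int begin, int end) {}

    /** Smallest span covering all tokens, or empty when there are no tokens to cover,
        e.g. when every pointer of the argument resolved to a -NONE- element. */
    static Optional<Span> cover(List<TokenOffsets> tokens) {
        // Without this guard the loop below would leave begin at Integer.MAX_VALUE (2147483647)
        // and end at 0, the pair reported in the CAS dump.
        if (tokens.isEmpty()) {
            return Optional.empty();
        }
        int begin = Integer.MAX_VALUE;
        int end = 0;
        for (TokenOffsets t : tokens) {
            begin = Math.min(begin, t.begin());
            end = Math.max(end, t.end());
        }
        return Optional.of(new Span(begin, end));
    }

    public static void main(String[] args) {
        System.out.println(cover(List.of()));
        System.out.println(cover(List.of(new TokenOffsets(40, 50), new TokenOffsets(12, 20))));
    }
}
```

Returning Optional.empty() leaves the caller to decide whether to drop such an argument or attach it some other way; the sketch does not say which choice the actual fix made.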

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 6c0430c02161.

Original comment by steven.b...@gmail.com on 30 Oct 2013 at 11:43