isi-nlp / bolinas

SHRG rule extraction and parsing tools

Problem Reading Little Prince Corpus #2

aelfric closed this issue 9 years ago

aelfric commented 9 years ago

I noticed that the most recent commit changed the order in which the lexer's token regular expressions are defined, so that QUANTITY now comes before IDENTIFIER. Was this a mistake?

Using the latest version of the program, parsing the public AMR corpus (http://amr.isi.edu/download/amr-bank-struct-v1.3.txt) fails at sentence lpp_1943.574. By reversing the order of these two definitions, I was able to parse the whole file.

daniel-bauer commented 9 years ago

This was indeed a mistake, made in an attempt to fix how quantities are interpreted by the AMR reader. I changed the order of lexer rules back to the previous revision for now. We will have to look at making sure quantities are represented correctly later.
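Why rule order matters can be illustrated with a small first-match lexer, where rules are tried in the order they are defined and the first one that matches wins. This is a minimal sketch: the token patterns and the `tokenize` helper are hypothetical, chosen only to show the effect, and are not the actual regular expressions or lexer code used in bolinas.

```python
import re

# Hypothetical token patterns for illustration only; these are NOT
# the actual regular expressions used by the bolinas AMR reader.
QUANTITY = ("QUANTITY", r"[0-9]+(?:\.[0-9]+)?")
IDENTIFIER = ("IDENTIFIER", r"[A-Za-z0-9_\-.]+")
WHITESPACE = ("WS", r"\s+")


def tokenize(text, rules):
    """First-match lexer (hypothetical helper): at each position, try the
    rules in the given order and keep the first one that matches."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m:
                if name != "WS":  # drop whitespace tokens
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}: {text[pos:]!r}")
    return tokens


# QUANTITY first: "2nd" is split into a number and a stray identifier.
print(tokenize("2nd 42", [WHITESPACE, QUANTITY, IDENTIFIER]))
# IDENTIFIER first: "2nd" stays whole, but "42" is no longer a QUANTITY.
print(tokenize("2nd 42", [WHITESPACE, IDENTIFIER, QUANTITY]))
```

Neither order is correct on its own in this sketch: putting QUANTITY first can break tokens that merely start with digits, while putting IDENTIFIER first means quantities are never recognized as such. That matches the trade-off described in the thread, where reverting the order fixes parsing but leaves quantity handling to be revisited.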