hgpatswu / berkeleyparser

Automatically exported from code.google.com/p/berkeleyparser
1 stars 0 forks source link

multiple spaces in input without -tokenize #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Have two spaces or more between words in input

example: echo "a  b" | java -jar berkeleyParser.jar -gr eng_sm5.gr
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:687)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getCachedSignature(Unknown
Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.score(Unknown Source)
    at
edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.initializeChart(Unknown
Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.doPreParses(Unknown
Source)
    at
edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknow
n
Source)
    at
edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknow
n
Source)
    at edu.berkeley.nlp.PCFGLA.BerkeleyParser.main(BerkeleyParser.java:190)

If there is only one space, one obtains a parse tree.
echo "a b" | java -jar berkeleyParser2.jar -gr eng_sm5.gr 
( (NP (DT a) (X (SYM b))) )

If you run the parser with tokenization (-tokenize), it works fine.

Suggestion: track the line number in the input and show it when printing
the trace. Makes debugging easier.

Original issue reported on code.google.com by benoit.f...@gmail.com on 11 Feb 2009 at 9:15