NLTK error when parsing sentences with unescaped parentheses.

yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.

MIT License

from supar import Parser

sentence = 'This is a t(est)'
parser = Parser.load('crf-con-roberta-en')
parser.predict([sentence], lang='en', verbose=False)

crashes with the following error:

File /opt/homebrew/lib/python3.10/site-packages/nltk/tree/tree.py:731, in Tree._parse_error(cls, s, match, expecting)
    730 msg += '\n{}"{}"\n{}^'.format(" " * 16, s, " " * (17 + offset))
--> 731 raise ValueError(msg)

ValueError: Tree.read(): expected ')' but got 'end-of-string'
            at index 71.
                "..._ -RRB-)))"
                              ^

The same is true for sentence = '(713)853-7041'.

If I add whitespace before ( and ), everything works fine.
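
Until this is fixed, the whitespace workaround can be applied automatically. Below is a minimal sketch, not part of supar: `pad_parentheses` is a hypothetical helper name, and it assumes that putting a space on both sides of each parenthesis is as safe as the space before it that I tested by hand.

```python
import re


def pad_parentheses(text: str) -> str:
    """Surround every '(' and ')' with spaces, then collapse repeated whitespace.

    Hypothetical helper for illustration; assumes the tokenizer splits
    space-separated parentheses into standalone tokens.
    """
    padded = re.sub(r"([()])", r" \1 ", text)
    # split() with no arguments drops leading/trailing and repeated whitespace
    return " ".join(padded.split())


print(pad_parentheses("This is a t(est)"))
print(pad_parentheses("(713)853-7041"))
```

The padded string would then be passed in place of the raw one, e.g. parser.predict([pad_parentheses(sentence)], lang='en', verbose=False). I have only checked the idea against the two sentences in this report.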

>>> from supar.utils.tokenizer import Tokenizer
>>> t = Tokenizer()
>>> t('This is a t(est)')
['This', 'is', 'a', 't(', 'est', ')']
>>> t('This is a t (est)')
['This', 'is', 'a', 't', '(', 'est', ')']
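
The token 't(' in the first output still carries a raw parenthesis, which would explain why the bracketed tree string ends up unbalanced. One possible remedy is to escape brackets inside every token using the Penn Treebank convention (-LRB- / -RRB-, the latter already visible in the error message above). The following is only a sketch under that assumption: `escape_brackets` is a hypothetical name and not an existing supar function.

```python
# Penn Treebank style escapes for round brackets
PTB_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}


def escape_brackets(tokens):
    """Replace raw parentheses inside each token with PTB escapes.

    Hypothetical helper for illustration; not the library's own code.
    """
    table = str.maketrans(PTB_ESCAPES)
    return [token.translate(table) for token in tokens]


tokens = ['This', 'is', 'a', 't(', 'est', ')']
print(escape_brackets(tokens))
```

With the tokens from the first call above, 't(' becomes 't-LRB-' and ')' becomes '-RRB-', so no leaf contains a bare parenthesis when the tree string is built.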

NLTK error when parsing sentences with unescaped parentheses. #115