norpadon closed this issue 1 year ago.
@norpadon Hi, the problem is caused by brackets not always being split into separate tokens during tokenization:
```python
>>> from supar.utils.tokenizer import Tokenizer
>>> t = Tokenizer()
>>> t('This is a t(est)')
['This', 'is', 'a', 't(', 'est', ')']
>>> t('This is a t (est)')
['This', 'is', 'a', 't', '(', 'est', ')']
```
This bug has been fixed in the latest commits; see https://github.com/yzhangcs/parser/blob/ce34fc254e5a0757605c5be7db6a2cd089adc2f7/supar/utils/transform.py#L420-L458
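The linked commit is the authoritative fix. Purely as an illustration of the idea, a post-tokenization pass that splits brackets glued to neighbouring characters into their own tokens could look like this. `split_brackets` is a hypothetical helper, not part of supar's API, and it is not claimed to match what the commit actually does.

```python
import re
from typing import List

# Hypothetical helper (not supar's implementation): any bracket character that
# is glued to other characters inside a token is split off into its own token.
_BRACKET_RE = re.compile(r"([()\[\]{}])")


def split_brackets(tokens: List[str]) -> List[str]:
    out: List[str] = []
    for tok in tokens:
        # re.split with a capturing group keeps the brackets as separate pieces;
        # it yields empty strings when a bracket sits at either end, so drop them.
        out.extend(piece for piece in _BRACKET_RE.split(tok) if piece)
    return out


if __name__ == "__main__":
    print(split_brackets(['This', 'is', 'a', 't(', 'est', ')']))
```

Applied to `['This', 'is', 'a', 't(', 'est', ')']` it returns `['This', 'is', 'a', 't', '(', 'est', ')']`, the same tokens produced for the correctly spaced input. It splits every bracket unconditionally, which may be too aggressive for tokens where brackets are meant to stay attached.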
crashes with the following error:
The same is true for `sentence = '(713)853-7041'`. If I add a space before `(` and `)`, everything works fine.
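Until the fix is available, the manual workaround reported here could be applied as a preprocessing step. This is a minimal sketch, assuming the sentence is passed to the parser as a raw string; `add_space_before_brackets` is a hypothetical helper, not part of supar. It only inserts a space where a bracket is directly preceded by a non-whitespace character, so already-spaced input is left unchanged.

```python
import re

# Hypothetical helper (not part of supar): insert a space before every "(" or ")"
# that is directly preceded by a non-whitespace character, mirroring the
# reported manual workaround.
_GLUED_BRACKET_RE = re.compile(r"(?<=\S)([()])")


def add_space_before_brackets(sentence: str) -> str:
    # r" \1" re-emits the matched bracket with a single space in front of it.
    return _GLUED_BRACKET_RE.sub(r" \1", sentence)


if __name__ == "__main__":
    print(add_space_before_brackets("This is a t(est)"))
    print(add_space_before_brackets("(713)853-7041"))
```

For example, `'This is a t(est)'` becomes `'This is a t (est )'` and `'(713)853-7041'` becomes `'(713 )853-7041'`.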