Open brucewlee opened 3 years ago
facing the same issue
I realize I'm a bit late to the party, but better late than never.
It seems that strings containing consecutive whitespace characters are what cause the trouble.
To illustrate with a quick workaround: if we first run @brucewlee's sample text through `" ".join(text.split())`, it should get parsed without raising the error.
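The workaround above can be sketched as a small helper (a minimal sketch; the function name is my own choice, not part of benepar or spaCy):

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces.

    str.split() with no arguments splits on any whitespace run, so
    rejoining with a single space removes the consecutive whitespace
    that appears to trigger the parsing error.
    """
    return " ".join(text.split())


# The doubled spaces and the newline are collapsed before parsing.
cleaned = normalize_whitespace("Hello,  thank you\nfor making  this tool.")
# cleaned == "Hello, thank you for making this tool."
```

Running the pipeline on `normalize_whitespace(text)` instead of the raw text should avoid the error, assuming consecutive whitespace really is the trigger.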
Hello, thank you for making this amazing tool open source. I keep receiving the following error with your latest version, using benepar_en3 and spaCy 3.0.
The same code works with a shorter text, so the issue certainly seems to come from the max-token (or length) limit of the pretrained model.
The weird thing is that the passage (also provided below) does run under certain setups. I ran the same corpus several times just a few days ago and constituency parsing worked just fine. The issue arose when I removed the virtualenv and re-installed everything for migration.
Is there a given max-token threshold for benepar_en3? Assuming it is based on T5, there shouldn't be a maximum input sequence length like BERT has...
error:
failing passage: