Open brucewlee opened 3 years ago
facing the same issue
I realize I'm a bit late to the party, but better late than never.
It seems that strings containing consecutive whitespace characters are what cause the trouble.
To illustrate with a quick workaround: if we first run @brucewlee's sample text through `" ".join(text.split())`, it should get parsed without raising the error.
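The workaround above can be sketched as a small helper (a minimal sketch; the function name is my own choice, not part of benepar or spaCy):

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces.

    str.split() with no arguments splits on any whitespace run, so
    rejoining with a single space removes the consecutive whitespace
    that appears to trigger the parsing error.
    """
    return " ".join(text.split())


# The doubled spaces and the newline are collapsed before parsing.
cleaned = normalize_whitespace("Hello,  thank you\nfor making  this tool.")
# cleaned == "Hello, thank you for making this tool."
```

Running the pipeline on `normalize_whitespace(text)` instead of the raw text should avoid the error, assuming consecutive whitespace really is the trigger.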
Hello, thank you for making this amazing tool open source. I keep receiving the following error with your latest version, using benepar_en3 and spaCy 3.0.
The same code works with a shorter text, so the issue certainly seems to come from the max-token (or length) limit of the pretrained model.
The weird thing is that the passage (also provided below) does run under certain setups. I ran the same corpus several times just a few days ago and constituency parsing worked just fine. The issue arose when I removed the virtualenv and re-installed everything for migration.
Is there a given max-token threshold for benepar_en3? Assuming it is based on T5, there shouldn't be a maximum input sequence length like BERT has...
error:
failing passage: