File "/add_score.py", line 53, in add_score
res = function(["? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."], ["",])
File "/add_score_summac.py", line 28, in <lambda>
"my_summacZS_batched": lambda summs, docs: modelZS.score(docs, summs)['scores'],
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 351, in score
score = self.score_one(source, gen)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 322, in score_one
image = self.imager.build_image(original, generated)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 113, in build_image
generated_chunks = self.split_text(generated, granularity=gran_sum)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 94, in split_text
return self.split_sentences(text)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 71, in split_sentences
sentences = nltk.tokenize.sent_tokenize(text)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 107, in sent_tokenize
return tokenizer.tokenize(text)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1276, in tokenize
return list(self.sentences_from_text(text, realign_boundaries))
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in sentences_from_text
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in <listcomp>
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1322, in span_tokenize
for sentence in slices:
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1421, in _realign_boundaries
for sentence1, sentence2 in _pair_iter(slices):
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 318, in _pair_iter
prev = next(iterator)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1395, in _slices_from_text
for match, context in self._match_potential_end_contexts(text):
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1382, in _match_potential_end_contexts
before_words[match] = split[-1]
IndexError: list index out of range
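For context, the crash happens inside nltk's punkt tokenizer: when the text begins with sentence-ending punctuation, the first potential sentence-end match sits at offset 0, so the slice of text before it contains no words and `split[-1]` fails. A minimal sketch of that failing step, reconstructed from the traceback (the sample `text` is from my input; the surrounding punkt logic is paraphrased, not quoted):

```python
# Sketch of the failing step in nltk/tokenize/punkt.py,
# _match_potential_end_contexts (see traceback line above).
text = "? I haven't had a birthday since 2007."

# The first candidate sentence terminator is the "?" at offset 0,
# so punkt splits the text *before* that offset into words:
split = text[:0].split()  # -> [] (empty list: nothing precedes the "?")

# punkt then executes: before_words[match] = split[-1]
# and split[-1] on an empty list raises IndexError: list index out of range
```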
I think it is caused by the leading "? ", which might lead to an empty sentence within the metric.
Is this expected behaviour that is documented somewhere, or is it a bug?
kind regards
Edit:
I circumvented (not fixed) this issue for now using this code:
import re

# strip one leading "empty sentence": whitespace, sentence-ending punctuation, whitespace
match = re.match(r"(\s*[.?!]+\s)", summaries[i])
if match:
    summaries[i] = summaries[i][len(match.group(1)):]
because leading empty sentences made of symbols other than "?" also caused this issue.
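A slightly more general form of the same workaround (a sketch of my own, not part of SummaC; the function name is hypothetical) strips an entire leading run of such "empty sentences" before the text reaches the tokenizer:

```python
import re

def strip_leading_terminators(text: str) -> str:
    """Drop any leading runs of sentence-ending punctuation ("? ", "!! . ", ...)
    so punkt never sees an empty first sentence."""
    return re.sub(r"^(\s*[.?!]+\s+)+", "", text)
```

This handles inputs like "!! . Hello there." as well, which the single-match version above would only partially clean.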