miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

ZeroDivisionError Rouge-L summary level #128

Open dorianve opened 5 years ago

dorianve commented 5 years ago

Hello,

I ran into ZeroDivisionErrors when computing the Rouge-L summary-level score for one of my data samples.

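The problem is in the _union_lcs function of rouge.py, where the "union longest common subsequence count" is divided by the "combined LCS length". That length is 0 whenever none of the evaluated sentences shares a word with the reference sentence.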

I added a case that returns 0 when combined_lcs_length equals 0, and it works fine now. (I made this change locally, since I cannot change it in this repository.)

Does that sound right?

def _union_lcs(evaluated_sentences, reference_sentence):
    if len(evaluated_sentences) <= 0:
        raise ValueError("Collections must contain at least 1 sentence.")

    lcs_union = set()
    reference_words = _split_into_words(reference_sentence)
    combined_lcs_length = 0
    for eval_s in evaluated_sentences:
        evaluated_words = _split_into_words(eval_s)
        lcs = set(_recon_lcs(reference_words, evaluated_words))
        combined_lcs_length += len(lcs)
        lcs_union = lcs_union.union(lcs)

    union_lcs_count = len(lcs_union)

    # Modification: return 0 when no common subsequence was found,
    # instead of dividing by zero below.
    if combined_lcs_length == 0:
        return 0
    union_lcs_value = union_lcs_count / combined_lcs_length
    return union_lcs_value
dorianve commented 5 years ago

The same goes for _f_lcs(...), where denom can be 0, so I also added:

# ...
if denom == 0:
    return 0
# ...
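
To make the second guard concrete, here is a minimal, self-contained sketch of an LCS-based F-measure with a zero check. It is not sumy's actual _f_lcs: the name f_lcs, the beta parameter and its default of 1.0 are assumptions for illustration, and the formula is the usual weighted F-measure F = (1 + beta^2) * R * P / (R + beta^2 * P).

```python
def f_lcs(llcs, m, n, beta=1.0):
    """Weighted F-measure from an LCS length (hypothetical stand-in, not sumy's code).

    llcs: length of the longest common subsequence
    m:    number of words in the reference
    n:    number of words in the evaluated summary
    beta: recall weight (assumed default of 1.0)
    """
    if m == 0 or n == 0:
        # Empty reference or summary: nothing can match.
        return 0.0

    r_lcs = llcs / m  # recall
    p_lcs = llcs / n  # precision

    num = (1 + beta ** 2) * r_lcs * p_lcs
    denom = r_lcs + (beta ** 2) * p_lcs
    if denom == 0:
        # No overlap at all: precision and recall are both 0.
        return 0.0
    return num / denom
```

With this guard, an input with no overlap such as f_lcs(0, 3, 3) yields 0.0 rather than raising.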
miso-belica commented 5 years ago

Hi @dorianve, can you attach a simple test to reproduce this? Or maybe create a PR with the test and a fix? You can't update the repository directly, but you are welcome to send me a PR :)
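
A reproduction along the lines requested above could look like the following sketch. It does not import sumy: lcs_words and union_lcs are hypothetical stand-ins for _recon_lcs and _union_lcs, and sentences are assumed to be plain whitespace-separated strings rather than sumy sentence objects. A real test for the PR would call the library's own functions.

```python
def lcs_words(x, y):
    """Return one longest common subsequence of two word lists (stand-in helper)."""
    n, m = len(x), len(y)
    # table[i][j] holds the LCS length of x[:i] and y[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the subsequence itself.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


def union_lcs(evaluated_sentences, reference_sentence):
    """Guarded union-LCS ratio, mirroring the fix proposed in this issue."""
    if not evaluated_sentences:
        raise ValueError("Collections must contain at least 1 sentence.")

    reference_words = reference_sentence.split()
    lcs_union = set()
    combined_lcs_length = 0
    for sentence in evaluated_sentences:
        lcs = set(lcs_words(reference_words, sentence.split()))
        combined_lcs_length += len(lcs)
        lcs_union |= lcs

    if combined_lcs_length == 0:
        # No shared words at all: without this guard the division below fails.
        return 0.0
    return len(lcs_union) / combined_lcs_length


# Failing case from the issue: no word in common with the reference.
assert union_lcs(["x y z"], "a b c") == 0.0
# Sanity check with overlap: LCS sets {a, b} and {b, c}, union of 3 over 4.
assert union_lcs(["a b", "b c"], "a b c") == 0.75
```

The first assertion is the interesting one: an evaluated summary that shares no words with the reference is enough to drive the combined LCS length to 0.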