miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

Division by zero in rouge.py, only with some algorithms #205

Open Manamama opened 5 months ago

Manamama commented 5 months ago

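This happens only with sumy_eval, and mostly with the edmundson, lsa, and luhn summarizers. I first thought text-rank was unaffected, but update: it fails there too, seemingly at random. The affected runs throw the error below: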

sumy_eval edmundson  ~/Downloads/crown_land_eng_60.txt  --length=60%  --language=eng --url=https://en.wikipedia.org/wiki/Crown_land
Precision: 0.500000
Recall: 0.514563
F-score: 0.507177
Cosine similarity: 0.952655
Cosine similarity (document): 0.950033
Unit overlap: 0.485461
Unit overlap (document): 0.502559
Rouge-1: 0.551075
Rouge-2: 0.448803
Rouge-L (Sentence Level): 0.498733
Traceback (most recent call last):
  File ".local/bin/sumy_eval", line 8, in <module>
    sys.exit(main())
  File ".local/lib/python3.10/site-packages/sumy/evaluation/__main__.py", line 171, in main
    result = evaluate(evaluated_sentences, reference_sentences)
  File ".local/lib/python3.10/site-packages/sumy/evaluation/rouge.py", line 290, in rouge_l_summary_level
    union_lcs_sum_across_all_references += _union_lcs(evaluated_sentences, ref_s)
  File ".local/lib/python3.10/site-packages/sumy/evaluation/rouge.py", line 250, in _union_lcs
    union_lcs_value = union_lcs_count / combined_lcs_length
ZeroDivisionError: division by zero
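
The traceback points at the division `union_lcs_count / combined_lcs_length`, so the crash presumably occurs when `combined_lcs_length` is 0, for example if a reference sentence yields no words; that cause is an assumption, not something confirmed here. A minimal sketch of a guard follows. The helper name `safe_ratio` and the fallback value 0.0 are hypothetical and not part of sumy:

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Return numerator / denominator, or `default` when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


# Mirrors the failing expression from the traceback with assumed values.
union_lcs_count = 0
combined_lcs_length = 0  # assumption: the reference sentence contributed no words
union_lcs_value = safe_ratio(union_lcs_count, combined_lcs_length)
print(union_lcs_value)
```

Whether returning 0.0 is the right score for an empty reference sentence is a design choice; skipping such sentences before scoring would be another option.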

My box:

Operating System: Ubuntu 22.04.3 LTS x86_64
Kernel: 6.2.0-39-generic
Shell: /bin/bash 5.1.16
Python: 3.10.12

Manamama commented 5 months ago

Update: there is a second, separate failure. Running

sumy_eval sum-basic ...

or even sumy sum-basic --length=60% --language=eng ... results in:

  File ".local/lib/python3.10/site-packages/sumy/summarizers/sum_basic.py", line 75, in <listcomp>
    word_freq_sum = sum([word_freq_in_doc[w] for w in content_words_in_sentence])
KeyError: 'own'

(while kl, lsa etc. work fine)
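
The KeyError suggests that a content word found in a sentence ('own') is missing from the document-level frequency table, possibly because the two are normalized differently; that diagnosis is a guess. A minimal sketch of a tolerant lookup, using made-up data rather than anything taken from sumy:

```python
# Hypothetical frequency table that lacks the word "own".
word_freq_in_doc = {"crown": 3, "land": 2}
content_words_in_sentence = ["crown", "own", "land"]

# dict.get(w, 0) counts a word absent from the table as frequency 0
# instead of raising KeyError as word_freq_in_doc[w] would.
word_freq_sum = sum(word_freq_in_doc.get(w, 0) for w in content_words_in_sentence)
print(word_freq_sum)  # 5
```

This only hides the symptom; if the mismatch comes from inconsistent stemming or stop-word handling, fixing that at the source would be the better change.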