miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

Luhn's summarizer 'significant percentage' comment #184

Closed: futurewarning closed this issue 2 years ago

futurewarning commented 2 years ago

The significant percentage is defined as 1 (I assume the TODO refers to the boundary selection from the original paper):
https://github.com/miso-belica/sumy/blob/af5a236b45d1462e38cdff2f2abd471e9c8bbdaa/sumy/summarizers/luhn.py#L12-L13

As a result, best_word_count will later always equal len(words):
https://github.com/miso-belica/sumy/blob/af5a236b45d1462e38cdff2f2abd471e9c8bbdaa/sumy/summarizers/luhn.py#L35-L36

The int() call hints at a missing division. I'm unsure whether this is a bug or intentional.
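To make the arithmetic concrete, here is a minimal sketch of the expression as described in this comment. The helper function and the sample words are hypothetical and are not copied from sumy; the sketch only assumes that the count is computed as int(len(words) * significant_percentage).

```python
def best_word_count(words, significant_percentage=1):
    # Hypothetical helper: mirrors the expression described in the comment,
    # not the actual code in sumy/summarizers/luhn.py.
    return int(len(words) * significant_percentage)


words = ["luhn", "summarizer", "significant", "word", "count"]

# With the default of 1, nothing is filtered out: the count is len(words).
full = best_word_count(words)

# With a fraction, int() truncates the product: int(5 * 0.5) == int(2.5) == 2.
half = best_word_count(words, 0.5)
```

With a value of 1 the int() call has no effect, which is why it looks like a leftover from a percentage-style division; with a fractional value it performs the truncation.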

miso-belica commented 2 years ago

@futurewarning Is this a question, or just to inform me? Why is the issue closed now? Is it not relevant anymore? Should I react somehow or not?

futurewarning commented 2 years ago

@miso-belica It was a question, but it's no longer relevant: I found the test case that overrides significant_percentage with a fraction:
https://github.com/miso-belica/sumy/blob/af5a236b45d1462e38cdff2f2abd471e9c8bbdaa/tests/test_summarizers/test_luhn_sentence_rating.py#L22

I was confused by the default of including 100% of the cleaned words, but since no boundary is being estimated, I guess it makes sense to include them all.
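The override pattern mentioned here can be sketched with a stand-in class. ToySummarizer and its significant_words method are hypothetical and only imitate the behaviour discussed in this thread (a class-level default of 1 that can be replaced by a fraction); they are not sumy's LuhnSummarizer.

```python
from collections import Counter


class ToySummarizer:
    """Hypothetical stand-in for illustration, not sumy's LuhnSummarizer."""

    # Class-level default: 1 means "keep 100% of the cleaned words".
    significant_percentage = 1

    def significant_words(self, words):
        # Number of terms to keep; int() truncates toward zero.
        best_word_count = int(len(words) * self.significant_percentage)
        # Most frequent terms first; ties keep first-seen order.
        return [term for term, _ in Counter(words).most_common(best_word_count)]


words = ["summary", "text", "text", "luhn"]

# Default: the limit is len(words), so every distinct term is returned.
all_terms = ToySummarizer().significant_words(words)

# Overridden as a fraction: only the int(4 * 0.5) == 2 most frequent terms remain.
tuned = ToySummarizer()
tuned.significant_percentage = 0.5
top_terms = tuned.significant_words(words)
```

Setting the attribute on the instance leaves the class default untouched, so other instances still keep every word.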