miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

power_method produces NaN, inf values #187

Closed FlxB2 closed 11 months ago

FlxB2 commented 1 year ago

Hi, I noticed that power_method for LexRankSummarizer and TextRankSummarizer can return NaN and inf values for some inputs. I am not sure whether that matters for your use case, but it seems odd to me, because other implementations don't appear to have this problem.

For example, the test case here uses this matrix:

import numpy

matrix = numpy.array([
    [0.1, 0.2, 0.3, 0.6, 0.9],
    [0.45, 0, 0.3, 0.6, 0],
    [0.5, 0.6, 0.3, 1, 0.9],
    [0.7, 0, 0, 0.6, 0],
    [0.5, 0.123, 0, 0.111, 0.9],
])

and LexRankSummarizer.power_method(matrix, LexRankSummarizer.epsilon) returns [inf inf nan inf inf].

Another example is the following matrix:

matrix = numpy.array([
    [0.1, 0.2, 0.3],
    [0.2, 1, 0.5],
    [0.3, 0.5, 0.6],
])

It returns: [inf inf inf].

The implementation I found here gives [0.2, 0.5666666666666667, 0.4666666666666667]

I used Python 3.9.16 and numpy 1.24.1
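
For anyone reproducing this, here is a minimal sketch of one way a power iteration can end up with inf and NaN. It is an assumption about the mechanism, not a copy of sumy's code and not the fix from the pull request mentioned below. The helper names, the epsilon value of 0.1, and the max_iterations cap are all hypothetical. The idea: if the matrix is not row-stochastic and its dominant eigenvalue is above 1, repeatedly applying it without rescaling makes the vector grow until it overflows to inf, and then inf - inf (or 0 * inf for matrices with zero entries) yields NaN. Rescaling the vector on every step keeps it finite. The rescaled version is not expected to reproduce the values quoted from the other implementation.

```python
import numpy


def unnormalized_power_method(matrix, epsilon, max_iterations=10_000):
    """Hypothetical helper: iterate p <- M^T p with no rescaling."""
    p = numpy.full(len(matrix), 1.0 / len(matrix))
    # Silence overflow/invalid warnings; triggering them is the point here.
    with numpy.errstate(all="ignore"):
        for _ in range(max_iterations):
            next_p = matrix.T @ p
            delta = numpy.linalg.norm(next_p - p)
            p = next_p
            # A NaN delta compares as False, so the loop also stops on NaN.
            if not delta > epsilon:
                break
    return p


def normalized_power_method(matrix, epsilon, max_iterations=10_000):
    """Hypothetical helper: same iteration, rescaled to sum to 1 each step."""
    p = numpy.full(len(matrix), 1.0 / len(matrix))
    for _ in range(max_iterations):
        next_p = matrix.T @ p
        next_p = next_p / next_p.sum()  # keeps the vector bounded
        delta = numpy.linalg.norm(next_p - p)
        p = next_p
        if delta <= epsilon:
            break
    return p


# Second matrix from this issue; its rows do not sum to 1.
matrix = numpy.array([
    [0.1, 0.2, 0.3],
    [0.2, 1, 0.5],
    [0.3, 0.5, 0.6],
])

diverged = unnormalized_power_method(matrix, epsilon=0.1)  # assumed epsilon
stable = normalized_power_method(matrix, epsilon=0.1)      # assumed epsilon

print("without rescaling:", diverged)
print("with rescaling:   ", stable)
```

Rescaling by the sum is only one option; dividing by the vector norm, or making the input matrix row-stochastic before iterating, would also keep the values bounded.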

miso-belica commented 11 months ago

Fixed in https://github.com/miso-belica/sumy/pull/194. Thanks @AryazE 🙂