Closed Zhujunnan closed 6 years ago
Hi, in the paper it is stated that TextRank iterates until it converges.
When I looked at the code, I had the same question.
Regards
Hi guys, this is a tough question for me. I thought I'd find time to look at the code more deeply before answering, but I guess I was just too naive. Honestly, I don't remember the source paper for the TextRank implementation, and what is worse, there is no URL to a paper in the class docstring, just a URL to some other repo. I found the commit https://github.com/miso-belica/sumy/commit/80f7dfa7ce3c7e9ce9319fa7c45c06af7bb3c4fa and it seems suspicious to me :/ When I was implementing sumy I read many papers, and it's possible I mixed them up somehow. Which would mean I have been lying about this method all along :( But because Python is quite high level, it may be that the iterative process is hidden somewhere in a high-level function. I really need to check the code before I can give you a real answer. These are just my guesses and possibilities, so that I write at least something. But you know my time for sumy has been very limited for a long time, so...
Hi, even if it is not the method from that paper, it did a good job compared to the other methods when tested on the MultiLing2015 training corpus. Here are the results:
Peer | Rouge1-R | Rouge1-P | Rouge1-F | Rouge2-R | Rouge2-P | Rouge2-F |
---|---|---|---|---|---|---|
KLSummarizer | 0.32745 | 0.34508 | 0.33585 | 0.06317 | 0.06666 | 0.06483 |
LexRankSummarizer | 0.37926 | 0.39425 | 0.38621 | 0.09350 | 0.09712 | 0.09518 |
LsaSummarizer | 0.34674 | 0.37187 | 0.35832 | 0.07678 | 0.08220 | 0.07929 |
LuhnSummarizer | 0.35671 | 0.39231 | 0.37300 | 0.08575 | 0.09404 | 0.08954 |
RandomSummarizer | 0.35968 | 0.37472 | 0.36676 | 0.07961 | 0.08262 | 0.08102 |
SumBasicSummarizer | 0.36909 | 0.37383 | 0.37110 | 0.07683 | 0.07798 | 0.07732 |
TextRankSummarizer | 0.37688 | 0.40179 | 0.38864 | 0.09852 | 0.10471 | 0.10145 |
So thank you for this great work.
Hi, I'm looking at the TextRank code and yes it seems that it really doesn't run PageRank as described in (Mihalcea and Tarau, 2004), the original paper. I don't think the iterative process is hidden somewhere either. It may perform well, but claiming to be TextRank doesn't feel right to me.
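To make the difference concrete, here is a minimal sketch of the iterative scoring as I understand it from the paper. This is not sumy's code: the function names `similarity` and `text_rank` are made up for illustration, the input is assumed to be already tokenized sentences, and the damping factor (0.85) and convergence threshold (1e-4) are the values I remember from the paper, so treat them as assumptions.

```python
import math


def similarity(words_a, words_b):
    """Word overlap between two tokenized sentences, normalized by log lengths."""
    if not words_a or not words_b:
        return 0.0
    overlap = len(set(words_a) & set(words_b))
    # log(1) == 0, so two one-word sentences would give a zero denominator.
    norm = math.log(len(words_a)) + math.log(len(words_b))
    return overlap / norm if norm > 0 else 0.0


def text_rank(sentences, damping=0.85, tolerance=1e-4, max_iterations=100):
    """Score sentences with weighted PageRank, iterating until the scores settle."""
    n = len(sentences)
    if n == 0:
        return []

    # Weighted, undirected graph: weights[i][j] is the similarity of i and j.
    weights = [
        [similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
        for i, a in enumerate(sentences)
    ]
    out_sums = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(max_iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)

        # Stop once no score moved by more than the tolerance.
        delta = max(abs(new - old) for new, old in zip(new_scores, scores))
        scores = new_scores
        if delta < tolerance:
            break
    return scores
```

The key point is the outer loop: the scores are recomputed from the neighbours' scores again and again until they stop changing, instead of being read off the edge weights in a single pass.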
I might be able to provide a fix. Would you be interested in a PR for this?
I am happy to inform you that @kmkurn provided a new implementation of TextRank based on the original paper in https://github.com/miso-belica/sumy/pull/100. So I am closing this issue. Feel free to create a new one if needed, or send a PR with any proposal :) Thank you all
Hi, I have a question about the TextRank module. As far as I know, TextRank is based on the PageRank algorithm. However, in the text_rank.py file, I only see code that builds edges between sentences, and it doesn't seem to use an iterative solution to compute the scores. I don't know if I understand it correctly. I am looking forward to your answer. Thx!