summanlp / textrank

TextRank implementation for Python 3.
https://pypi.org/project/summa/
MIT License

summa vs gensim #78

Closed: luke4u closed this issue 4 years ago

luke4u commented 4 years ago

Hi!

First, thank you for making and sharing such a nice package.

I am wondering what the difference is between summa and gensim's summarization. At a high level, both are based on TextRank and use BM25 for ranking. I am not sure how summa embeds words and sentences. Could you please share some insights?

Thanks in advance. Luke

fedelopez77 commented 4 years ago

Hi Luke,

Thank you! We are glad that you like it!

gensim's summarization and summa are the same algorithm. We implemented summa first and then added it to gensim. The gensim version is better optimized for large documents, since the sentence graph is stored in a different way, but the results should be equivalent.

The ideas behind the method are described in the TextRank paper, and the details of the improvements are in the paper "Variations of the Similarity Function of TextRank for Automated Summarization". Neither of these methods works with word embeddings in the GloVe or word2vec fashion.
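To make the "no embeddings" point concrete, here is a minimal, self-contained sketch of TextRank-style sentence ranking that relies only on shared words. It is not summa's or gensim's code: the function names (`tokenize`, `overlap_similarity`, `rank_sentences`) are made up for illustration, sentences are treated as sets of unique lowercase tokens (a simplification), and the similarity is the word-overlap measure from the original TextRank paper rather than the BM25 variation mentioned above.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens (no stemming, no stop words)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def overlap_similarity(tokens_a, tokens_b):
    """Word-overlap similarity: shared words divided by the sum of log sentence lengths."""
    if not tokens_a or not tokens_b:
        return 0.0
    denominator = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if denominator == 0:  # both sentences contain a single token
        return 0.0
    return len(tokens_a & tokens_b) / denominator


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a lexical similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Symmetric weight matrix: edge weight = lexical similarity of two sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = overlap_similarity(tokens[i], tokens[j])
            weights[i][j] = weights[j][i] = w

    total_weight = [sum(row) for row in weights]
    scores = [1.0] * n

    # Fixed number of power-iteration steps (hypothetical default, no convergence check).
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / total_weight[j] * scores[j]
                for j in range(n)
                if total_weight[j] > 0
            )
            updated.append((1 - damping) + damping * incoming)
        scores = updated
    return scores
```

Calling `rank_sentences(list_of_sentences)` returns one score per sentence; a sentence that shares no words with any other keeps only the baseline `1 - damping`. Nothing here maps a word to a vector, which is the sense in which these methods do not use GloVe or word2vec style embeddings.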

I hope this helps. Best, Fede

luke4u commented 4 years ago

Thanks, Fede. This is very helpful. Much appreciated.