miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.
https://miso-belica.github.io/sumy/
Apache License 2.0

Summarising books by verbs #179

Closed (mrx23dot closed 1 year ago)

mrx23dot commented 2 years ago

Not really an issue, just a question.

So I'm summarising books by paragraphs, and I was thinking that, to capture what is 'happening' in the main plot, it would be beneficial to prioritise sentences with verbs.

I can collect every verb for a given language. As far as I can see, only Edmundson is capable of taking hint words. I'm not sure what stop-words are; the documentation doesn't explain them.

Has anyone tried something similar?

mrx23dot commented 2 years ago

My other idea was to take the book's 'plot' from Wikipedia, mark everything in it as important, and use that as a hint during extraction.

miso-belica commented 2 years ago

Hi, if you can collect verbs from the text, it should work as you said. Edmundson is the method for you. Use those words as hints for the method.

Regarding the stop-words, it's not that hard to Google them 😏

I am curious what you come up with and about the results. Don't forget to let us all know 🙂
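For anyone landing here, a minimal sketch of that suggestion might look like the following. It assumes the verb list is something you supply yourself; `hint_words_from` and `summarise_with_hints` are hypothetical helper names, not part of sumy. As far as I know, Edmundson also wants non-empty stigma and null words, so the sketch uses placeholder stigma words and sumy's stop-word list as null words (stop-words being very common words such as "the" or "and" that carry little meaning on their own).

```python
import re


def hint_words_from(text, verbs):
    """Return the verbs from `verbs` that actually occur in `text`, sorted."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(tokens & {v.lower() for v in verbs})


def summarise_with_hints(text, verbs, sentences_count=3, language="english"):
    """Summarise `text` with Edmundson, using verbs as bonus (hint) words."""
    # sumy is imported here so the helper above works without it installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer  # needs NLTK's "punkt" data
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.edmundson import EdmundsonSummarizer
    from sumy.utils import get_stop_words

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = EdmundsonSummarizer(Stemmer(language))
    # Bonus words raise a sentence's score; here they are the verbs found
    # in the text. An empty collection is rejected by the summarizer.
    summarizer.bonus_words = hint_words_from(text, verbs)
    # Assumption: placeholder stigma words (they lower a sentence's score).
    summarizer.stigma_words = ("perhaps", "maybe")
    # Null words are ignored; the stop-word list is a reasonable default.
    summarizer.null_words = get_stop_words(language)
    return [str(s) for s in summarizer(parser.document, sentences_count)]
```

Calling `summarise_with_hints(chapter_text, my_verbs)` would then return the selected sentences as strings. Whether verb hints actually improve plot coverage is untested here.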

miso-belica commented 1 year ago

@mrx23dot did my suggestion help a bit?

mrx23dot commented 1 year ago

Sorry for the delay. Actually, I've been using LSA ever since. It works great on paragraphs; the key is to set the correct number of sentences, e.g. not hard-coding it to 3 but setting it dynamically to count/4.

It's barely noticeable in books, but makes reading so much faster. Thanks!
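The dynamic count described above can be sketched roughly as follows. This assumes "count" means the number of sentences in the paragraph and that the result is rounded down with a minimum of one; `dynamic_count` and `summarise_paragraph` are hypothetical names for illustration, not sumy functions.

```python
def dynamic_count(total_sentences, divisor=4):
    """Number of sentences to keep: a quarter of the input, at least one."""
    return max(1, total_sentences // divisor)


def summarise_paragraph(paragraph, language="english"):
    """Summarise one paragraph with LSA, keeping about 1/4 of its sentences."""
    # sumy is imported here so dynamic_count works without it installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer  # needs NLTK's "punkt" data
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lsa import LsaSummarizer
    from sumy.utils import get_stop_words

    parser = PlaintextParser.from_string(paragraph, Tokenizer(language))
    summarizer = LsaSummarizer(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)
    # Scale the summary length with the paragraph instead of hard-coding 3.
    count = dynamic_count(len(parser.document.sentences))
    return [str(s) for s in summarizer(parser.document, count)]
```

With this rule a 12-sentence paragraph is cut to 3 sentences, while anything shorter than 8 sentences is cut to a single one; the divisor is easy to tune if that turns out to be too aggressive.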