neilctwu / YouyakuMan

Extractive summarizer using BertSum as the summarization model

type of summarization #8

Closed: keikmin closed this issue 4 years ago

keikmin commented 4 years ago

This is more likely a question rather than an issue.

I tried YouyakuMan on Japanese text, but the results look like extractive summarization. So, I have questions.

  1. Is it possible to try abstractive summarization with Youyakuman?
  2. Is the pretraining data based on extractive summarization? (If so, I guess it won't work properly for my case, and I would need to prepare my own pretraining data.)

Thank you.

neilctwu commented 4 years ago

Hi @keikmin, thanks for asking. You are right: YouyakuMan is based on BertSum, which is an extractive summarization model. So yes, it's an extractive one.

  1. Is it possible to try abstractive summarization with Youyakuman?

Unfortunately not. If you look at the code, you'll find that YouyakuMan (BertSum) predicts results at the sentence level instead of the word level, which makes it impossible to compose new sentences out of individual words (see the sketch below). There is an abstractive version of BertSum called BertSumAbs, published by the same author (GitHub: https://github.com/nlpyang/PreSumm). However, IMHO, it didn't produce meaningful summaries even when trained on English data when I tried it myself.
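To illustrate (this is not YouyakuMan's actual code, just a minimal sketch): a sentence-level extractive model only assigns a score to each existing sentence and copies the top-scoring ones verbatim, so the output vocabulary is limited to sentences already in the article. The function name and the hard-coded scores below are hypothetical stand-ins for what a BertSum-style classifier would produce.

```python
from typing import List

def extractive_summarize(sentences: List[str],
                         scores: List[float],
                         top_k: int = 3) -> List[str]:
    """Select the top_k highest-scoring sentences, preserving article order.

    `scores` stands in for the per-sentence probabilities a BertSum-style
    classifier outputs; note that no new words can ever be generated.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:top_k])  # restore original document order
    return [sentences[i] for i in chosen]

# The summary is always a subset of the input sentences:
article = ["Sentence A.", "Sentence B.", "Sentence C.", "Sentence D."]
print(extractive_summarize(article, [0.9, 0.1, 0.7, 0.3], top_k=2))
# -> ['Sentence A.', 'Sentence C.']
```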

  2. Is the pretraining data based on extractive summarization?

No, the data isn't based on extractive summarization. One of the Japanese data sources I used is LivedoorNews, where professional journalists summarize each news article into 2 or 3 sentences, and those summaries are definitely not extractive. YouyakuMan uses ROUGE scores to match the ground truth (the summaries written by journalists) to in-article sentences in order to train the extractive model, roughly as sketched below.
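Here is a simplified sketch of that label-generation step (again, not the repo's actual code): given a journalist-written abstractive summary, greedily pick the article sentences whose union best matches it, and use those picks as the extractive training labels. BertSum uses ROUGE-1/2 for the matching; a plain unigram F1 stands in here to keep the example self-contained.

```python
from typing import List

def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, a crude stand-in for ROUGE-1."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = len(set(cand) & set(ref))
    if not cand or not ref or not overlap:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def greedy_oracle_labels(article: List[str], abstract: str,
                         max_sents: int = 3) -> List[int]:
    """Indices of article sentences whose union best matches the abstract."""
    selected: List[int] = []
    best_score = 0.0
    while len(selected) < max_sents:
        gains = []
        for i, sent in enumerate(article):
            if i in selected:
                continue
            candidate = " ".join(article[j] for j in sorted(selected + [i]))
            gains.append((unigram_f1(candidate, abstract), i))
        if not gains:
            break
        score, idx = max(gains)
        if score <= best_score:  # stop when adding a sentence no longer helps
            break
        best_score, selected = score, selected + [idx]
    return sorted(selected)

# Example: sentences 0 and 2 jointly cover the reference best.
doc = ["The cat sat on the mat.", "Totally unrelated.", "It was a sunny day."]
ref = "A cat sat on a mat on a sunny day."
print(greedy_oracle_labels(doc, ref))  # -> [0, 2]
```

The selected indices become the positive labels for the sentence classifier, which is how an abstractive-style dataset can still train an extractive model.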

Hope this helps.

keikmin commented 4 years ago

Thank you for your detailed explanation. It was very helpful, and it gave me a good understanding of the current state of summarization tasks in NLP.