Summary length

ghost commented 4 years ago

Hello. I'd like to know if it's possible to adjust length of output summary?

maxbaluev commented 4 years ago

@yg211

yg211 commented 4 years ago

Hi there,

Of course: in the example code for generating summaries, you can find the line

summary = rl_summarizer.summarize(source_docs, summ_maxlen=100)

The argument summ_maxlen controls the maximum number of tokens in the generated summary.
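To illustrate what a maximum-length argument implies in general, here is a minimal sketch. It is a toy example and not the repository's selection logic: the helper name truncate_to_budget and the whitespace tokenization are assumptions made only for illustration. It shows that a cap is an upper bound, so the output can be much shorter than the limit.

```python
def truncate_to_budget(sentences, max_tokens):
    """Keep sentences in order until the next one would exceed max_tokens.

    Hypothetical helper for illustration; the real summarizer may select
    and count tokens differently.
    """
    kept, used = [], 0
    for sentence in sentences:
        n_tokens = len(sentence.split())  # whitespace tokens (an assumption)
        if used + n_tokens > max_tokens:
            break  # adding this sentence would exceed the cap
        kept.append(sentence)
        used += n_tokens
    return kept, used


if __name__ == "__main__":
    candidates = ["a b c", "d e", "f g h i"]
    # A generous cap does not make the output longer than the material chosen.
    print(truncate_to_budget(candidates, 100))
    # A tight cap cuts the output off early.
    print(truncate_to_budget(candidates, 5))
```

With a cap of 100 the toy output still contains only the tokens that were selected, which is why raising the cap alone does not guarantee a longer summary.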

ghost commented 4 years ago

Yes, I saw that. But it sets the maximum number of tokens, not the average or exact number.

I trained the model on the entire "Alice in Wonderland", setting summ_maxlen to 9999.

Here's the output:

Alice folded her hands, and began : — “ You are old, Father William, ” the young man said, “ And your hair has become very white; And yet you incessantly stand on your head— Do you think, at your age, it is right ? ” “ In my youth, ” Father William replied to his son, “ I feared it might injure the brain; But, now that I ’ m perfectly sure I have none, Why, I do it again and again. ” “ You are old, ” said the youth, “ as I mentioned before, And have grown most uncommonly fat; Yet you turned a back-somersault in at the door— Pray, what is the reason of that ? ” “ In my youth, ” said the sage, as he shook his grey locks, “ I kept all my limbs very supple By the use of this ointment—one shilling the box— Allow me to sell you a couple ? ” “ You are old, ” said the youth, “ and your jaws are too weak For anything tougher than suet; Yet you finished the goose, with the bones and the beak— Pray, how did you manage to do it ? ” “ In my youth, ” said his father, “ I took to the law, And argued each case with my wife; And the muscular strength, which it gave to my jaw, Has lasted the rest of my life. ” “ You are old, ” said the youth, “ one would hardly suppose That your eye was as steady as ever; Yet you balanced an eel on the end of your nose— What made you so awfully clever ? ” “ I have answered three questions, and that is enough, ” Said his father; “ don ’ t give yourself airs!

That is clearly not 9999 tokens. So is there a way to adjust the average length of the output, even by editing the source code?

yg211 commented 4 years ago

When you say 'train the whole Alice in Wonderland', I guess you fed the whole book as one long string into the model and asked it to output a summary with a maximum length of 9999? Please note that the system is designed for multi-document summarization, so giving it a single document will mislead it. If you split your input text into a list, with each element being one section of the book, and feed that list into the model, I guess that will yield longer summaries.
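The splitting step suggested above could be sketched as follows. This is only a rough illustration: split_into_sections is a hypothetical helper, not part of the repository, and it assumes paragraphs are separated by blank lines. The exact input format expected by rl_summarizer.summarize is not shown in this thread, so the sketch stops at producing a plain list of strings.

```python
import re


def split_into_sections(text, n_sections):
    """Split text into roughly equal sections at paragraph boundaries.

    Hypothetical helper: assumes paragraphs are separated by blank lines.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return []
    # Never ask for more sections than there are paragraphs.
    n_sections = max(1, min(n_sections, len(paragraphs)))
    # Spread paragraphs evenly; the first `extra` sections get one more.
    base, extra = divmod(len(paragraphs), n_sections)
    sections, start = [], 0
    for i in range(n_sections):
        end = start + base + (1 if i < extra else 0)
        sections.append("\n\n".join(paragraphs[start:end]))
        start = end
    return sections


if __name__ == "__main__":
    book = "\n\n".join(f"Paragraph {i}." for i in range(10))
    docs = split_into_sections(book, 3)
    print(len(docs))  # number of pseudo-documents to pass on
```

The resulting list would then take the place of the single long string, possibly after converting each section into whatever document structure the summarizer expects.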

ghost commented 4 years ago

I see, thank you. What would be the optimal number of docs to produce a summary of ~8000 tokens, then? And how might the number of training rounds be connected to it?

yg211 commented 4 years ago

Sorry, but we have not tested the model on producing summaries that long... I guess around 50-100 input documents would be fine.

yg211 / acl20-ref-free-eval

Summary length #2