salesforce / ctrl-sum

Resources for the "CTRLsum: Towards Generic Controllable Text Summarization" paper
https://arxiv.org/abs/2012.04281
BSD 3-Clause "New" or "Revised" License

What is the input sequence size? #18

Open desis123 opened 1 year ago

desis123 commented 1 year ago

First of all, thanks for this nice summarization model. I'd like to know what the input sequence size is for this model, and what the best procedure is if I want to summarize a long document, or multiple documents, with a length of more than 6k tokens. Thanks in advance.

jxhe commented 1 year ago

Hi, the maximum input sequence size is 1024 tokens. People normally truncate the input document if it is too long. If you do want to summarize long documents, you can check work dedicated to long-document summarization, such as BigBird. Long-document summarization is beyond the scope of CTRLsum.
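
For illustration, here is a minimal sketch of the truncation described above, applied to a 6k-token document like the one in the question. It is not code from the ctrl-sum repository. The helper names (`truncate_tokens`, `split_into_chunks`) are hypothetical, and whitespace splitting only stands in for the model's real subword tokenizer, so actual token counts will differ. Splitting into chunks and summarizing each one separately is a common workaround that is not recommended anywhere in this thread; it is shown only as one possible option.

```python
from typing import List

MAX_INPUT_TOKENS = 1024  # maximum input size stated in the reply above


def truncate_tokens(tokens: List[str], max_len: int = MAX_INPUT_TOKENS) -> List[str]:
    """Keep only the first max_len tokens and drop the rest."""
    return tokens[:max_len]


def split_into_chunks(tokens: List[str], max_len: int = MAX_INPUT_TOKENS) -> List[List[str]]:
    """Split tokens into consecutive, non-overlapping chunks of at most max_len tokens.

    Hypothetical workaround: each chunk could be summarized on its own.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# Assumption: whitespace splitting stands in for the real subword tokenizer.
document = " ".join(f"w{i}" for i in range(6000))
tokens = document.split()

truncated = truncate_tokens(tokens)   # what the model would actually see
chunks = split_into_chunks(tokens)    # alternative: several model-sized pieces

print(len(tokens), len(truncated), len(chunks))
```

Truncation discards everything after the first 1024 tokens, so for a document this long most of the content never reaches the model. That is why the reply points to models designed for long inputs.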