huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Index out of range in Bart-large-xsum #7009

Closed: tejareddy8888 closed this issue 4 years ago

tejareddy8888 commented 4 years ago

Questions & Help

Hello to everyone!!

I am having trouble summarizing long articles, meaning very long texts that (I guess) have a larger vocabulary than the model was pre-trained on. Many of the models seem to have a maximum input length, and running them on longer inputs fails with an "index out of range" error. I am using "BART-large-xsum" in particular. What is the correct way to use these models with long documents? Should I fine-tune to increase the vocabulary size, or do something else?

A code snippet showing how to handle long documents with "BART-large-xsum" would be a perfect place to start!

Thanks in advance, Teja

LysandreJik commented 4 years ago

Pinging @sshleifer, the summarization master

sshleifer commented 4 years ago

See https://discuss.huggingface.co/t/summarization-on-long-documents/920/2, and feel free to reply there! I don't have a code snippet, but feel free to contribute one to that discussion!
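For readers looking for a starting point: the "index out of range" error is usually about input length, not vocabulary. BART uses a byte-level BPE tokenizer, so unseen words are split into known subword pieces rather than falling outside the vocabulary. The checkpoint's learned position embeddings, however, only cover `max_position_embeddings` = 1024 tokens, and longer inputs index past the end of that table. The simplest workaround is truncation (`tokenizer(text, truncation=True, max_length=1024)`), which discards everything after the first 1024 tokens. The sketch below instead splits the document into overlapping windows, summarizes each window, and joins the results. It is a minimal illustration under stated assumptions, not necessarily what the linked forum thread recommends. The names `chunk_token_ids`, `summarize_long`, and `BART_MAX_POSITIONS`, as well as the 64-token default overlap, are hypothetical choices made for this example. The model id `facebook/bart-large-xsum` is assumed to be the Hub checkpoint referred to as "BART-large-xsum" above.

```python
from typing import List

# Hypothetical constant: BART-large checkpoints use 1024 learned position
# embeddings, so one encoder input cannot be longer than 1024 tokens.
BART_MAX_POSITIONS = 1024


def chunk_token_ids(ids: List[int], max_len: int = BART_MAX_POSITIONS,
                    overlap: int = 0) -> List[List[int]]:
    """Hypothetical helper: split token ids into windows of at most max_len.

    Consecutive windows share `overlap` tokens, so text cut at one boundary
    still appears with some context in the next window.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(ids), step):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):  # this window reached the end
            break
    return chunks


def summarize_long(text: str, model_name: str = "facebook/bart-large-xsum",
                   overlap: int = 64, **gen_kwargs) -> str:
    """Hypothetical helper: summarize each window and join the summaries."""
    # Imported lazily so chunk_token_ids works without torch/transformers.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    model.eval()

    # Tokenize without <s>/</s>; they are added back to every window below.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    # Leave room for the two special tokens within the position limit.
    body_len = model.config.max_position_embeddings - 2

    summaries = []
    for chunk in chunk_token_ids(ids, max_len=body_len, overlap=overlap):
        input_ids = torch.tensor(
            [tokenizer.build_inputs_with_special_tokens(chunk)])
        with torch.no_grad():
            output = model.generate(input_ids, **gen_kwargs)
        summaries.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return " ".join(summaries)
```

Because the xsum checkpoint was trained to produce one-sentence summaries, `summarize_long` returns roughly one sentence per window rather than a single summary of the whole document. If that output is still too long, one option is to run the joined text through the model again.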