Hi,
in the README you briefly mention that the model can be trained on the CNN/DailyMail corpus. This corpus contains many documents with more than 512 tokens, which is BERT's input limit as far as I know. Even though training apparently succeeds, testing the new model fails:
Please refer to the paper; what is used here is a reduced version of CNN/DM called CNN/DM-R. I'll revise the description in the README file for better clarity.
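For anyone else hitting this, a minimal sketch of working around the 512-token limit by truncating overlong articles before they reach the model. This assumes the Hugging Face `transformers` tokenizer, which this repo may not actually use, so treat it as an illustration rather than a fix for this codebase:

```python
# Sketch (not this repo's code): truncate CNN/DM articles to BERT's
# 512-token limit using the Hugging Face `transformers` tokenizer,
# so overlong documents don't break inference.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

article = "..."  # a CNN/DailyMail article, possibly longer than 512 tokens
encoded = tokenizer(
    article,
    truncation=True,   # drop tokens beyond max_length
    max_length=512,    # BERT's positional-embedding limit
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # at most (1, 512)
```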