Closed by Derekkk 4 years ago
Yes, your understanding is correct. The original paper focuses only on sentence-level tasks, whereas some summarization tasks (like cnndm) are document-level. To handle longer contexts (e.g., 512 tokens or more), we adopt a multi-span masking strategy.
Hi,
Thanks for sharing the code! I have a quick question about "MASS-summarization/masked_dataset.py": it seems you choose multiple spans from src_items as targets:
and the targets are the concatenation of all chosen spans, e.g., src = [1,2,3,4,mask,mask,7,8,mask,mask,11,12] and tgt = [5,6,9,10].
I want to confirm that my understanding is correct, since in the original paper you chose only one segment for each input. Thanks a lot!
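To make the example above concrete, here is a minimal sketch of multi-span masking as described in this thread. It is not the code from masked_dataset.py: the function name `mask_spans`, the half-open `(start, end)` span format, and the `"mask"` placeholder are all assumptions made for illustration, and the spans are passed in by hand rather than sampled.

```python
from typing import List, Sequence, Tuple, Union

Token = Union[int, str]


def mask_spans(
    tokens: Sequence[Token],
    spans: Sequence[Tuple[int, int]],
    mask_token: Token = "mask",  # hypothetical placeholder, not the repo's mask id
) -> Tuple[List[Token], List[Token]]:
    """Mask several spans in `tokens` and concatenate the masked tokens as the target.

    `spans` holds half-open (start, end) index pairs, which must be
    in bounds and non-overlapping.
    """
    src = list(tokens)
    tgt: List[Token] = []
    prev_end = 0
    for start, end in sorted(spans):
        if not (prev_end <= start < end <= len(tokens)):
            raise ValueError(f"invalid or overlapping span: ({start}, {end})")
        # The target is the concatenation of every masked span, in order.
        tgt.extend(tokens[start:end])
        # Replace each masked position in the source with the mask token.
        src[start:end] = [mask_token] * (end - start)
        prev_end = end
    return src, tgt


tokens = list(range(1, 13))  # [1, 2, ..., 12]
src, tgt = mask_spans(tokens, [(4, 6), (8, 10)])
print(src)  # [1, 2, 3, 4, 'mask', 'mask', 7, 8, 'mask', 'mask', 11, 12]
print(tgt)  # [5, 6, 9, 10]
```

With a single span in `spans`, the same function reduces to the one-segment setting from the original paper.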