Closed richardbaihe closed 4 years ago
This paper focuses on the model pre-training for discourse tasks.
Different with NSP/SOP, this paper proposes a new sub-task: predicting whether the distance between two sentences are within K window length.
The best K according to its experiments is 2.
Experimental results show that their model achieves SOTA on various Discourse Tasks.