Hi,
I am trying to use your code on my own corpus, which consists of many short sentences, for a classification task. I want to run some experiments with further pre-training without the NSP task. However, in "create_pretraining_data.py" I found that you randomly choose a document from the dataset and concatenate it to another document after [SEP] as the input, which confuses me a lot. Could you please explain why this is done? Thanks a lot.