Closed jordane95 closed 1 year ago
Hi,
I want to run the pretraining part of your Jupyter notebook, but I find that the pretraining data is not provided.
May I know how you constructed the `docs_filter.tsv` file? Thanks.
I'm sorry, but I cannot provide the `docs_filter.tsv` used in our experiments. It was collected from the Microsoft News corpus during my internship at Microsoft. However, I think you can use another public news corpus instead. For example, you could try pre-training on the news articles provided with the MIND dataset.
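For anyone trying the MIND route, here is a minimal sketch of turning MIND's `news.tsv` into a tab-separated document file. The input column order follows the published MIND format (the file has no header row). Everything about the output is an assumption: the original `docs_filter.tsv` layout is not described in this thread, so the `news_id<TAB>text` layout, the choice of title plus abstract as the text, and the function name `build_docs_tsv` are all hypothetical and may need adapting to what the notebook actually reads.

```python
# Column order of MIND's news.tsv (tab-separated, no header row).
MIND_COLUMNS = [
    "news_id", "category", "subcategory", "title",
    "abstract", "url", "title_entities", "abstract_entities",
]


def build_docs_tsv(news_tsv_path, out_path):
    """Write one `news_id<TAB>text` line per article and return the count.

    ASSUMPTION: this output layout is a guess, not the authors' format.
    """
    written = 0
    with open(news_tsv_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < len(MIND_COLUMNS):
                continue  # skip malformed rows
            row = dict(zip(MIND_COLUMNS, fields))
            # Use title + abstract as the document text; the abstract
            # is empty for some MIND articles.
            text = " ".join(
                part for part in (row["title"], row["abstract"]) if part.strip()
            )
            text = " ".join(text.split())  # collapse repeated whitespace
            if not text:
                continue
            dst.write(f"{row['news_id']}\t{text}\n")
            written += 1
    return written
```

A call such as `build_docs_tsv("MINDsmall_train/news.tsv", "docs_filter.tsv")` would produce the file; the input path here is only an example of where the downloaded data might live. Check which columns the notebook's pretraining cell expects and adjust the line written to `dst` accordingly.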