Closed by NieShenRuc 1 year ago
Hi @NieShenRuc, you can follow BERT/UniLM/RoBERTa to process the text-only data (English Wikipedia and BookCorpus) and then split it into several small files. Alternatively, you can directly use our stage-2 model, which is already pretrained on text-only data.
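For the splitting step, here is a minimal sketch, not the authors' actual preprocessing script. It assumes the processed corpus is already a single UTF-8 text file with one text segment per line, and that the loader only needs files named "wikibk.{index}.txt". The function name `split_into_shards` and its parameters are hypothetical.

```python
import itertools
import math
from pathlib import Path


def split_into_shards(src_path, out_dir, num_shards=50, prefix="wikibk"):
    """Split one text file into `num_shards` contiguous files.

    Hypothetical helper: assumes one text segment per line and drops
    blank lines. Output files are named "<prefix>.<index>.txt".
    """
    src, out = Path(src_path), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # First pass: count non-blank lines so the shards are roughly equal.
    with src.open(encoding="utf-8") as f:
        total = sum(1 for line in f if line.strip())
    shard_size = max(1, math.ceil(total / num_shards))

    paths = [out / f"{prefix}.{i}.txt" for i in range(num_shards)]

    # Second pass: stream lines into shards without loading the corpus.
    with src.open(encoding="utf-8") as f:
        kept = (line.rstrip("\n") for line in f if line.strip())
        for path in paths:
            with path.open("w", encoding="utf-8") as shard:
                for line in itertools.islice(kept, shard_size):
                    shard.write(line + "\n")
    return paths
```

Calling `split_into_shards("corpus.txt", "data/")` (both paths are placeholders) would write `data/wikibk.0.txt` through `data/wikibk.49.txt`. The file is read twice so that memory use stays flat even for a large corpus; if the corpus has fewer lines than shards, the trailing files are simply empty.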
Thanks for your reply!
I am using the VLMO model, and I found that the text-only data is loaded from "wikibk.{index}.txt", where index = 0, 1, ..., 49. How can I get these .txt files?