Closed mahmoodn closed 1 year ago
Hi, the readme file for the language model seems confusing. I followed the steps in dataset.md and everything seems to be right.
git clone https://github.com/sgpyc/training
cd language_model/tensorflow/bert/cleanup_scripts
source download_and_umcompress.sh
git clone https://github.com/attardi/wikiextractor.git
cd wikiextractor
git checkout 3162bb6c3c9ebd2d15be507aa11d6fa818a454ac
cd ..
python wikiextractor/WikiExtractor.py wiki/enwiki-20200101-pages-articles-multistream.xml
./process_wiki.sh './text/*/wiki_??'
So I think the dataset preparation is done. Now, when I check readme.md, I don't know where I should continue from.
Should I continue from "Generate the TFRecords for Wiki dataset"?