nigaregr opened this issue 5 years ago
I think you should create another subfolder, bert_data/validation_512_only, with the validation data (i.e., the .bin files generated by create_pretraining) in it.
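Something like the following should work. This is only a minimal sketch: it assumes the generated shards sit directly under bert_data and simply repurposes the last one as held-out validation data; adjust the paths and the shard choice to your own layout.

```python
import shutil
from pathlib import Path

data_dir = Path("bert_data")
val_dir = data_dir / "validation_512_only"
val_dir.mkdir(parents=True, exist_ok=True)

# Pick one .bin shard produced by create_pretraining to serve as
# held-out validation data (here: the last one alphabetically).
shards = sorted(data_dir.glob("wikipedia_segmented_part_*.bin"))
if shards:
    shutil.move(str(shards[-1]), str(val_dir / shards[-1].name))
```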
Thanks @nigaregr for reporting this. @jingyanwangms can you update the tar file mentioned in https://github.com/microsoft/AzureML-BERT/blob/master/docs/artifacts.md#preprocessed-data with the newly generated Wikipedia dataset and the validation folder?
For now I created a bert_data/validation_512_only folder and moved wikipedia_segmented_part_98.bin into it, and the training pipeline seems to be working fine. It would still be great to use the updated files, @jingyanwangms.
Hi @skaarthik, have you decided whether to update the zipped dataset or the data prep instructions? Also, what happens if I follow @usuyama's suggestion? Will there be any performance impact/drop? Thanks!
Hi @Howal, what @usuyama did is a reasonable workaround in the absence of some other validation set.
Hi, I have pretraining running, but it fails after the 1st epoch with the following error:

```
  File "/AzureML-BERT/pretrain/PyTorch/dataset.py", line 100, in __init__
    path = get_random_partition(self.dir_path, index)
  File "/AzureML-BERT/pretrain/PyTorch/dataset.py", line 33, in get_random_partition
    for x in os.listdir(data_directory)]
FileNotFoundError: [Errno 2] No such file or directory: 'bert_data/validation_512_only'
```
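For context, the traceback boils down to os.listdir being called on a directory that does not exist; a minimal reproduction (path taken from the traceback, purely illustrative):

```python
import os

# os.listdir raises FileNotFoundError when the directory is missing,
# which matches the traceback above.
try:
    os.listdir("bert_data/validation_512_only")
except FileNotFoundError as e:
    print(e)  # [Errno 2] No such file or directory: 'bert_data/validation_512_only'
```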
I have created the Wiki pretraining data using the create_pretraining script, but I do not see validation_512_only being generated. Is that expected?