microsoft / AzureML-BERT

End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
https://azure.microsoft.com/en-us/blog/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale/
MIT License

dataloading error #39

Closed: xzhu1900 closed this issue 4 years ago

xzhu1900 commented 4 years ago

I ran into an attribute error while trying to load the BERT training data: AttributeError: Can't get attribute 'WikiNBookCorpusPretrainingDataCreator' on <module '__main__' from 'train.py'>

The data was downloaded from https://bertonazuremlwestus2.blob.core.windows.net/public/bert_data.tar.gz, following the instructions in the notebook (BERT_Pretrain.ipynb).

It seems the error is raised because the WikiNBookCorpusPretrainingDataCreator class was pickled into the data file when it was created, but the class is not defined in the module where the file is loaded (here `__main__`, i.e. train.py). That points to a mismatch between the loading code and the dataset.
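For anyone hitting the same message, here is a minimal sketch of how this kind of error arises in general. It is not the repository's code: the module name `creator_script`, the class body, and the helper `try_load` are hypothetical stand-ins. Pickle stores only a reference to the class (module name plus class name), so loading fails when that name cannot be resolved and succeeds once it can.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script that originally created the data file.
creator_module = types.ModuleType("creator_script")
sys.modules["creator_script"] = creator_module


class WikiNBookCorpusPretrainingDataCreator:
    """Placeholder with the same name as the class in the error message."""

    def __init__(self, instances):
        self.instances = instances


# Make pickle record the class as living in creator_script.
WikiNBookCorpusPretrainingDataCreator.__module__ = "creator_script"
WikiNBookCorpusPretrainingDataCreator.__qualname__ = "WikiNBookCorpusPretrainingDataCreator"
creator_module.WikiNBookCorpusPretrainingDataCreator = WikiNBookCorpusPretrainingDataCreator

# "Create the data file": only the class reference and instance state are stored.
payload = pickle.dumps(WikiNBookCorpusPretrainingDataCreator([1, 2, 3]))


def try_load(data):
    """Return (object, None) on success or (None, error message) on failure."""
    try:
        return pickle.loads(data), None
    except AttributeError as exc:
        return None, str(exc)


# Simulate a loader whose module does not define the class.
del creator_module.WikiNBookCorpusPretrainingDataCreator
missing_obj, missing_error = try_load(payload)
print("load without class:", missing_error)

# Once the class is resolvable under the recorded name, loading works again.
creator_module.WikiNBookCorpusPretrainingDataCreator = WikiNBookCorpusPretrainingDataCreator
loaded_obj, loaded_error = try_load(payload)
print("load with class:", loaded_obj.instances)
```

In the case reported here the recorded module is `__main__`, so the file only loads from a script that itself defines or imports the class under that name, or from a data file created by code that matches the loader.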

skaarthik commented 4 years ago

We will update the incorrect URL in the notebook. In the meantime, use the URL from https://github.com/microsoft/AzureML-BERT/blob/master/docs/artifacts.md#preprocessed-data.

xzhu1900 commented 4 years ago

Cool. Thanks!

skaarthik commented 4 years ago

Fixed in https://github.com/microsoft/AzureML-BERT/commit/6551791b606cf654367bd99a85cdb67f1c539415