microsoft / AzureML-BERT

End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
https://azure.microsoft.com/en-us/blog/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale/
MIT License

Can I run this project at my local machine? #28

Closed Pterosaur closed 5 years ago

Pterosaur commented 5 years ago

I have downloaded the training data by following the dataprep instructions and created the Docker environment according to the Dockerfile. However, I encounter the following problem when I try to pretrain:

(amlbert) root@2d453b9b839f:~/pretrain/PyTorch# python train.py  --config_file ../configs/bert-large-single-node.json --path /out
The arguments are: ['train.py', '--config_file', '../configs/bert-large-single-node.json', '--path', '/out']
Traceback (most recent call last):
  File "train.py", line 285, in <module>
    local_rank = get_local_rank()
  File "/root/pretrain/PyTorch/azureml_adapter.py", line 27, in get_local_rank
    return int(os.environ['OMPI_COMM_WORLD_LOCAL_RANK'])
  File "/opt/miniconda/envs/amlbert/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'OMPI_COMM_WORLD_LOCAL_RANK'

Did I misunderstand anything?

skaarthik commented 5 years ago

'OMPI_COMM_WORLD_LOCAL_RANK' is an environment variable set in AzureML that is used for multi-node pretraining. If you want to run the code locally, use the single-node instructions at https://github.com/microsoft/AzureML-BERT/blob/master/pretrain/PyTorch/notebooks/BERT_Pretrain.ipynb. You may still need to make some changes to get it to work on your local machine, though.
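For context, `OMPI_COMM_WORLD_LOCAL_RANK` is set by the Open MPI launcher that AzureML uses, so it is absent in a plain local shell and the `os.environ[...]` lookup raises `KeyError`. One way some people work around this for a single-process local run is to fall back to rank 0 when the variable is missing. This is a minimal sketch of that idea, not the repo's actual code; the default-to-0 fallback is an assumption:

```python
import os

def get_local_rank(default=0):
    # OMPI_COMM_WORLD_LOCAL_RANK is injected by the Open MPI launcher
    # (as used by AzureML for multi-node jobs). When running locally
    # without mpirun, it is not set, so we assume rank 0.
    return int(os.environ.get('OMPI_COMM_WORLD_LOCAL_RANK', default))

if __name__ == '__main__':
    print(get_local_rank())  # 0 in a plain local shell
```

Alternatively, exporting the variable before launching (`export OMPI_COMM_WORLD_LOCAL_RANK=0`) avoids touching the code, but the single-node notebook path above is the supported route.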

skaarthik commented 5 years ago

Closing this issue for now. @Pterosaur feel free to reopen it if you need additional info.