mlcommons / training_results_v1.1

This repository contains the results and code for the MLPerf™ Training v1.1 benchmark.
https://mlcommons.org/en/training-normal-11/
Apache License 2.0

[NVIDIA/benchmarks/bert/implementations/pytorch] prepare_data.sh fails - issue with BertConfig when `convert_tf_checkpoint.py` is called #6

Open nickfraser opened 1 year ago

nickfraser commented 1 year ago

The prepare_data.sh script fails, producing the following error:

Traceback (most recent call last):
  File "/workspace/bert/input_preprocessing/../convert_tf_checkpoint.py", line 86, in <module>
    main()
  File "/workspace/bert/input_preprocessing/../convert_tf_checkpoint.py", line 80, in main
    model = prepare_model(args, device)
  File "/workspace/bert/input_preprocessing/../convert_tf_checkpoint.py", line 72, in prepare_model
    model = BertForPretraining.from_pretrained(args.tf_checkpoint, from_tf=True, config=config)
  File "/workspace/bert/modeling.py", line 867, in from_pretrained
    model = cls(config, *inputs, **kwargs)
  File "/workspace/bert/modeling.py", line 1060, in __init__
    self.cls = BertPreTrainingHeads(config, self.bert.embeddings.word_embeddings.weight)
  File "/workspace/bert/modeling.py", line 791, in __init__
    self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights)
  File "/workspace/bert/modeling.py", line 744, in __init__
    self.fused_fc = config.fused_bias_fc_loss_head
AttributeError: 'BertConfig' object has no attribute 'fused_bias_fc_loss_head'

It appears that either the code reached from convert_tf_checkpoint.py references this config attribute incorrectly, or the downloaded bert_config.json is missing the fused_bias_fc_loss_head key/value pair.
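
One way to tell these two cases apart is to inspect the JSON file directly. The following is only a minimal sketch: the helper name missing_config_keys is hypothetical, and it assumes that BertConfig fills its attributes from the top-level keys of bert_config.json, which the traceback suggests but does not confirm.

```python
import json


def missing_config_keys(config_path, required_keys):
    """Return the required keys that are absent from a JSON config file.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Keep the caller's order so the report is easy to read.
    return [key for key in required_keys if key not in config]
```

Calling missing_config_keys on the downloaded bert_config.json with ["fused_bias_fc_loss_head"] would return that key if the file does not define it, pointing at the config rather than the script.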

Steps to Reproduce

Clone the repo, browse to NVIDIA/benchmarks/bert/implementations/pytorch and run the following:

docker build --pull -t nickfraser/mlperf-nvidia:language_model .
docker run --rm -it --runtime=nvidia --ipc=host -v /<location on host>/bert_data/:/workspace/bert_data nickfraser/mlperf-nvidia:language_model
./input_preprocessing/prepare_data.sh --outputdir /workspace/bert_data

This eventually leads to the error above, raised by the last command of the prepare_data.sh script. Note that the md5sums of bert_config.json, vocab.txt, model.ckpt-28252.data-00000-of-00001, model.ckpt-28252.index, and model.ckpt-28252.meta match the expected values. I also added set -e at the top of the prepare_data.sh script to confirm that no earlier command had failed.
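
For anyone who wants to repeat the checksum comparison outside the script, here is a rough sketch of the same check in Python. The function name md5_of_file is hypothetical, and the expected digests are not reproduced here; they would have to be taken from the benchmark's own instructions.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Chunked reading avoids loading a large checkpoint file into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result can be compared as a string against the published value for each of the five files listed above.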

Since bert_config.json matches the expected md5sum, I expect that the issue is with the convert_tf_checkpoint.py script. Any help that can be provided is much appreciated.

nickfraser commented 1 year ago

Actually, it seems more likely that there is confusion between the variables fused_bias_fc_loss_head and fused_bias_fc. However, since the repository has no git history, I can't tell whether one was renamed to the other at some point.
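
If the two names do turn out to refer to the same setting, a tolerant lookup along the following lines might serve as a stopgap. This is purely a sketch under that unconfirmed assumption: read_fused_flag is a hypothetical helper, the default of False is a guess, and the actual fix should come from the original authors.

```python
def read_fused_flag(config, default=False):
    """Look up the fused-FC flag under either candidate attribute name.

    Assumption (unconfirmed): fused_bias_fc_loss_head and fused_bias_fc
    describe the same setting. The default is a guess, not a documented value.
    """
    for name in ("fused_bias_fc_loss_head", "fused_bias_fc"):
        if hasattr(config, name):
            return getattr(config, name)
    # Neither attribute exists on the config object.
    return default
```

In this sketch the name read at modeling.py line 744 is tried first, so a config that defines both attributes keeps its current behavior.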

Perhaps the original authors can provide some insights?