huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

BERT Fine-tuning problems #4438

Closed laetokang closed 4 years ago

laetokang commented 4 years ago

❓ Questions & Help

Details

Hello. I'm trying to fine-tune bert-base-uncased on a QA dataset I made myself. However, the run fails with the error below. Could you tell me how to solve this problem?

Traceback (most recent call last):
  File "./examples/question-answering/run_squad.py", line 830, in <module>
    main()
  File "./examples/question-answering/run_squad.py", line 768, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
  File "./examples/question-answering/run_squad.py", line 452, in load_and_cache_examples
    examples = processor.get_train_examples(args.data_dir, filename=args.train_file)
  File "/home/address/anaconda3/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 525, in get_train_examples
    return self._create_examples(input_data, "train")
  File "/home/address/anaconda3/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 552, in _create_examples
    title = entry["title"]
TypeError: string indices must be integers
Traceback (most recent call last):
  File "/home/address/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/address/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/address/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/address/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/address/anaconda3/bin/python', '-u', './examples/question-answering/run_squad.py', '--local_rank=1', '--model_type', 'bert', '--model_name_or_path', 'bert-base-uncased', '--do_train', '--do_eval', '--train_file', '/home/address/Desktop/address/train_split.json', '--predict_file', '/home/address/Desktop/address/val_split.json', '--learning_rate', '3e-5', '--num_train_epochs', '2', '--max_seq_length', '384', '--doc_stride', '128', '--output_dir', '../models/wwm_uncased_finetuned_squad/', '--per_gpu_eval_batch_size=3', '--per_gpu_train_batch_size=3']' returned non-zero exit status 1.
LysandreJik commented 4 years ago

Does your dataset follow the SQuAD dataset format? The crash happens where the processor reads the title of each entry, so it looks like your entries don't have the structure it expects.

You can take a look at how SQuAD is set up here.
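
One possible reading of the traceback: Python raises "TypeError: string indices must be integers" when a string is indexed with a string, so each entry the processor iterates over was probably a str rather than a dict. That would happen if, for example, the "data" key holds a single object instead of a list of articles. The sketch below checks a parsed JSON file for that and for a few other structural problems. It is only an illustration: check_squad_format is a hypothetical helper, not part of transformers, and the expected layout (a top-level "data" list of articles, each with "title" and "paragraphs", each paragraph with "context" and "qas") is assumed from the public SQuAD v1.1 files.

```python
import json


def check_squad_format(dataset):
    """Return a list of problems found in a parsed SQuAD-style object.

    Hypothetical helper; the layout it checks is assumed from SQuAD v1.1.
    """
    if not isinstance(dataset, dict) or "data" not in dataset:
        return ['top level must be an object with a "data" key']
    data = dataset["data"]
    if not isinstance(data, list):
        return ['"data" must be a list of articles, got ' + type(data).__name__]

    problems = []
    for i, entry in enumerate(data):
        # A str here is what would trigger "string indices must be integers".
        if not isinstance(entry, dict):
            problems.append(f"data[{i}] must be an object, got {type(entry).__name__}")
            continue
        if "title" not in entry:
            problems.append(f'data[{i}] has no "title"')
        paragraphs = entry.get("paragraphs")
        if not isinstance(paragraphs, list):
            problems.append(f'data[{i}] needs a "paragraphs" list')
            continue
        for j, para in enumerate(paragraphs):
            where = f"data[{i}].paragraphs[{j}]"
            if not isinstance(para, dict):
                problems.append(f"{where} must be an object")
                continue
            context = para.get("context")
            if not isinstance(context, str):
                problems.append(f'{where} needs a string "context"')
            qas = para.get("qas")
            if not isinstance(qas, list):
                problems.append(f'{where} needs a "qas" list')
                continue
            for k, qa in enumerate(qas):
                if not isinstance(qa, dict):
                    problems.append(f"{where}.qas[{k}] must be an object")
                    continue
                for key in ("id", "question"):
                    if key not in qa:
                        problems.append(f'{where}.qas[{k}] has no "{key}"')
                for ans in qa.get("answers", []):
                    text = ans.get("text") if isinstance(ans, dict) else None
                    start = ans.get("answer_start") if isinstance(ans, dict) else None
                    if not isinstance(text, str) or not isinstance(start, int):
                        problems.append(f'{where}.qas[{k}] answer needs "text" and "answer_start"')
                    elif isinstance(context, str) and context[start:start + len(text)] != text:
                        problems.append(f'{where}.qas[{k}] "answer_start" does not match "text"')
    return problems


def check_squad_file(path):
    """Load a JSON file from disk and run the checks above on it."""
    with open(path, encoding="utf-8") as reader:
        return check_squad_format(json.load(reader))
```

Calling check_squad_file on the files passed as --train_file and --predict_file returns an empty list when nothing is flagged. An empty list only means these particular checks passed, not that the file is guaranteed to work with run_squad.py.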

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.