huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

FileNotFoundError when running run_squad.py #1921

Closed maxmatical closed 4 years ago

maxmatical commented 4 years ago

❓ Questions & Help

I tried fine-tuning BERT on SQuAD on my local computer. The command I ran was:

python3 ./examples/run_squad.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --do_train \
    --do_eval \
    --do_lower_case \
    --train_file $SQUAD_DIR/train-v1.1.json \
    --predict_file $SQUAD_DIR/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ../models/wwm_uncased_finetuned_squad/ \
    --per_gpu_eval_batch_size=3 \
    --per_gpu_train_batch_size=3

But I get an error saying that train-v1.1.json cannot be found. The full output is:

I1122 20:03:40.218862 4637015488 tokenization_utils.py:375] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-vocab.txt from cache at /Users/maxtian/.cache/torch/transformers/b3a6b2c6d7ea2ffa06d0e7577c1e88b94fad470ae0f060a4ffef3fe0bdf86730.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
I1122 20:03:40.596048 4637015488 modeling_utils.py:383] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-pytorch_model.bin from cache at /Users/maxtian/.cache/torch/transformers/66cc7a7501e3499efedc37e47b3a613e0d3d8d0a51c66224c69f0c669b52dcfb.ae11cc7f2a26b857b76b404a908c7abad793f88bf8ad95caecff154da87994b1
I1122 20:03:54.460903 4637015488 modeling_utils.py:453] Weights of BertForQuestionAnswering not initialized from pretrained model: ['qa_outputs.weight', 'qa_outputs.bias']
I1122 20:03:54.461247 4637015488 modeling_utils.py:456] Weights from pretrained model not used in BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
I1122 20:03:54.473404 4637015488 run_squad.py:504] Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', device=device(type='cpu'), do_eval=True, do_lower_case=True, do_train=True, doc_stride=128, eval_all_checkpoints=False, evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=3e-05, local_rank=-1, logging_steps=50, max_answer_length=30, max_grad_norm=1.0, max_query_length=64, max_seq_length=384, max_steps=-1, model_name_or_path='bert-large-uncased-whole-word-masking', model_type='bert', n_best_size=20, n_gpu=0, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=2.0, output_dir='../models/wwm_uncased_finetuned_squad/', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=3, per_gpu_train_batch_size=3, predict_file='/dev-v1.1.json', save_steps=50, seed=42, server_ip='', server_port='', tokenizer_name='', train_file='/train-v1.1.json', verbose_logging=False, version_2_with_negative=False, warmup_steps=0, weight_decay=0.0)
I1122 20:03:54.474577 4637015488 run_squad.py:308] Creating features from dataset file at /train-v1.1.json

And I get the following error:

Traceback (most recent call last):
  File "./examples/run_squad.py", line 573, in <module>
    main()
  File "./examples/run_squad.py", line 518, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
  File "./examples/run_squad.py", line 311, in load_and_cache_examples
    version_2_with_negative=args.version_2_with_negative)
  File "/Users/maxtian/Desktop/Python_Projects/transformers/examples/utils_squad.py", line 114, in read_squad_examples
    with open(input_file, "r", encoding='utf-8') as reader:
FileNotFoundError: [Errno 2] No such file or directory: '/train-v1.1.json'

pohanchi commented 4 years ago

You need to download that JSON file from the SQuAD website and put it in a local directory.
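
For anyone hitting the same error: the log line train_file='/train-v1.1.json' suggests that $SQUAD_DIR expanded to an empty string, so the script looked for the file in the filesystem root. Below is a minimal sketch, not part of the repo, that refuses an empty directory and fetches any missing file. The names squad_paths and download_missing are hypothetical helpers, and the base URL is an assumption about where the SQuAD v1.1 files are hosted, so check it against the SQuAD website before relying on it.

```python
import os
import urllib.request
from pathlib import Path

# Assumed hosting location for the SQuAD v1.1 files; verify on the SQuAD website.
SQUAD_BASE_URL = "https://rajpurkar.github.io/SQuAD-explorer/dataset/"
SQUAD_FILES = ("train-v1.1.json", "dev-v1.1.json")


def squad_paths(squad_dir):
    """Return the expected paths of the SQuAD files, rejecting an empty directory."""
    if not squad_dir:
        # An unset SQUAD_DIR turns "$SQUAD_DIR/train-v1.1.json" into "/train-v1.1.json".
        raise ValueError("SQUAD_DIR is empty or unset")
    return [Path(squad_dir) / name for name in SQUAD_FILES]


def download_missing(squad_dir):
    """Download each SQuAD file that is not already present in squad_dir."""
    paths = squad_paths(squad_dir)
    Path(squad_dir).mkdir(parents=True, exist_ok=True)
    for path in paths:
        if not path.exists():
            # Only touches the network for files that are actually missing.
            urllib.request.urlretrieve(SQUAD_BASE_URL + path.name, str(path))
    return paths


# Example use (hypothetical): read the directory from the environment.
# paths = download_missing(os.environ.get("SQUAD_DIR", ""))
```

After this has run with SQUAD_DIR exported in the same shell, the --train_file and --predict_file arguments in the original command should resolve to real files.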

maxmatical commented 4 years ago

Oh, my mistake. I thought the JSON files were already in the repo.