huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Segmentation fault (core dumped) running run_qa.py #9081

Closed: piecurus closed this issue 3 years ago

piecurus commented 3 years ago

Environment info

Who can help

Information

Model I am using (Bert, XLNet ...): distilbert-base-uncased (but other BERT variants behave the same)

The problem arises when using:

The tasks I am working on are:

To reproduce

Steps to reproduce the behavior:

1. Download the SQuAD v2 data:

```bash
mkdir squad
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json -O squad/train-v2.0.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json -O squad/dev
```

2. Run the example script on the local file:

```bash
python run_qa.py \
  --model_name_or_path distilbert-base-uncased \
  --do_train \
  --train_file ./squad/train-v2.0.json \
  --per_device_train_batch_size 2 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir ./models/ \
  --overwrite_output_dir
```

```
12/12/2020 21:22:50 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
12/12/2020 21:22:50 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir='./models/', overwrite_output_dir=True, do_train=True, do_eval=False, do_predict=False, evaluation_strategy=<EvaluationStrategy.NO: 'no'>, prediction_loss_only=False, per_device_train_batch_size=2, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=3e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=2.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Dec12_21-22-50_piero-laptop', logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='./models/', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None)
Using custom data configuration default
Downloading and preparing dataset json/default-0b904584a9578d6f (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/piero/.cache/huggingface/datasets/json/default-0b904584a9578d6f/0.0.0/70d89ed4db1394f028c651589fcab6d6b28dddcabbe39d3b21b4d41f9a708514...
0 tables [00:00, ? tables/s]Segmentation fault (core dumped)
```
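The log shows the crash happens while the JSON loader is still building Arrow tables, before training starts, so the failure can likely be reproduced without run_qa.py at all. A minimal repro sketch (my assumption, not verified: the segfault comes from feeding the raw nested SQuAD file to the datasets JSON loader, which is what the script does under the hood):

```python
# Minimal repro sketch: load the raw nested SQuAD v2 file directly with the
# datasets JSON loader, bypassing run_qa.py entirely.
from datasets import load_dataset

ds = load_dataset("json", data_files={"train": "./squad/train-v2.0.json"})
```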

Note: I would like to test the script on the downloaded SQuAD dataset first, so that I can then apply it to my own dataset. If I run it as below (using the Hub dataset instead of a local file), everything works fine:

```bash
python run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 4 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir ./models \
  --overwrite_output_dir
```

salrowili commented 3 years ago

Same problem here. What is going on? I can run the GLUE example smoothly, so it seems the problem is related to the script itself.

sgugger commented 3 years ago

The problem is in your JSON file. The SQuAD v2 JSON file is not in a format the datasets library can directly preprocess, so you need to make it compliant first. You should take this issue to the datasets library and explain what your needs are.

You can also check the mock data file used in the tests to see the expected format. A datasets expert would know better than me, but I think the problem is that the SQuAD JSON file stores the "answers" field as a list of dicts, whereas datasets expects a dict mapping keys to lists.
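Something along these lines should work as a one-off conversion. This is a minimal sketch, not an official script: the flat column names (id, title, context, question, answers) mirror the Hub's squad dataset, and the output path is just an example, so verify both against your setup.

```python
# Conversion sketch (assumptions noted above): flatten raw SQuAD v2 JSON into
# one JSON-lines record per question, turning each "answers" list of dicts
# into a dict of lists.
import json

def convert_squad_v2(in_path: str, out_path: str) -> None:
    with open(in_path) as f:
        articles = json.load(f)["data"]
    with open(out_path, "w") as f:
        for article in articles:
            title = article.get("title", "")
            for paragraph in article["paragraphs"]:
                context = paragraph["context"]
                for qa in paragraph["qas"]:
                    record = {
                        "id": qa["id"],
                        "title": title,
                        "context": context,
                        "question": qa["question"],
                        # list of dicts -> dict of lists (both lists stay
                        # empty for unanswerable SQuAD v2 questions)
                        "answers": {
                            "text": [a["text"] for a in qa["answers"]],
                            "answer_start": [a["answer_start"] for a in qa["answers"]],
                        },
                    }
                    f.write(json.dumps(record) + "\n")

convert_squad_v2("squad/train-v2.0.json", "squad/train-v2.0.jsonl")
```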

salrowili commented 3 years ago

The problem is not with the JSON file I have: I was able to work around it by using Transformers 3.x, which runs with no issues.

daniel347x commented 3 years ago

Transformers v4 does not support training on SQuAD v2 via its example training script. For now, you have to use Transformers v3.

sgugger commented 3 years ago

Yes, you could run it with the older script, which parsed the JSON differently. The new version uses the datasets library and requires the JSON to be organized differently (for compatibility with Arrow).
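As a sanity check (my own sketch, assuming a converted JSON-lines file like the one produced by the conversion sketch above), you can confirm the reorganized file loads cleanly before handing it to run_qa.py:

```python
# Quick check that the dict-of-lists layout round-trips through the datasets
# JSON loader (the file name assumes the conversion sketch above).
from datasets import load_dataset

ds = load_dataset("json", data_files={"train": "squad/train-v2.0.jsonl"})
# Expected shape: {'text': [...], 'answer_start': [...]}
print(ds["train"][0]["answers"])
```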

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.