huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

run_mlm.py not utilizing TPU #10192

Closed DarshanDeshpande closed 3 years ago

DarshanDeshpande commented 3 years ago

Environment info

Who can help

@sgugger

Information

Model I am using (Bert, XLNet ...): DistilBert

The problem arises when using:

The tasks I am working on is:

To reproduce

Steps to reproduce the behavior:

!python /content/transformers/examples/xla_spawn.py --num_cores 8 /content/transformers/examples/language-modeling/run_mlm.py \
--model_type distilbert \
--config_name /content/TokenizerFiles \
--tokenizer_name /content/TokenizerFiles \
--train_file Files/file_aa.txt \
--mlm_probability 0.15 \
--output_dir "/content/TrainingCheckpoints" \
--do_train --per_device_train_batch_size 32 \
--save_steps 500 --disable_tqdm False \
--line_by_line True --max_seq_length 150 \
--pad_to_max_length False \
--cache_dir /content/cache_dir \
--save_total_limit 2

My tokenizer config and model config files both contain just {model_type: "distilbert"} and are in the TokenizerFiles folder along with my vocab.txt.

The output I get is:

WARNING:root:TPU has started up successfully with version pytorch-1.7
2021-02-15 14:40:37.816883: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
WARNING:root:TPU has started up successfully with version pytorch-1.7
2021-02-15 14:40:57.239070: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:57.283838: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:57.446951: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:57.470266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:57.473336: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:57.686903: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:57.863940: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 14:40:58.555214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
WARNING:run_mlm:Process rank: -1, device: xla:1, n_gpu: 0distributed training: False, 16-bits training: False
INFO:run_mlm:Training/evaluation parameters TrainingArguments(output_dir=/content/TrainingCheckpoints, overwrite_output_dir=False, do_train=True, do_eval=None, do_predict=False, evaluation_strategy=EvaluationStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=32, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_steps=0, logging_dir=runs/Feb15_14-41-21_34a4105ebd5a, logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=2, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, local_rank=-1, tpu_num_cores=8, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=/content/TrainingCheckpoints, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=False, deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, _n_gpu=0)
Using custom data configuration default
Downloading and preparing dataset text/default-e939092a7eff14a8 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab...
02/15/2021 14:41:22 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab. Subsequent calls will reuse this data.
[INFO|configuration_utils.py:447] 2021-02-15 14:41:22,465 >> loading configuration file /content/TokenizerFiles/config.json
[INFO|configuration_utils.py:485] 2021-02-15 14:41:22,466 >> Model config DistilBertConfig {
  "activation": "gelu",
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "transformers_version": "4.3.2",
  "vocab_size": 30522
}

[INFO|configuration_utils.py:447] 2021-02-15 14:41:22,467 >> loading configuration file /content/TokenizerFiles/config.json
[INFO|configuration_utils.py:485] 2021-02-15 14:41:22,476 >> Model config DistilBertConfig {
  "activation": "gelu",
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "transformers_version": "4.3.2",
  "vocab_size": 30522
}

[INFO|tokenization_utils_base.py:1688] 2021-02-15 14:41:22,476 >> Model name '/content/TokenizerFiles' not found in model shortcut name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad, distilbert-base-cased, distilbert-base-cased-distilled-squad, distilbert-base-german-cased, distilbert-base-multilingual-cased). Assuming '/content/TokenizerFiles' is a path, a model identifier, or url to a directory containing tokenizer files.
[INFO|tokenization_utils_base.py:1721] 2021-02-15 14:41:22,477 >> Didn't find file /content/TokenizerFiles/tokenizer.json. We won't load it.
[INFO|tokenization_utils_base.py:1721] 2021-02-15 14:41:22,478 >> Didn't find file /content/TokenizerFiles/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1721] 2021-02-15 14:41:22,478 >> Didn't find file /content/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1784] 2021-02-15 14:41:22,479 >> loading file /content/TokenizerFiles/vocab.txt
[INFO|tokenization_utils_base.py:1784] 2021-02-15 14:41:22,479 >> loading file None
[INFO|tokenization_utils_base.py:1784] 2021-02-15 14:41:22,480 >> loading file None
[INFO|tokenization_utils_base.py:1784] 2021-02-15 14:41:22,480 >> loading file None
[INFO|tokenization_utils_base.py:1784] 2021-02-15 14:41:22,480 >> loading file /content/TokenizerFiles/tokenizer_config.json
INFO:run_mlm:Training new model from scratch
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
02/15/2021 14:41:22 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
02/15/2021 14:41:23 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
02/15/2021 14:41:23 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
02/15/2021 14:41:23 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
02/15/2021 14:41:23 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
02/15/2021 14:41:24 - WARNING - run_mlm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Using custom data configuration default
Reusing dataset text (/root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab)
100% 2/2 [00:01<00:00,  1.72ba/s]
100% 2/2 [00:01<00:00,  1.65ba/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab/cache-0028d6bfc2eb6117.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab/cache-0028d6bfc2eb6117.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab/cache-0028d6bfc2eb6117.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab/cache-0028d6bfc2eb6117.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab/cache-0028d6bfc2eb6117.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/text/default-e939092a7eff14a8/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab/cache-0028d6bfc2eb6117.arrow
[INFO|trainer.py:432] 2021-02-15 14:41:59,875 >> The following columns in the training set don't have a corresponding argument in `DistilBertForMaskedLM.forward` and have been ignored: special_tokens_mask.
[INFO|trainer.py:837] 2021-02-15 14:41:59,879 >> ***** Running training *****
[INFO|trainer.py:838] 2021-02-15 14:41:59,879 >>   Num examples = 2000
[INFO|trainer.py:839] 2021-02-15 14:41:59,879 >>   Num Epochs = 3
[INFO|trainer.py:840] 2021-02-15 14:41:59,879 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:841] 2021-02-15 14:41:59,879 >>   Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:842] 2021-02-15 14:41:59,879 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:843] 2021-02-15 14:41:59,879 >>   Total optimization steps = 24

 17% 4/24 [03:56<17:13, 51.67s/it]  # <------------------- HERE ------------------------>
Traceback (most recent call last):

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

The file used here is only for testing and has 2000 lines of text. It looks as if training is running on the CPU instead of the TPU. I installed XLA with:

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp36-cp36m-linux_x86_64.whl

I ran the same script a couple of days ago and it worked fine, so I don't know what is wrong now. At that time I had saved the tokenizer using .save(), but due to some recent changes in the library that no longer works, so I saved it using save_model() instead, and that works fine. Could this be the cause of the issue?

Expected behavior

Training should be faster. The last time I ran run_mlm.py, I got almost 3 iterations per second, compared with about 51.67 s/it in the run above.

sgugger commented 3 years ago

--pad_to_max_length False is the reason training is so slow: it produces batches with different sequence lengths, but TPUs need fixed shapes to be efficient.
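
To make the shape argument concrete, here is a minimal, library-free sketch. It is not the run_mlm.py data collator: the helper names `pad_batch` and `distinct_shapes` and the toy token ids are made up for illustration. It only counts how many distinct batch shapes each padding strategy produces. The usual explanation for the slowdown is that XLA compiles a program per input shape, so every new shape can trigger a recompilation; that is stated here as background, not measured by the code.

```python
from typing import List, Optional, Set, Tuple

PAD_ID = 0  # same value as "pad_token_id" in the config printed in the log


def pad_batch(batch: List[List[int]], max_seq_length: Optional[int] = None) -> List[List[int]]:
    """Pad (and truncate) every sequence in a batch to a common length.

    With max_seq_length=None the target is the longest sequence in this batch
    (dynamic padding); otherwise it is the fixed max_seq_length.
    """
    target = max_seq_length if max_seq_length is not None else max(len(s) for s in batch)
    padded = []
    for seq in batch:
        seq = seq[:target]
        padded.append(seq + [PAD_ID] * (target - len(seq)))
    return padded


def distinct_shapes(batches: List[List[List[int]]], max_seq_length: Optional[int] = None) -> Set[Tuple[int, int]]:
    """Collect the (batch_size, seq_len) shapes seen across all batches."""
    shapes = set()
    for batch in batches:
        padded = pad_batch(batch, max_seq_length)
        shapes.add((len(padded), len(padded[0])))
    return shapes


# Hypothetical toy data: three batches of two tokenized lines each.
toy_batches = [
    [[101, 7, 102], [101, 7, 8, 9, 102]],
    [[101, 7, 8, 102], [101, 102]],
    [[101, 5, 6, 7, 8, 9, 102], [101, 5, 102]],
]

dynamic = distinct_shapes(toy_batches)                   # like --pad_to_max_length False
fixed = distinct_shapes(toy_batches, max_seq_length=150)  # like padding to --max_seq_length 150

print(len(dynamic), len(fixed))
```

With dynamic padding each toy batch ends up with its own sequence length, while padding to a fixed length gives every batch the same shape.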

Our argument parser previously had a bug that ignored boolean settings like this one, which may be why you are seeing the slowdown now and not before: because of that bug, pad_to_max_length=True was being applied even though you specified the opposite. If you remove that option, training should be faster.
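
The thread does not say how the old parser came to ignore the value, and the sketch below is not the transformers parser. It is a generic illustration, using only the standard-library argparse, of one well-known way a command-line value such as `False` can be silently misread, next to a hypothetical `str_to_bool` converter that reads it as intended.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: map common textual spellings to a real boolean."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


# Naive parser: type=bool calls bool("False"), and any non-empty string is truthy.
naive = argparse.ArgumentParser()
naive.add_argument("--pad_to_max_length", type=bool, default=False)

# Stricter parser: the converter inspects the text of the value.
strict = argparse.ArgumentParser()
strict.add_argument("--pad_to_max_length", type=str_to_bool, default=False)

argv = ["--pad_to_max_length", "False"]
naive_value = naive.parse_args(argv).pad_to_max_length
strict_value = strict.parse_args(argv).pad_to_max_length

print(naive_value, strict_value)
```

The naive parser reports True even though the user typed False, which is the kind of mismatch that would leave padding to the maximum length switched on regardless of the flag.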

DarshanDeshpande commented 3 years ago

Perfect! Thank you so much! Closing this issue.