huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Unknown Device when training GPT2 with TPUs in Colab #3914

Closed: ncoop57 closed this issue 4 years ago

ncoop57 commented 4 years ago

🐛 Bug

Information

Model I am using (Bert, XLNet ...): DialoGPT-small from Microsoft

Language I am using the model on (English, Chinese ...): Spanish Conversations

The problem arises when using: PyTorch's XLA library (torch_xla) to train a GPT-2 model on Google Colab TPUs

The task I am working on is:

To reproduce

Steps to reproduce the behavior:

  1. Run the following Colab Notebook: https://colab.research.google.com/drive/1LTH0LpHxWQYEy9U7vBWTk4-4sLo7YF5B
  2. Make sure the runtime type is set to TPU

Running the notebook then fails with:

Exception in device=TPU:7: Unknown device
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 119, in _start_fn
    fn(gindex, *args)
  File "<ipython-input-17-4d2a1ccbaa5f>", line 3, in _mp_fn
    a = run(trn_df, val_df, model, tokenizer, args)
  File "<ipython-input-16-d526a8f464d8>", line 81, in run
    scheduler
  File "<ipython-input-11-be3e41b46d25>", line 10, in train_fn
    outputs = model(inputs, labels = labels)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 599, in forward
    inputs_embeds=inputs_embeds,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 484, in forward
    hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 231, in forward
    m = self.mlp(self.ln_2(x))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 210, in forward
    h = self.act(self.c_fc(x))
RuntimeError: Unknown device

Expected behavior

The GPT-2 model should start training on the Google Colab TPU.

Environment info
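
The environment fields of the issue template were left empty. As a rough sketch of how they could be gathered inside the notebook, the snippet below prints the platform and the installed package versions. It assumes Python 3.8 or newer (the traceback shows Python 3.6, where `importlib.metadata` is not available); `collect_env_info` and the `PACKAGES` list are hypothetical names chosen for illustration.

```python
import platform
from importlib import metadata

# Hypothetical list of packages relevant to this report.
PACKAGES = ("transformers", "torch", "torch_xla", "tensorflow")


def collect_env_info(packages=PACKAGES):
    """Return a dict with platform details and installed package versions."""
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Lookup is by distribution name, so a package installed under a
            # different distribution name is also reported this way.
            info[name] = "not installed"
    return info


for key, value in collect_env_info().items():
    print(f"{key}: {value}")
```

Pasting the printed lines under this heading gives maintainers the versions they need to reproduce the problem.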

Any help is greatly appreciated, and thanks for the amazing library!

ncoop57 commented 4 years ago

Fixed when building from source