🐛 Bug

Information

Model I am using (Bert, XLNet ...): DialoGPT-small from Microsoft

Language I am using the model on (English, Chinese ...): Spanish Conversations

The problem arises when using: PyTorch's XLA library (torch_xla) to train a GPT-2 model on Google Colab TPUs

[ ] the official example scripts: (give details below)
[ x ] my own modified scripts: (give details below) I adapted a TPU training script written for the RoBERTa model so that it would work with a GPT-2 model: https://colab.research.google.com/drive/1LTH0LpHxWQYEy9U7vBWTk4-4sLo7YF5B

The task I am working on is:

[ ] an official GLUE/SQUaD task: (give the name)
[ x ] my own task or dataset: Multi Turn Dialog

To reproduce

Steps to reproduce the behavior:

1. Open the following Colab notebook: https://colab.research.google.com/drive/1LTH0LpHxWQYEy9U7vBWTk4-4sLo7YF5B
2. Make sure the runtime type is set to TPU.
3. Run the notebook. Training fails with the traceback below.

Exception in device=TPU:7: Unknown device
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 119, in _start_fn
    fn(gindex, *args)
  File "<ipython-input-17-4d2a1ccbaa5f>", line 3, in _mp_fn
    a = run(trn_df, val_df, model, tokenizer, args)
  File "<ipython-input-16-d526a8f464d8>", line 81, in run
    scheduler
  File "<ipython-input-11-be3e41b46d25>", line 10, in train_fn
    outputs = model(inputs, labels = labels)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 599, in forward
    inputs_embeds=inputs_embeds,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 484, in forward
    hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 231, in forward
    m = self.mlp(self.ln_2(x))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_gpt2.py", line 210, in forward
    h = self.act(self.c_fc(x))
RuntimeError: Unknown device

Expected behavior

The GPT-2 model should start training on the Google Colab TPU.

Environment info

transformers version: 2.8.0
Platform: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.9
PyTorch version (GPU?): 1.5.0a0+d6149a7 (False)
Tensorflow version (GPU?): 2.2.0-rc3 (False)
Using GPU in script?: No
Using distributed or parallel set-up in script?: Yes
PyTorch XLA version: torch-xla-1.6+e788e5b
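
One thing that may be worth ruling out: the PyTorch and torch-xla version strings listed above do not share the same major.minor release (1.5 vs. 1.6). Whether this is related to the "Unknown device" error is only an assumption on my part, not a confirmed diagnosis. A minimal sketch of the comparison, using only the standard library (the helper names `major_minor` and `versions_match` are hypothetical, not part of any library):

```python
import re


def major_minor(version):
    """Return (major, minor) from strings such as '1.5.0a0+d6149a7'
    or 'torch-xla-1.6+e788e5b' (first 'N.N' pattern found)."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("no version number found in %r" % version)
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_version, xla_version):
    """True when both builds share the same major.minor release."""
    return major_minor(torch_version) == major_minor(xla_version)


# Version strings copied from the environment info in this report.
torch_version = "1.5.0a0+d6149a7"
xla_version = "torch-xla-1.6+e788e5b"

print("torch:", major_minor(torch_version))
print("torch-xla:", major_minor(xla_version))
print("same release:", versions_match(torch_version, xla_version))
```

In the notebook the two strings could be read from the installed packages instead of being hard-coded; they are hard-coded here so the check runs without a TPU runtime.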

Any help is greatly appreciated, and thanks for the amazing library!

huggingface / transformers

Unknown Device when training GPT2 with TPUs in Colab #3914
