mgrankin / ru_transformers

Apache License 2.0

Running on Google Colab TPU #20

Closed nikhilno1 closed 4 years ago

nikhilno1 commented 4 years ago

I was trying to get TPU training to run on a Google Colab TPU. The TPU MNIST demo works fine with XLA (https://colab.sandbox.google.com/github/pytorch/xla/blob/master/contrib/colab/mnist-training-xrt-1-15.ipynb), so I thought I should be able to run your test_train_mp_mnist.py, but I am running into issues (traceback below). tpu_lm_finetuning.py also failed to run, with a thread join error. Do you have any plans to make your code run on Google Colab TPU? That would be very helpful.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 119, in _start_fn
    fn(gindex, *args)
  File "/content/ru_transformers/tpu/test_train_mp_mnist.py", line 180, in _mp_fn
    accuracy = train_mnist()
  File "/content/ru_transformers/tpu/test_train_mp_mnist.py", line 79, in train_mnist
    transforms.Normalize((0.1307,), (0.3081,))]))
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py", line 71, in __init__
    self.download()
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py", line 144, in download
    read_image_file(os.path.join(self.raw_folder, 'train-images-idx3-ubyte')),
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py", line 483, in read_image_file
    x = read_sn3_pascalvincent_tensor(f, strict=False)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py", line 461, in read_sn3_pascalvincent_tensor
    magic = get_int(data[0:4])
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py", line 426, in get_int
    return int(codecs.encode(b, 'hex'), 16)
ValueError: invalid literal for int() with base 16: b''
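
Note on the traceback: the final ValueError is raised because get_int received an empty byte string, so the first four bytes of train-images-idx3-ubyte could not be read. That suggests the raw MNIST file on disk was empty or truncated when it was parsed. Why that happened is not established in this thread; an incomplete download, or several spawned processes writing the same file at once, are only guesses. The sketch below is a stdlib-only way to check whether a raw file at least starts with a valid IDX header before training. The helper names are hypothetical and are not part of torchvision or this repository.

```python
import struct

# Magic numbers of the IDX format used by the raw MNIST files:
# 0x00000803 = unsigned-byte data with 3 dimensions (image files)
# 0x00000801 = unsigned-byte data with 1 dimension (label files)
IDX_MAGIC_IMAGES = 0x00000803
IDX_MAGIC_LABELS = 0x00000801


def read_idx_magic(path):
    """Return the 4-byte big-endian magic number, or None if the file is too short."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        # Empty or truncated file: the situation that produces b'' in the traceback.
        return None
    return struct.unpack(">I", header)[0]


def looks_like_idx(path, expected_magic):
    """Hypothetical helper: True if the file starts with the expected IDX magic."""
    return read_idx_magic(path) == expected_magic
```

To use it, call looks_like_idx on the train-images-idx3-ubyte file inside the dataset's raw folder with IDX_MAGIC_IMAGES; the exact folder location depends on the torchvision version and the root passed to the dataset. If the check fails, deleting the raw files and downloading them again in a single process before spawning the TPU workers may be worth trying.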

mgrankin commented 4 years ago

It would be great to have the code working on Colab. Please, share if you make it work.

nikhilno1 commented 4 years ago

Will give it a shot.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

leejason commented 4 years ago

+1 --> It would be great to have the code working on Colab.
