DarshanDeshpande closed this issue 3 years ago.
`--pad_to_max_length False` is the reason your training is very slow: this creates batches of different sequence lengths, but TPUs need fixed shapes to be efficient.
There was a bug in our argument parser before that ignored bool settings like this, which may be why you are seeing the slowdown now instead of before (because of that bug, it was applying `pad_to_max_length=True` even though you asked for the opposite). If you remove that option, you should see faster training.
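For illustration, here is a minimal sketch of the two padding modes, using the stock `distilbert-base-uncased` tokenizer as a stand-in for the custom one (the texts and `max_length` are placeholder values, not from the original report):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
texts = ["a short line", "a much longer line of text that pads the whole batch"]

# Dynamic padding: each batch is padded only to its longest member,
# so batch shapes vary and the TPU has to recompile the graph repeatedly.
dynamic = tokenizer(texts, padding=True, return_tensors="pt")

# Fixed-length padding: every batch has the same shape, so the TPU
# compiles one graph and reuses it for every step.
fixed = tokenizer(
    texts, padding="max_length", truncation=True, max_length=128, return_tensors="pt"
)

print(dynamic["input_ids"].shape)  # depends on the longest text in the batch
print(fixed["input_ids"].shape)    # always torch.Size([2, 128])
```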
Perfect! Thank you so much! Closing this issue.
Environment info
transformers version: 4.3.2 and the latest version forked from GitHub

Who can help
@sgugger
Information
Model I am using (Bert, XLNet ...): DistilBert
The problem arises when using:
The task I am working on is:
To reproduce
Steps to reproduce the behavior:
My tokenizer and config files both contain just `{"model_type": "distilbert"}` and are in the TokenizerFiles folder along with my vocab.txt.
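For reference, a minimal sketch of how such a folder is loaded (assuming the files are named config.json and tokenizer_config.json, which is the layout the library looks for; the folder name comes from this report):

```python
from transformers import AutoConfig, AutoTokenizer

# Reads TokenizerFiles/config.json, here just {"model_type": "distilbert"}
config = AutoConfig.from_pretrained("TokenizerFiles")

# Reads TokenizerFiles/tokenizer_config.json and TokenizerFiles/vocab.txt
tokenizer = AutoTokenizer.from_pretrained("TokenizerFiles")
print(type(tokenizer))  # a DistilBERT tokenizer class, resolved via model_type
```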
The output I get is:
The file used here is only for testing and has a total of 2000 lines of text. It almost seems like the training is taking place on the CPU instead of the TPU. torch_xla was installed with:

```
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp36-cp36m-linux_x86_64.whl
```
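One quick sanity check (my own suggestion, not part of the original report) is to confirm that torch_xla can see the TPU at all before blaming the training script:

```python
import torch
import torch_xla.core.xla_model as xm

# If the runtime or wheel is broken, this raises or falls back oddly;
# on a working TPU runtime it prints an xla device such as xla:1.
device = xm.xla_device()
print(device)

# A trivial op forces compilation and execution on the device.
x = torch.ones(2, 2, device=device)
print(x.sum().item())  # 4.0
```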
I ran the same script a couple of days back and it worked fine, so I don't know what is wrong now. At that time I had saved the tokenizer using `.save()`, but due to some recent changes in the library that doesn't work anymore, so I saved it using `save_model()`
and it works fine now. Could this issue be caused by that?
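A minimal sketch of the two save paths in the tokenizers library, assuming a WordPiece trainer produced the vocab.txt mentioned above (the corpus path and vocab size are placeholders):

```python
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer()
tokenizer.train(files=["corpus.txt"], vocab_size=30000)  # placeholder corpus

# save() serializes the whole tokenizer to a single JSON file ...
tokenizer.save("tokenizer.json")

# ... while save_model() writes only the model files (vocab.txt here),
# matching a folder layout that has vocab.txt next to the config files.
tokenizer.save_model("TokenizerFiles")
```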
Expected behavior

The training should be faster. The last time I ran `run_mlm.py`, I got almost 3 iterations per second.