open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

TPU support #277

Open kbrajwani opened 3 years ago

kbrajwani commented 3 years ago

Do we have TPU support for training and inference of models?

innerlee commented 3 years ago

Haven't tried that yet. Can TPUs run normal PyTorch training?

kbrajwani commented 3 years ago

I think it varies by model, but yes, normal PyTorch training can run on TPU with some modifications to the training loop. The differences are shown here: https://www.kaggle.com/tanulsingh077/pytorch-xla-understanding-tpu-s-and-xla . That is the older approach; the newer way is PyTorch Lightning, which automatically detects the device and runs the same code on GPU and TPU.
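For reference, a minimal sketch of the kind of loop change involved, using the public torch_xla API (the toy model and data here are placeholders for illustration, not MMOCR code):

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # requires the torch_xla package

# Acquire the TPU core as a device, analogous to torch.device("cuda").
device = xm.xla_device()

model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Toy random batch; a real loader would feed actual data.
    inputs = torch.randn(8, 10, device=device)
    targets = torch.randint(0, 2, (8,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # xm.optimizer_step replaces optimizer.step(); it marks the step
    # boundary so XLA compiles and executes the pending graph.
    xm.optimizer_step(optimizer)
```

With PyTorch Lightning the same effect is configured on the `Trainer` rather than in the loop; depending on the version this is something like `Trainer(accelerator="tpu")` or the older `tpu_cores=8` argument.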

kbrajwani commented 3 years ago

Hi @innerlee, do you have any plans to work on TPU support? It would help a lot when training models on large datasets. Thanks

innerlee commented 3 years ago

For TPU support, it's better to build the infrastructure for the whole OpenMMLab ecosystem; MMOCR's support will then be ready automatically. cc @hellock, do you have a plan for this?

BTW,

> training models on large datasets

how large is the dataset? We use GPUs and train on moderately large data.

kbrajwani commented 3 years ago

I am training the PSENet detection model on 10,000 train, 1,000 val, and 1,000 test images for 600 epochs. I am currently using the Colab ecosystem, where the estimate on GPU is around 23 days. So it would be great if TPU were supported; it would be much quicker. Thanks

innerlee commented 3 years ago

That is relatively small, so you may want to double-check the Colab environment.

kbrajwani commented 3 years ago

The Colab environment provides a T4 GPU with 16 GB of RAM, and it also offers a TPU v2. TPUs are much faster than GPUs, which is why I want to make use of them. Thanks
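As a quick sanity check of what Colab actually attached, something like this works in a notebook cell (assumes a standard Colab runtime; the torch_xla import only succeeds on a TPU runtime with the package installed):

```python
import torch

# Report whichever accelerator the Colab runtime attached.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU visible")

try:
    # torch_xla is available only on TPU runtimes (or after manual install).
    import torch_xla.core.xla_model as xm
    print("TPU device:", xm.xla_device())
except ImportError:
    print("torch_xla not installed; PyTorch cannot see the TPU")
```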