Closed: @Jongchan closed this issue 2 years ago
Thanks for submitting this issue @Jongchan. I will look into this issue later this week and circle back.
A seemingly relevant passage from the official TPU troubleshooting guide:
The total batch size should be a multiple of 64 (8 per TPU core) and feature dimensions should be a multiple of 128, or the total batch size should be a multiple of 1024 (128 per TPU core) and feature dimensions should be a multiple of 8.

Not all layers can conform to this rule, especially the first and last layers of the network. This is fine, and it is expected that most models require some amount of padding.
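As a rough back-of-the-envelope illustration (my own sketch, not from the guide): XLA pads each tensor dimension up to the nearest tile multiple, so the fraction of compute spent on padding can be estimated like this:

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple, as XLA does when tiling a dimension."""
    return -(-x // multiple) * multiple

# A feature dimension of 96 padded up to a multiple of 128:
padded = round_up(96, 128)   # 128
wasted = 1 - 96 / padded     # 0.25 -> a quarter of the tile is padding
print(padded, wasted)
```

Dimensions far below the tile size (e.g. a single input channel padded toward 128) are the worst case, which matches the symptom described in this thread.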
So, my tentative conclusion is that my model is not well optimized to exploit TPUs. I am currently restructuring the model so that it maps better onto the TPU.
Thanks for the updates @Jongchan. Feel free to circle back and reopen the issue if you face additional roadblocks.
❓ Questions and Help
Hello, all! I am running a 3D CNN on a TPU v3-8, and the computation does not seem to be well optimized.
In short, the majority of the computation time seems to be wasted due to excessive padding in my first convolution.
Background Information
Docker image: gcr.io/tpu-pytorch/xla:r1.9

Observation
Below is a screenshot of the TensorBoard profiling result (op_profile page):
The very first convolution takes a 12x1x96x96x96 (BCTHW) input and applies a 7x7x7 3D convolution with 64 output channels and padding of 3.

Question
Please excuse my ignorance; I am just getting started with PyTorch/XLA on TPUs. Any help or suggestions would be appreciated. Thank you in advance!
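For reference, the first layer described above can presumably be reconstructed as follows (my own hypothetical sketch; the exact module definition was not shown in the thread):

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction from the description: 1 input channel,
# 64 output channels, a 7x7x7 kernel, and padding of 3.
conv1 = nn.Conv3d(in_channels=1, out_channels=64, kernel_size=7, padding=3)

# A small dummy input stands in for the real 12x1x96x96x96 (BCTHW) batch.
x = torch.randn(2, 1, 16, 16, 16)
y = conv1(x)  # padding=3 with a 7x7x7 kernel preserves the spatial size
print(y.shape)  # torch.Size([2, 64, 16, 16, 16])
```

Note that the single input channel is far from the tile multiples quoted from the troubleshooting guide, which would explain heavy padding on this layer.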