pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Low utilization of 3D convolution #3180

Closed Jongchan closed 2 years ago

Jongchan commented 3 years ago

❓ Questions and Help

Hello, all! I am running a 3D CNN on a TPU v3-8, and the computation does not seem to be well optimized.

In short, the majority of the computation time appears to be wasted on excessive padding in my first convolution.

Background Information

Observation

Below is a screenshot of the TensorBoard profiling result (op_profile page):

Below is the PyTorch definition of the very first convolution:

```python
self.in_ch = 64
self.inc = nn.Sequential(
    nn.Conv3d(n_channels, self.in_ch, kernel_size=7, padding=3),
    nn.BatchNorm3d(self.in_ch),
    nn.ReLU(inplace=True),
)
```
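For context, here is a self-contained sketch of that stem block outside the class (assuming `n_channels=1`, which is not stated above), showing that the 7x7x7 convolution with padding 3 preserves the spatial dimensions:

```python
import torch
import torch.nn as nn

# Hypothetical stand-alone reconstruction of the first block.
# n_channels=1 is an assumption; the issue does not give the input channels.
n_channels = 1
in_ch = 64
inc = nn.Sequential(
    nn.Conv3d(n_channels, in_ch, kernel_size=7, padding=3),  # "same"-style padding
    nn.BatchNorm3d(in_ch),
    nn.ReLU(inplace=True),
)

x = torch.randn(2, n_channels, 16, 32, 32)  # (N, C, D, H, W)
out = inc(x)
print(out.shape)  # torch.Size([2, 64, 16, 32, 32])
```

Note that the input channel count (1 here) is far below the feature-dimension multiples XLA tiles for, which is consistent with heavy padding on the first convolution.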

Question

  1. Am I interpreting the result correctly? It seems that there is a lot of room to optimize.
  2. Are there any best practices for addressing this low-utilization issue?
  3. Is 3D convolution fully optimized in PyTorch-XLA? (I assume 2D convolution is.)

Please excuse my ignorance, as I am a beginner with PyTorch-XLA and TPUs. Any help or suggestions would be appreciated. Thank you in advance!

miladm commented 3 years ago

Thanks for submitting this issue @Jongchan. I will look into this issue later this week and circle back.

Jongchan commented 3 years ago

A seemingly relevant passage from the official TPU troubleshooting guide:

  - The total batch size should be a multiple of 64 (8 per TPU core), and feature dimensions should be a multiple of 128, or
  - The total batch size should be a multiple of 1024 (128 per TPU core), and feature dimensions should be a multiple of 8.

Not all layers can conform to this rule, especially the first and last layers of the network. This is fine, and it is expected that most models require some amount of padding.
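The two quoted combinations can be checked with a small helper; this is a sketch of the rule as stated above, and the function name is mine, not a PyTorch/XLA API:

```python
def tpu_friendly(batch_size: int, feature_dim: int) -> bool:
    """Check the two batch/feature combinations quoted from the
    TPU troubleshooting guide. Layers failing both will be padded."""
    option_a = batch_size % 64 == 0 and feature_dim % 128 == 0
    option_b = batch_size % 1024 == 0 and feature_dim % 8 == 0
    return option_a or option_b

print(tpu_friendly(64, 128))  # True: first combination
print(tpu_friendly(8, 64))    # False: both dimensions fall short, so XLA pads
```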

Updates

So, my tentative conclusion is that my model is not structured to exploit TPUs well. I am currently restructuring it to better fit the hardware.
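One simple step in that direction is to round channel widths up to a TPU-friendly multiple when defining layers; a minimal sketch (the helper is hypothetical, not an XLA utility):

```python
def round_up(value: int, multiple: int) -> int:
    """Round a channel width up to the nearest multiple,
    e.g. toward the multiple-of-128 feature-dimension guideline."""
    return -(-value // multiple) * multiple  # ceiling division, then scale

print(round_up(64, 128))   # 128
print(round_up(200, 128))  # 256
```

Whether the extra parameters are worth the improved tiling depends on the model; the guide notes that some padding (especially in the first and last layers) is expected.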

miladm commented 2 years ago

Thanks for the updates @Jongchan. Feel free to circle back and reopen the issue if you face additional roadblocks.