Open wppply opened 4 years ago
When training tensor2tensor on TPU, the actual global batch size is automatically calculated as batch_size * tpu_config.num_shards. Also, the memory usage shown accounts for only a single core out of the 8 replicas. Hence, 'Number of TPU cores: 8' means you are already effectively using all 8 cores, with 8 times the batch size you specified, and the total global HBM usage is 8.83 GB * 8 = 70.64 GB.
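For concreteness, here is a minimal arithmetic sketch (plain Python, not t2t internals) of how the per-core figures in the log scale to the whole v2-8 slice. Only the 8.83 GB value comes from the log discussed here; the per-core batch size is a hypothetical placeholder.

```python
# Minimal sketch: how per-core figures reported by the runtime scale
# across a TPUv2-8 slice. Only the 8.83 GB value comes from the log
# discussed above; the batch size is a hypothetical placeholder.
NUM_CORES = 8                 # a v2-8 exposes 8 cores (tpu_config.num_shards)

per_core_batch_size = 1024    # placeholder for the batch_size hparam you set
per_core_hbm_gb = 8.83        # per-replica HBM usage shown in the log

effective_global_batch = per_core_batch_size * NUM_CORES  # 8x what you specified
total_hbm_gb = per_core_hbm_gb * NUM_CORES                # 8.83 * 8 = 70.64 GB

print(effective_global_batch)   # 8192
print(round(total_hbm_gb, 2))   # 70.64
```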
As for the 32.1% utilization, it's similar to what I've seen running t2t on TPU. Although it may seem low, you'll find that its speed is still much faster than on a GPU. You can use the Cloud TPU Tools to grab a deeper, op-by-op profile.
Hi. I am trying to use a TPUv2-8 to train a query classifier, but I am running into some memory issues.
Officially, a TPUv2-8 is claimed to have 64 GB of memory. However, I keep getting this error when I follow this tutorial; it cannot handle more than 8 GB.
Here is the TPU usage; it is quite low as well.
I thought the TPU would split the batch equally across the cores, but it seems it does not and is only using a single one. When I run the same code on a single Nvidia T4, there is nothing wrong with it. So, what should I add to the code or to the CLI options to leverage all 8 TPU cores instead of a single one? Thanks
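For reference, here is a minimal sketch of how an estimator is usually pointed at all 8 cores via the TF 1.x TPUEstimator API, which tensor2tensor builds on. The TPU name, bucket path, model_fn, and batch size below are placeholders, not values from this issue. Note the difference in convention: TPUEstimator takes a global batch size and splits it across the shards, whereas t2t (per the reply above) takes the per-core batch_size hparam and multiplies it by num_shards.

```python
import tensorflow as tf  # TF 1.x, the API family tensor2tensor targets


def my_model_fn(features, labels, mode, params):
    # Placeholder model: a single dense layer, just to make the sketch complete.
    logits = tf.layers.dense(features["x"], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.AdamOptimizer()
    # CrossShardOptimizer averages gradients across the TPU cores each step.
    optimizer = tf.tpu.CrossShardOptimizer(optimizer)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)


# Placeholder TPU name and model dir -- substitute your own.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")

run_config = tf.estimator.tpu.RunConfig(
    cluster=resolver,
    model_dir="gs://my-bucket/model",
    tpu_config=tf.estimator.tpu.TPUConfig(
        iterations_per_loop=100,
        num_shards=8,  # one shard per core on a v2-8
    ),
)

estimator = tf.estimator.tpu.TPUEstimator(
    model_fn=my_model_fn,
    config=run_config,
    use_tpu=True,
    train_batch_size=1024,  # global batch; TPUEstimator splits it over the 8 shards
)
```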