TPU HBM OOM - Githubissues

Hi. I am trying to use TPUv2-8 to train a query classifier. However, I got some issues here about memory.

Officially, it claims that TPUv2-8 has 64 GB memory. However, I kept getting this error when I full this tutorial. It cannot handle more than 8GB.

INFO:tensorflow:Error recorded from training_loop: Compilation failure: Ran out of memory in memory space hbm. Used 8.83G of 8.00G hbm. Exceeded hbm capacity by 848.88M.

Total hbm usage >= 8.83G:
    reserved        528.00M
    program           8.25G
    arguments        64.32M (99.9% utilization)

Output size 64.32M (99.9% utilization); shares 64.25M with arguments.

Program hbm requirement 8.25G:
    reserved           4.0K
    global            65.0K
    HLO temp          8.25G (100.0% utilization, 0.0% fragmentation (1.01M))

  Largest program allocations in hbm:

  1. Size: 4.00G
     Operator: op_name="XLA_Args"
     Shape: bf16[256,2048,4096]{2,1,0}
     Unpadded size: 4.00G
     XLA label: %arg_tuple.1996.1402 = (s32[], s32[], f32[], f32[4,1024]{1,0}, bf16[4,1024]{1,0}, f32[4,1024]{1,0}, bf16[4,1024]{1,0}, s32[4]{0}, s32[], s32[], f32[4,1024]{1,0}, f32[], bf16[], bf16[], s32[], bf16[2048,4096]{1,0}, bf16[4096]{0}, bf16[2048,4096]{1,0}, bf16[...
     Allocation type: HLO temp
...

here is the usage of TPU, the usage is quite low as well.

 b'  TPU type: TPU v2\n  Number of TPU cores: 8 (Replica count = 8, num cores per replica = 1)\n  TPU idle time (lower is better): 0.009%\n  Utilization of TPU Matrix Units (higher is better): 32.1%\n  Step time: 58.6ms (avg), 58.4ms (min), 58.9ms (max)\n  Infeed percentage: 0.010% (avg), 0.009% (min), 0.010% (max)\n\n'

I thought the TPU would spilt the batch equally to each core, but seems not, it is only using single one. When I try to use a single Nvidia T4 to run the same code. There is nothing wrong with it. So, what should I add to the code or the CLI option to leverage FULL 8 TPU CORE instead of a single one right now? Thanks

tensorflow / tensor2tensor

TPU HBM OOM #1807