Closed: mengfei25 closed this issue 1 month ago
Minified to a smaller subgraph: https://github.com/intel/torch-xpu-ops/issues/632
We have now narrowed the issue down to the `_adaptive_avg_pool2d_backward` op. This op has no deterministic implementation on either XPU or CUDA; however, the fail_accuracy only occurs on the XPU device.
Passes in the latest weekly test.
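To see why a nondeterministic backward can move bfloat16 results by this much, note that bfloat16 keeps only 7 explicit mantissa bits, so the order in which gradient contributions are accumulated changes the rounded result. The sketch below is a self-contained illustration (not the actual kernel): it emulates bfloat16 truncation in pure Python and sums the same hypothetical gradient contributions in two orders.

```python
import struct

def bf16(x: float) -> float:
    """Round a Python float to bfloat16 precision by truncating a
    float32 bit pattern to its top 16 bits (sign, 8 exponent, 7 mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Hypothetical gradient contributions to one input cell.
vals = [1024.0, 1.0, 1.0, 1.0, 1.0, -1024.0]

acc_fwd = 0.0
for v in vals:                 # one accumulation order
    acc_fwd = bf16(acc_fwd + v)

acc_rev = 0.0
for v in reversed(vals):       # a different (equally valid) order
    acc_rev = bf16(acc_rev + v)

# Adding the small terms to the large one first loses them to rounding,
# so the two orders disagree even though the exact sum is identical.
print(acc_fwd, acc_rev)
```

A nondeterministic reduction (e.g. atomics in a backward kernel) is free to pick either order from run to run, which is why the error can vary by device and run when there is no deterministic implementation.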
🐛 Describe the bug
torchbench_bfloat16_training xpu train squeezenet1_1
E0626 09:48:28.341000 140268361156416 torch/_dynamo/utils.py:1478] RMSE (res-fp64): 0.06469, (ref-fp64): 0.01171 and shape=torch.Size([4, 1000]). res.dtype: torch.bfloat16, multiplier: 3.000000, tol: 0.001000 fail_accuracy
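The failure criterion in that log line can be read as follows: the compiled (res) and eager (ref) outputs are each measured by RMSE against an fp64 reference, and the run fails when the compiled error exceeds the eager error scaled by the multiplier plus the tolerance. This is a simplified pure-Python sketch of that comparison (an assumption about the shape of the check, not the exact logic in `torch/_dynamo/utils.py`):

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def passes_accuracy(res_err, ref_err, multiplier=3.0, tol=1e-3):
    # Simplified form of the RMSE accuracy gate (assumption): the compiled
    # model's error may be at most `multiplier` times the eager baseline's
    # error, plus an absolute tolerance.
    return res_err <= multiplier * ref_err + tol

# Values from the log: 0.06469 > 3.0 * 0.01171 + 0.001, hence fail_accuracy.
print(passes_accuracy(0.06469, 0.01171))
```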
loading model: 0it [00:00, ?it/s]
Loading pipeline components...: 100%|██████████| 6/6 [00:02<00:00, 2.13it/s]
loading model: 0it [00:07, ?it/s]
Versions
torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/31c400195d63064940242220dc9100322d36bac4
pytorch: 0f81473d7b4a1bf09246410712df22541be7caf3 + PRs: 127277, 129120
device: PVC 1100, 803.61, 0.5.1