Open mengfei25 opened 4 months ago
Hi @weishi-deng, I saw you marked this issue as triaged. Could you please update its status in the comments and in the project status?
From the last triage, this issue is caused by convolution_backward; we are still looking for a fix.
@retonym will submit a PR to PyTorch to change the tolerance.
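For context, a tolerance change in the benchmark harness usually amounts to returning a larger threshold for a given dtype/model combination. A hypothetical sketch of what such a PR might look like; the helper name and the relaxed value are assumptions, not the actual PyTorch benchmark code or the pending PR:

```python
# Hypothetical sketch only: the helper name and values are assumptions,
# not the actual PyTorch benchmark code or the pending PR.
def get_accuracy_tolerance(dtype_name: str, model_name: str) -> float:
    # The failing log shows tol=0.001; a fix would relax it for models
    # known to accumulate extra low-precision error in backward.
    if dtype_name == "bfloat16" and model_name == "functorch_dp_cifar10":
        return 4e-3  # assumed relaxed value, for illustration only
    return 1e-3

print(get_accuracy_tolerance("bfloat16", "functorch_dp_cifar10"))  # 0.004
```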
@weishi-deng, please dump the tensor bn1.bias.grad for review.
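A minimal sketch of capturing that gradient for offline review; the model, input shape, and file name are stand-ins, not the actual functorch_dp_cifar10 setup:

```python
import torch
import torch.nn as nn

# Stand-in model with a conv1 -> bn1 stem, mirroring the layer named
# in the failure (bn1.bias.grad); not the real benchmark model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)

    def forward(self, x):
        return self.bn1(self.conv1(x))

model = TinyNet()
x = torch.randn(4, 3, 32, 32)  # CIFAR-10-shaped batch
model(x).sum().backward()

# Save the gradient so it can be diffed against another backend's run.
torch.save(model.bn1.bias.grad, "bn1_bias_grad.pt")
```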
🐛 Describe the bug
torchbench_bfloat16_training xpu train functorch_dp_cifar10
E0626 09:48:47.557000 140599373223744 torch/_dynamo/utils.py:1478] RMSE (res-fp64): 0.00109, (ref-fp64): 0.00027 and shape=torch.Size([64]). res.dtype: torch.bfloat16, multiplier: 3.000000, tol: 0.001000
E0626 09:48:47.557000 140599373223744 torch/_dynamo/utils.py:1392] Accuracy failed for key name bn1.bias.grad fail_accuracy
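The log comes from the dynamo accuracy check: the result under test and a same-precision baseline are each compared against an fp64 run, and the test fails when the result's RMSE is too large relative to the baseline's. A minimal sketch of that comparison (the exact pass condition in torch/_dynamo/utils.py may differ; the tensors here are stand-ins):

```python
import torch

def rmse(ref: torch.Tensor, res: torch.Tensor) -> torch.Tensor:
    # Root-mean-square error against the fp64 reference.
    return torch.sqrt(torch.mean(torch.square(ref - res)))

# Stand-ins for the runs the harness compares: an fp64 reference, a
# same-precision baseline, and the result under test. The extra 1e-3
# offset simulates the larger error reported for the XPU result.
fp64_ref = torch.randn(64, dtype=torch.float64)           # bn1.bias.grad in fp64
baseline = fp64_ref.to(torch.bfloat16).to(torch.float64)  # "ref" run
result = (fp64_ref + 1e-3).to(torch.bfloat16).to(torch.float64)  # "res" run

multiplier, tol = 3.0, 0.001  # values reported in the failing log
res_error = rmse(fp64_ref, result).item()    # "RMSE (res-fp64)" in the log
ref_error = rmse(fp64_ref, baseline).item()  # "RMSE (ref-fp64)" in the log
passes = res_error <= multiplier * ref_error + tol
print(f"res_error={res_error:.5f} ref_error={ref_error:.5f} passes={passes}")
```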
Versions
torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/31c400195d63064940242220dc9100322d36bac4
pytorch: 0f81473d7b4a1bf09246410712df22541be7caf3 + PRs: 127277, 129120
device: PVC 1100, 803.61, 0.5.1