intel / torch-xpu-ops


[E2E] Torchbench amp_bf16 training Super_SloMo accuracy failed #905

Open mengfei25 opened 4 weeks ago

mengfei25 commented 4 weeks ago

🐛 Describe the bug

Super_SloMo appears to fail intermittently: the accuracy check passes when installing from the prebuilt WHL but fails with a source build. In the latest weekly run, the WHL install passed (https://github.com/intel/torch-xpu-ops/actions/runs/10742335908) and the source build failed (https://github.com/intel/torch-xpu-ops/actions/runs/10741560513).

I also tested the WHL install locally multiple times, and it passes only intermittently.
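To put a number on how often a run like this passes, repeated runs can be tallied with a small script. The following is only a minimal sketch: `tally_runs` is a hypothetical helper, the demo command is a random stand-in rather than the real benchmark invocation, and it assumes the command exits with a non-zero status on failure. That may not hold for the benchmark script, in which case its output would need to be parsed instead.

```python
import subprocess
import sys
from collections import Counter


def tally_runs(cmd, runs=10):
    """Run `cmd` `runs` times and count how often it exits with status 0."""
    results = Counter()
    for _ in range(runs):
        # Assumption: exit status 0 means the check passed, anything else failed.
        completed = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results["pass" if completed.returncode == 0 else "fail"] += 1
    return results


if __name__ == "__main__":
    # Hypothetical stand-in that fails at random; replace it with the
    # actual benchmark command for the model under test.
    demo_cmd = [
        sys.executable,
        "-c",
        "import random, sys; sys.exit(random.randint(0, 1))",
    ]
    counts = tally_runs(demo_cmd, runs=5)
    print(f"pass={counts['pass']} fail={counts['fail']}")
```

Running the same tally against both the WHL install and the source build would show whether the two differ in pass rate or are both flaky.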

Versions

torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/12065904d4c3c870059d746eb0fb45a0459f1d6d

weishi-deng commented 2 weeks ago

This test passed in the latest weekly run and in a local reproducer.