intel / torch-xpu-ops

[E2E Accuracy] timm jx_nest_base amp_fp16 inference accuracy failed randomly #979

Open

mengfei25 commented 1 month ago

🐛 Describe the bug

Details in https://github.com/intel/torch-xpu-ops/actions/runs/11361002852

dev   name           batch_size   accuracy
xpu   jx_nest_base   8            fail_accuracy
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            pass
xpu   jx_nest_base   8            fail_accuracy
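
A fail_accuracy result means the amp_fp16 inference output diverged from the reference beyond the harness tolerance. The sketch below shows that kind of comparison outside the actual benchmark harness; it assumes timm and an XPU-enabled PyTorch build are available, and the input shape and tolerances are illustrative rather than the CI values.

```python
# Minimal sketch of an amp_fp16-vs-fp32 inference comparison for jx_nest_base.
# Assumptions: timm installed, XPU-enabled PyTorch build; rtol/atol and the
# 224x224 input are illustrative, not the tolerances/config of the CI harness.
import torch
import timm

device = "xpu"
model = timm.create_model("jx_nest_base", pretrained=False).eval().to(device)
x = torch.randn(8, 3, 224, 224, device=device)  # batch_size 8, as in the report

with torch.no_grad():
    ref = model(x).float()  # fp32 reference output
    with torch.autocast(device_type=device, dtype=torch.float16):
        out = model(x).float()  # amp_fp16 output

# An excessive difference here corresponds to the fail_accuracy rows above.
print("pass" if torch.allclose(out, ref, rtol=1e-2, atol=1e-2) else "fail_accuracy")
```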

Versions

env:
  pytorch: bdb42e7c944eb8c3bbfa0327e49e5db797a0bd92
  torch-xpu-ops: 1d217ae491669b550b136ca16e91b85c4597cd66
  keep_torch_xpu_ops: false
  python: 3.10
  TRITON_COMMIT_ID: 91b14bf5593cf58a8541f3e6b9125600a867d4ef
  TORCH_COMMIT_ID: bdb42e7c944eb8c3bbfa0327e49e5db797a0bd92
  TRANSFORMERS_VERSION: 243e186efbf7fb93328dd6b34927a4e8c8f24395
  DRIVER_VERSION: 803.61
  KERNEL_VERSION: 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023
  BUNDLE_VERSION: 0.5.3
  OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
  GCC_VERSION: 11

retonym commented 4 days ago

Could not reproduce the random failure locally. This model passed in the last weekly test; will check its status again in the next weekly test.
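
Since the failure is intermittent, one way to hunt for it locally is to repeat the same comparison over many random inputs. A sketch under the same assumptions as above (timm model, XPU-enabled build, illustrative tolerances):

```python
# Stress loop for the intermittent mismatch: rerun the amp_fp16-vs-fp32
# comparison with a fresh random input each iteration.
# Assumptions: timm installed, XPU-enabled PyTorch build; tolerances illustrative.
import torch
import timm

device = "xpu"
model = timm.create_model("jx_nest_base", pretrained=False).eval().to(device)

for seed in range(50):
    torch.manual_seed(seed)
    x = torch.randn(8, 3, 224, 224, device=device)
    with torch.no_grad():
        ref = model(x).float()
        with torch.autocast(device_type=device, dtype=torch.float16):
            out = model(x).float()
    if not torch.allclose(out, ref, rtol=1e-2, atol=1e-2):
        print(f"accuracy mismatch at seed {seed}")
        break
else:
    print("no mismatch observed in 50 runs")
```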

retonym commented 3 days ago

amp_fp16 inference is not a datatype targeted on the meta dashboard; moving this issue to the PT2.7 milestone.