pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[inductor][cpu] inductor_max_autotune models accuracy failure in 2024-08-10 nightly release #133465

Open zxd1997066 opened 1 month ago

zxd1997066 commented 1 month ago

🐛 Describe the bug

fp32 static shape default wrapper

| suite | name | thread | accuracy | perf | reason(reference only) |
|---|---|---|---|---|---|
| huggingface | DebertaV2ForQuestionAnswering | multiple | X | | DebertaV2ForQuestionAnswering, fail_accuracy |
| timm_models | jx_nest_base | multiple | X | | jx_nest_base, fail_accuracy |
| timm_models | swin_base_patch4_window7_224 | multiple | X | X | swin_base_patch4_window7_224, KeyError: m_start |
| timm_models | twins_pcpvt_base | multiple | X | | twins_pcpvt_base, fail_accuracy |

fp32 dynamic shape default wrapper

| suite | name | thread | accuracy | perf | reason(reference only) |
|---|---|---|---|---|---|
| huggingface | DebertaV2ForQuestionAnswering | multiple | X | | DebertaV2ForQuestionAnswering, fail_accuracy |

```
E0814 16:28:00.551836 56121 torch/_dynamo/utils.py:1541] RMSE (res-fp64): nan, (ref-fp64): 0.00000 and shape=torch.Size([8, 1000]). res.dtype: torch.float32, multiplier: 2.000000, tol: 0.001000
fail_accuracy
```

### Versions

SW info

| name | target_branch | target_commit | refer_branch | refer_commit |
|---|---|---|---|---|
| torchbench | main | 23512dbe | main | 23512dbe |
| torch | main | 6ec4af6865dd884f984c9dbcb273ae26e3825481 | main | 1d1d074072ecb0aa6ca95e3f43221d2275e16d74 |
| torchvision | main | 0.19.0a0+d23a6e1 | main | 0.19.0a0+d23a6e1 |
| torchtext | main | 0.16.0a0+b0ebddc | main | 0.16.0a0+b0ebddc |
| torchaudio | main | 2.4.0a0+b3f6f51 | main | 2.4.0a0+69b2a0a |
| torchdata | main | 0.7.0a0+11bb5b8 | main | 0.7.0a0+11bb5b8 |
| dynamo_benchmarks | main | nightly | main | fea73cb |

Repro: [inductor_single_run.sh](https://github.com/chuanqi129/inductor-tools/blob//weizhuoz/enable_max_autotune_for_guilty/scripts/modelbench/inductor_single_run.sh)

bash inductor_single_run.sh multiple inference accuracy **suite** **model** float32 first static/dynamic default 0 inductor_max_autotune

Suspected guilty commit: https://github.com/pytorch/pytorch/commit/7911b7bfb770e71a87a007addb6de819ac911c4f

[huggingface-DebertaV2ForQuestionAnswering-inference-float32-dynamic-default-multiple-accuracy-crash_guilty_commit.log](https://github.com/user-attachments/files/16615956/huggingface-DebertaV2ForQuestionAnswering-inference-float32-dynamic-default-multiple-accuracy-crash_guilty_commit.log)

cc @ezyang @chauhang @penguinwu @WeizhuoZhang-intel @chuanqi129

zxd1997066 commented 1 month ago

convnext_base with the fp32 static shape default wrapper shows the same error message, but I cannot reproduce the pass status.

E0814 17:47:54.617337 58277 torch/_dynamo/utils.py:1541] RMSE (res-fp64): nan, (ref-fp64): 0.00000 and shape=torch.Size([8, 1000]). res.dtype: torch.float32, multiplier: 2.000000, tol: 0.001000
fail_accuracy
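
For readers unfamiliar with this log line, here is a rough sketch of why an RMSE of `nan` is reported as `fail_accuracy`. It is not the code in `torch/_dynamo/utils.py`. The helper names `rmse` and `passes_accuracy` are made up for illustration, and the threshold form `multiplier * ref_error + tol / 10` is an assumption about how the check combines the `multiplier` and `tol` values shown in the log; the real formula may differ between versions.

```python
import math


def rmse(ref, res):
    """Root-mean-square error between two equal-length sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, res)) / len(ref))


def passes_accuracy(fp64_ref, ref, res, tol=1e-3, multiplier=2.0):
    """Hypothetical check: compare the compiled result against an fp64 baseline.

    ref is the eager fp32 output, res is the compiled (inductor) output.
    The threshold form below is an assumption, not the confirmed upstream formula.
    """
    ref_error = rmse(fp64_ref, ref)  # "(ref-fp64)" in the log
    res_error = rmse(fp64_ref, res)  # "(res-fp64)" in the log
    # Any comparison involving NaN evaluates to False, so a NaN error always fails.
    return res_error <= multiplier * ref_error + tol / 10.0


fp64_ref = [0.25, -1.5, 3.0]
eager = [0.25, -1.5, 3.0]                  # ref-fp64 RMSE is 0.0, as in the log
compiled_ok = [0.25, -1.5, 3.00001]        # tiny numerical drift
compiled_nan = [0.25, float("nan"), 3.0]   # one NaN in the compiled output

print(passes_accuracy(fp64_ref, eager, compiled_ok))   # True
print(passes_accuracy(fp64_ref, eager, compiled_nan))  # False
```

Under these assumptions, a single NaN anywhere in the compiled output is enough to fail the check regardless of `tol`, which matches the `RMSE (res-fp64): nan` followed by `fail_accuracy` seen above.
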
chunyuan-w commented 3 weeks ago

Most are fixed by https://github.com/pytorch/pytorch/pull/133070 and https://github.com/pytorch/pytorch/pull/133073. One remaining issue is: jx_nest_base

chunyuan-w commented 3 days ago

jx_nest_base will be fixed by https://github.com/pytorch/pytorch/pull/135661