intel / torch-xpu-ops

Apache License 2.0

New accuracy failures compared with 0617 baseline #603

Closed: mengfei25 closed this issue 4 weeks ago

mengfei25 commented 2 months ago

🐛 Describe the bug

Timm_models: https://github.com/intel/torch-xpu-ops/actions/runs/9888963166/job/27314002866

Torchbench: https://github.com/intel/torch-xpu-ops/actions/runs/9870142412/job/27300068679


suite | category | name | new | reference | result
--- | --- | --- | --- | --- | ---
timm_models | timm_models_float32_training | cspdarknet53 | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_fp16_training | eca_halonext26ts | fail_to_run | pass | new_fail
timm_models | timm_models_float32_training | eca_halonext26ts | fail_to_run | pass | new_fail
timm_models | timm_models_amp_bf16_training | eca_halonext26ts | fail_to_run | pass | new_fail
timm_models | timm_models_bfloat16_training | eca_halonext26ts | fail_to_run | pass | new_fail
timm_models | timm_models_float32_training | gluon_inception_v3 | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_fp16_training | jx_nest_base | fail_accuracy | pass | new_fail
timm_models | timm_models_float16_training | jx_nest_base | fail_accuracy | pass | new_fail
timm_models | timm_models_float32_training | jx_nest_base | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_bf16_training | jx_nest_base | fail_accuracy | pass | new_fail
timm_models | timm_models_bfloat16_training | jx_nest_base | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_fp16_training | lcnet_050 | fail_accuracy | pass | new_fail
timm_models | timm_models_float32_training | lcnet_050 | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_bf16_training | lcnet_050 | fail_accuracy | pass | new_fail
timm_models | timm_models_bfloat16_training | lcnet_050 | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_fp16_training | mobilenetv2_100 | fail_accuracy | pass | new_fail
timm_models | timm_models_float32_training | mobilenetv2_100 | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_fp16_training | poolformer_m36 | fail_accuracy | pass | new_fail
timm_models | timm_models_float32_training | poolformer_m36 | fail_accuracy | pass | new_fail
timm_models | timm_models_amp_bf16_training | poolformer_m36 | fail_accuracy | pass | new_fail
timm_models | timm_models_bfloat16_training | poolformer_m36 | fail_accuracy | pass | new_fail
torchbench | torchbench_amp_bf16_inference | detectron2_fcos_r_50_fpn | fail_to_run | pass | new_fail
torchbench | torchbench_amp_fp16_inference | detectron2_fcos_r_50_fpn | fail_to_run | pass | new_fail
torchbench | torchbench_bfloat16_inference | detectron2_fcos_r_50_fpn | fail_to_run | pass | new_fail
torchbench | torchbench_float16_inference | detectron2_fcos_r_50_fpn | fail_to_run | pass | new_fail
torchbench | torchbench_float32_inference | detectron2_fcos_r_50_fpn | fail_to_run | pass | new_fail
torchbench | torchbench_amp_fp16_training | squeezenet1_1 | fail_accuracy | pass | new_fail
torchbench | torchbench_float32_training | squeezenet1_1 | fail_accuracy | pass | new_fail
torchbench | torchbench_float32_training | timm_efficientnet | fail_accuracy | pass | new_fail
torchbench | torchbench_bfloat16_training | timm_regnet | fail_accuracy | pass | new_fail
torchbench | torchbench_float32_inference | vision_maskrcnn | fail_to_run | pass | new_fail
torchbench | torchbench_float32_training | vision_maskrcnn | eager_fail_to_run | pass | new_fail
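Each row in the table labels a model/configuration as new_fail when it passed on the reference (baseline) run but does not pass on the new run. A minimal Python sketch of that comparison, assuming this simple rule, might look like the following. The function names, the new_pass and no_change labels, and the resnet50 sample row are hypothetical illustrations, not the CI's actual comparison script.

```python
# Hypothetical sketch: label one benchmark entry by comparing the new run
# against the reference (baseline) run, as in the "result" column.
# Only "new_fail" appears in this report; the other labels are assumptions.


def classify(new: str, reference: str) -> str:
    new_ok = new == "pass"
    ref_ok = reference == "pass"
    if ref_ok and not new_ok:
        return "new_fail"   # regression: passed on baseline, fails now
    if not ref_ok and new_ok:
        return "new_pass"   # hypothetical label: fixed since baseline
    return "no_change"      # hypothetical label: same outcome in both runs


def regressions(rows):
    """Yield (suite, category, name, new) for every row classified new_fail."""
    for suite, category, name, new, reference in rows:
        if classify(new, reference) == "new_fail":
            yield suite, category, name, new


rows = [
    ("timm_models", "timm_models_float32_training", "cspdarknet53", "fail_accuracy", "pass"),
    ("torchbench", "torchbench_float32_training", "vision_maskrcnn", "eager_fail_to_run", "pass"),
    # Hypothetical row that passes in both runs, so it is not a regression.
    ("torchbench", "torchbench_float32_training", "resnet50", "pass", "pass"),
]

found = list(regressions(rows))
for entry in found:
    print(entry)
```

Treating any status other than pass (fail_accuracy, fail_to_run, eager_fail_to_run) as a failure keeps the rule independent of the specific failure mode.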

Versions

PVC 1100, Driver 803.61, Bundle 0.5.1

New baseline pytorch: https://github.com/pytorch/pytorch/commit/32e74ed, torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/1dcaf3e

Previous baseline pytorch: https://github.com/pytorch/pytorch/commit/0f81473d7b4a1bf09246410712df22541be7caf3, torch-xpu-ops: 31c400195d63064940242220dc9100322d36bac4

retonym commented 1 month ago

jx_nest_base, lcnet_050, and poolformer_m36 pass when _adaptive_avg_pool2d_backward falls back to CPU.
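For context, _adaptive_avg_pool2d_backward computes the input gradient of adaptive average pooling: each output gradient is split evenly over its pooling window and accumulated into the input, with overlapping windows adding up. A naive pure-Python reference like the sketch here could help confirm whether a device kernel or its CPU fallback matches the expected values. It assumes PyTorch's window-boundary convention (start = floor(i * in / out), end = ceil((i + 1) * in / out)) and handles a single 2-D plane only; the function name adaptive_avg_pool2d_backward_ref is hypothetical and not part of PyTorch.

```python
def adaptive_avg_pool2d_backward_ref(grad_out, in_h, in_w):
    """Naive reference for the input gradient of 2-D adaptive average pooling.

    Hypothetical helper, not a PyTorch API. grad_out is an out_h x out_w
    list of lists holding the upstream gradient for one plane; the result is
    an in_h x in_w list of lists. Window bounds are assumed to follow
    start = floor(i * in / out) and end = ceil((i + 1) * in / out).
    """
    out_h, out_w = len(grad_out), len(grad_out[0])
    grad_in = [[0.0] * in_w for _ in range(in_h)]
    for oh in range(out_h):
        h0 = (oh * in_h) // out_h
        h1 = -((-(oh + 1) * in_h) // out_h)  # ceil division on integers
        for ow in range(out_w):
            w0 = (ow * in_w) // out_w
            w1 = -((-(ow + 1) * in_w) // out_w)
            # Each output gradient is shared equally by its window's inputs.
            share = grad_out[oh][ow] / ((h1 - h0) * (w1 - w0))
            for ih in range(h0, h1):
                for iw in range(w0, w1):
                    grad_in[ih][iw] += share  # overlapping windows accumulate
    return grad_in


# 3x3 input pooled to 2x2: windows overlap on the middle row and column.
g = adaptive_avg_pool2d_backward_ref([[1.0, 1.0], [1.0, 1.0]], 3, 3)
for row in g:
    print(row)
```

In the 3x3 to 2x2 case the centre element sits in all four windows, so it receives the largest gradient; non-overlapping cases reduce to ordinary average-pooling backward. A device result could then be checked against such a reference with a tolerance (for example via torch.testing.assert_close), bearing in mind that low-precision dtypes need looser tolerances.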

chuanqi129 commented 4 weeks ago

@retonym please double-check whether this issue is fixed.

retonym commented 4 weeks ago

These models have been verified to pass locally.