> @whitneywhtsang did your latest float32 run for timm show these problems?
The following failures are fixed with this setup:
Stonepia/pytorch dev/triton-test-3.0 0f6d72ce16bd4b30402dcad97144d17cd7bc53ed
intel/intel-xpu-backend-for-triton b6d3678483dbffa58f0470a46c0b512f223aabda
intel-extension-for-pytorch 2.1.10+git99b4297
torch 2.1.0a0+git59f7c41
torchvision 0.16.0a0+47cd5ea
timm 0.9.15.dev0
=======================Failed models in amp_bf16=============================
training:
name accuracy
botnet26t_256 fail_accuracy
coat_lite_mini fail_accuracy
convmixer_768_32 fail_accuracy
cspdarknet53 fail_accuracy
fbnetv3_b fail_accuracy
gmixer_24_224 fail_accuracy
hrnet_w18 fail_accuracy
jx_nest_base fail_accuracy
lcnet_050 fail_accuracy
mixer_b16_224 fail_accuracy
poolformer_m36 fail_accuracy
sebotnet33ts_256 fail_accuracy
swin_base_patch4_window7_224 fail_accuracy
visformer_small fail_accuracy
=======================Failed models in amp_fp16=============================
inference:
name accuracy
gernet_l fail_accuracy
ghostnet_100 fail_accuracy
res2next50 fail_accuracy
resnest101e fail_accuracy
training:
name accuracy
beit_base_patch16_224 fail_accuracy
convit_base fail_accuracy
convmixer_768_32 fail_accuracy
cspdarknet53 fail_accuracy
deit_base_distilled_patch16_224 fail_accuracy
eca_botnext26ts_256 fail_accuracy
hrnet_w18 fail_accuracy
jx_nest_base fail_accuracy
mobilevit_s fail_accuracy
swin_base_patch4_window7_224 fail_accuracy
tf_efficientnet_b0 fail_accuracy
vit_base_patch16_224 fail_accuracy
=======================Failed models in float32=============================
training:
name accuracy
fbnetv3_b fail_accuracy
swin_base_patch4_window7_224 fail_accuracy
> @whitneywhtsang did your latest float32 run for timm show these problems?
There are two failures fixed by the latest float32 run.
With env:
export TIMM_FUSED_ATTN=0
Stonepia/pytorch dev/triton-test-3.0 0f6d72ce16bd4b30402dcad97144d17cd7bc53ed
intel/intel-xpu-backend-for-triton c5e75f5563e52fc1b1e810b33d152d3b0f448f33
intel-extension-for-pytorch 2.1.10+git99b4297
torch 2.1.0a0+git0f6d72c
torchvision 0.16.0a0+47cd5ea
timm 0.8.22.dev0
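As a quick sanity check of the flag (a minimal sketch, assuming a timm build that ships timm.layers.use_fused_attn, i.e. 0.9+; the 0.8.22.dev0 build listed above may predate that helper):

```shell
# Sketch only: with TIMM_FUSED_ATTN=0, timm should fall back to the non-fused
# attention path, and use_fused_attn() is expected to report False.
export TIMM_FUSED_ATTN=0
python -c "from timm.layers import use_fused_attn; print(use_fused_attn())"
```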
fail_to_run: cait_m36_384 (https://github.com/intel/intel-xpu-backend-for-triton/issues/523), eca_halonext26ts (https://github.com/intel/intel-xpu-backend-for-triton/issues/524), pnasnet5large; fail_accuracy: crossvit_9_240 (https://github.com/intel/intel-xpu-backend-for-triton/issues/527), levit_128, xcit_large_24_p8_224 (https://github.com/intel/intel-xpu-backend-for-triton/issues/525)
fail_to_run: cait_m36_384 (https://github.com/intel/intel-xpu-backend-for-triton/issues/523), eca_halonext26ts (https://github.com/intel/intel-xpu-backend-for-triton/issues/524); fail_accuracy: crossvit_9_240 (https://github.com/intel/intel-xpu-backend-for-triton/issues/527), levit_128, xcit_large_24_p8_224 (https://github.com/intel/intel-xpu-backend-for-triton/issues/525), resnest101e
fail_to_run: cait_m36_384 (https://github.com/intel/intel-xpu-backend-for-triton/issues/523), eca_halonext26ts (https://github.com/intel/intel-xpu-backend-for-triton/issues/524); fail_accuracy: crossvit_9_240 (https://github.com/intel/intel-xpu-backend-for-triton/issues/527), levit_128, xcit_large_24_p8_224 (https://github.com/intel/intel-xpu-backend-for-triton/issues/525); fail_to_load: mobilenetv2_100, rexnet_100 (https://github.com/intel/intel-xpu-backend-for-triton/issues/521)
fail_to_run: cait_m36_384 (https://github.com/intel/intel-xpu-backend-for-triton/issues/523); fail_accuracy: crossvit_9_240 (https://github.com/intel/intel-xpu-backend-for-triton/issues/527), coat_lite_mini
fail_to_run: cait_m36_384 (https://github.com/intel/intel-xpu-backend-for-triton/issues/523); fail_accuracy: crossvit_9_240 (https://github.com/intel/intel-xpu-backend-for-triton/issues/527), SelecSls42b, adv_inception_v3
fail_to_run: cait_m36_384 (https://github.com/intel/intel-xpu-backend-for-triton/issues/523); fail_accuracy: crossvit_9_240 (https://github.com/intel/intel-xpu-backend-for-triton/issues/527), fbnetc_100, gluon_inception_v3, levit_128
Closing this ticket. All individual issues have been filed as separate tickets with IPEX. The IPEX team is tracking overall progress.
Accuracy check results of timm models based on triton 3.0.0 (6 test scenarios in total)
Test mode: inference and training
Test datatype: amp_bf16, amp_fp16, float32
This issue can be split into multiple work items
Failed model list
Reproduce: (replace with real dtype and model)
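A plausible reproduce command, assuming the standard PyTorch dynamo benchmark driver (benchmarks/dynamo/timm_models.py) was used for these runs; the exact driver, device flag, and backend are not shown in this issue, so treat the flags below as placeholders:

```shell
# Hypothetical sketch: substitute the real dtype (--float32 / --amp), mode
# (--training / --inference), device, and model name for the scenario under test.
python benchmarks/dynamo/timm_models.py --accuracy --float32 --training \
  --inductor -d xpu --only swin_base_patch4_window7_224
```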
Version:
triton: https://github.com/intel/intel-xpu-backend-for-triton/commit/97ac4f91d149a3392d6e14f5d39aa4953fb6c56e