[inductor] TIMM training failures tracker

desertfire commented 2 years ago

[Edited]

benchmarks/timm_models.py -d cuda --inductor --training --float32 --use-eval-mode

Snapshot of Aug 22:

[x] pytorch/torchdynamo#862
- jx_nest_base
[x] pytorch/torchdynamo#965
- nasnetalarge
- pnasnet5large
[ ] OOM Error, pytorch/torchdynamo#822:
- adv_inception_v3
- cait_m36_384
- beit_base_patch16_224
- convit_base
- convmixer_768_32
- convnext_base
- deit_base_distilled_patch16_224
- densenet121
- dpn107
- ecaresnet101d
- gluon_senet154
- gmixer_24_224
- gmlp_s16_224
- legacy_senet154
- mixer_b16_224
- mixnet_l
- mobilevit_s
- res2net101_26w_4s
- resnest101e
- sebotnet33ts_256
- swin_base_patch4_window7_224
- xcit_large_24_p8_224
- volo_d1_224
- vit_base_patch16_224
- tf_mixnet_l
- tnt_s_patch16_224
- twins_pcpvt_base
[ ] Accuracy Error:
- coat_lite_mini
- dla102
- ese_vovnet19b_dw
- fbnetc_100
- ghostnet_100
- gluon_inception_v3
- gluon_xception65
- hrnet_w18
- inception_v3
- mnasnet_100
- mobilenetv2_100
- rexnet_100
- selecsls42b
- spnasnet_100
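
For a quick per-category count of a snapshot like the one above, the checklist can be tallied with a few lines of Python. This is only a minimal sketch, assuming the checklist is available as plain text in the same `[x]`/`[ ]` header plus `- model` layout; `tally` and `HEADER` are hypothetical names and are not part of the benchmark scripts.

```python
import re

# Matches a checklist header such as "[x] pytorch/torchdynamo#862"
# or "[ ] Accuracy Error :" (a trailing colon is optional and dropped).
HEADER = re.compile(r"^\[( |x)\]\s*(.*?)\s*:?\s*$")


def tally(text):
    """Group model names under the checklist header that precedes them."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = match.group(2)
            groups[current] = {"done": match.group(1) == "x", "models": []}
        elif line.startswith("- ") and current is not None:
            groups[current]["models"].append(line[2:].strip())
    return groups


if __name__ == "__main__":
    # A short excerpt of the snapshot, used only as sample input.
    sample = """\
[x] pytorch/torchdynamo#965
- nasnetalarge
- pnasnet5large
[ ] Accuracy Error :
- coat_lite_mini
- dla102
"""
    for name, info in tally(sample).items():
        status = "fixed" if info["done"] else "open"
        print(f"{name}: {len(info['models'])} model(s), {status}")
```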

The following also fail with aot_nvfuser:

[ ] levit_128
[ ] convnext_base
[ ] legacy_senet154

eellison commented 2 years ago

I got errors for both convnext_base and xcit_large_24_p8_224 when running with --accuracy-aot-nop.

legacy_senet154 also fails in eager - Variation in Eager runs itself
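
The distinction above, between a model whose compiled output is wrong and a model whose eager output is not even stable across runs, can be illustrated in plain Python. The sketch below is not the benchmark harness's actual code: `same`, `classify`, the tolerance, and the result labels are all hypothetical, and lists of floats stand in for tensors.

```python
import math


def same(a, b, tol=1e-4):
    """True if two output sequences have equal length and agree elementwise within tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(a, b)
    )


def classify(run_eager, run_compiled, tol=1e-4):
    """Compare two eager runs first, then eager against the compiled run."""
    first = run_eager()
    second = run_eager()
    if not same(first, second, tol):
        # The reference itself is unstable, so a compiled-vs-eager
        # mismatch would not say anything about the compiler.
        return "eager_variation"
    return "pass" if same(first, run_compiled(), tol) else "fail_accuracy"


if __name__ == "__main__":
    # Deterministic stand-in for a model whose eager output drifts between runs.
    outputs = iter([[0.10, 0.20], [0.10, 0.35]])
    print(classify(lambda: next(outputs), lambda: [0.10, 0.20]))
    # Stable eager output, compiled output off by more than the tolerance.
    print(classify(lambda: [1.0, 2.0], lambda: [1.0, 2.5]))
```

Checking eager against itself first keeps nondeterministic models from being counted as compiler accuracy failures.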

desertfire commented 2 years ago

> I got errors for both convnext_base and xcit_large_24_p8_224 when running with --accuracy-aot-nop.
>
> legacy_senet154 also fails in eager - Variation in Eager runs itself

Thanks for checking. Would you mind filing separate issues to track those, so they get closer attention from the folks working on AOTAutograd?

desertfire commented 2 years ago

Combining this tracker with https://github.com/pytorch/pytorch/issues/93777

pytorch/torchdynamo: [inductor] TIMM training failures tracker #780