Closed — anijain2305 closed this issue 2 years ago
Here are a couple of models that fail with `RuntimeError: expected scalar type Float but found Half`:
```shell
python benchmarks/torchbench.py --training -d cuda --fast --accuracy-aot-nop --skip-accuracy-check --generate-aot-autograd-stats -k mobilenet_v2_quantized_qat --float16
python benchmarks/torchbench.py --training -d cuda --fast --accuracy-aot-nop --skip-accuracy-check --generate-aot-autograd-stats -k resnet50_quantized_qat --float16
python benchmarks/torchbench.py --training -d cuda --fast --accuracy-aot-nop --skip-accuracy-check --generate-aot-autograd-stats -k mobilenet_v2_quantized_qat --amp
python benchmarks/torchbench.py --training -d cuda --fast --accuracy-aot-nop --skip-accuracy-check --generate-aot-autograd-stats -k resnet50_quantized_qat --amp
```
@IvanYashchuk All of these models fail in native PyTorch itself: they simply can't survive the float16/amp conversion, so we never reach the stage where TorchDynamo would run on them. We are therefore skipping these tests in the TorchDynamo nightly. This also makes some sense, since these are quantized models and parts of them may be hardcoded to float32.
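To make the failure mode concrete, here is a minimal, hypothetical sketch (not the torchbench repro): a module whose parameters remain float32 is fed a float16 input, which raises the same class of dtype-mismatch `RuntimeError` that the quantized QAT models hit after a blanket half conversion. The exact error message varies by op and PyTorch version.

```python
import torch

# Hypothetical minimal example: Linear keeps float32 weights, but the
# input has been converted to float16, as a blanket .half() pass might do
# to activations while some quantized internals stay hardcoded to float32.
model = torch.nn.Linear(4, 4)           # parameters are float32
x = torch.randn(2, 4, dtype=torch.float16)

raised = False
try:
    model(x)                            # float32 weights vs float16 input
except RuntimeError as e:
    raised = True
    print(f"RuntimeError: {e}")         # dtype-mismatch error, message varies

assert raised
```

The quantized QAT models embed fake-quant/observer modules with float32 assumptions baked in, so the mismatch surfaces inside the model rather than at its boundary, which is why the conversion cannot be fixed from the benchmark harness side.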
Since the issue is in PyTorch's half/amp conversion, this might not be the best place to track these, so I suggest skipping them.
Closing in favor of pytorch/pytorch#93777
The TorchDynamo dashboard shows that AOT eager is not at 100% pass rate across all models. This is a tracker for the missing work.
**float32**

**float16**

No new errors

**AMP**