intel / torch-xpu-ops

Apache License 2.0

[E2E] Torchbench CPU only models #711

Open mengfei25 opened 3 months ago

mengfei25 commented 3 months ago

🐛 Describe the bug

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4626, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 309, in load_model
    benchmark = benchmark_cls(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/benchmark/torchbenchmark/util/model.py", line 39, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/benchmark/torchbenchmark/models/resnet50_quantized_qat/__init__.py", line 21, in __init__
    raise NotImplementedError("The eval test only supports CPU.")
NotImplementedError: The eval test only supports CPU.

model_fail_to_load
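Because the model raises `NotImplementedError` from its own constructor when the device is not CPU, the run is reported as `model_fail_to_load` rather than as an expected skip. A minimal sketch of how a harness could tell the two apart, assuming the CPU-only condition can be recognized from the exception message shown above. `try_load`, `CPU_ONLY_MARKER` and the `"skip_cpu_only"` status are hypothetical names for illustration, not part of the benchmark scripts.

```python
from typing import Callable, Optional, Tuple

# Assumption: matched against the message raised by resnet50_quantized_qat above.
CPU_ONLY_MARKER = "only supports CPU"


def try_load(load_model: Callable[[], object]) -> Tuple[str, Optional[object]]:
    """Hypothetical loader wrapper: classify a CPU-only model instead of failing.

    Only NotImplementedError is handled; any other exception propagates
    unchanged so that genuine load bugs stay visible.
    """
    try:
        return "ok", load_model()
    except NotImplementedError as exc:
        if CPU_ONLY_MARKER in str(exc):
            # Model explicitly refuses non-CPU devices: treat as an expected skip.
            return "skip_cpu_only", None
        # Some other unimplemented path: still a real load failure.
        return "model_fail_to_load", None
```

Matching on the message text is fragile; a dedicated per-model skip list keyed on device would be more robust if the harness already maintains one.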

Versions

torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/1d70431c072db889d9a47ea4956049fe340a426d
pytorch: d224857b3af5c9d5a3c7a48401475c09d90db296
device: pvc 1100, bundle: 0.5.3, driver: 803.61

retonym commented 4 days ago

This model still fails, now with a new error message:

loading model: 0it [00:02, ?it/s]
xpu  train resnet50_quantized_qat             
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2672, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 457, in forward_and_backward_pass
    pred = mod(*cloned_inputs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/fx/graph_module.py", line 400, in __call__
    raise e
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/fx/graph_module.py", line 387, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
  File "<eval_with_key>.3", line 167, in forward
    activation_post_process_73 = self.activation_post_process_73(fc);  fc = None
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/ao/quantization/fake_quantize.py", line 408, in forward
    return torch.fused_moving_avg_obs_fake_quant(
RuntimeError: expected scalar type Float but found Half

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4813, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 369, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2674, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed

eager_fail_to_run
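The eager failure is a dtype mismatch inside `torch.fused_moving_avg_obs_fake_quant`: the fake-quant module applied to the `fc` output expected a float32 (`Float`) input but received float16 (`Half`). One possible workaround, sketched below under the assumption that the train run feeds half-precision activations into the QAT fake-quant modules, is to run fake quantization in float32 and cast the result back to the input dtype. `Float32FakeQuant` is a hypothetical wrapper, not part of PyTorch or the benchmark harness; `FusedMovingAvgObsFakeQuantize` is the PyTorch fake-quant module that appears to be the one calling this op in `fake_quantize.py`.

```python
import torch
from torch import nn
from torch.ao.quantization.fake_quantize import FusedMovingAvgObsFakeQuantize


class Float32FakeQuant(nn.Module):
    """Hypothetical wrapper: run a fake-quant module in float32.

    The wrapped module only ever sees float32 tensors, so its observer
    statistics (min/max, scale, zero_point) stay in float32; the output is
    cast back to whatever dtype the caller passed in (e.g. float16).
    """

    def __init__(self, fake_quant: nn.Module) -> None:
        super().__init__()
        self.fake_quant = fake_quant

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        orig_dtype = x.dtype
        out = self.fake_quant(x.float())  # upcast before the fused op
        return out.to(orig_dtype)         # restore the caller's dtype


# Usage sketch on CPU with a half-precision activation.
wrapped = Float32FakeQuant(FusedMovingAvgObsFakeQuantize())
activation = torch.randn(4, 8).half()
result = wrapped(activation)
```

Upcasting adds two casts per fake-quant call; the simpler alternative is to run this QAT model entirely in float32 instead of half precision.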