intel / torch-xpu-ops


Pytorch_CycleGAN_and_pix2pix RuntimeError: "reflection_pad2d" not implemented for 'Half' #491

Closed mengfei25 closed 1 month ago

mengfei25 commented 2 months ago

🐛 Describe the bug

torchbench_amp_fp16_training xpu train pytorch_CycleGAN_and_pix2pix

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2294, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 456, in forward_and_backward_pass
    pred = mod(*cloned_inputs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/benchmark/torchbenchmark/models/pytorch_CycleGAN_and_pix2pix/models/networks.py", line 377, in forward
    return self.model(input)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/container.py", line 219, in forward
    input = module(input)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/benchmark/torchbenchmark/models/pytorch_CycleGAN_and_pix2pix/models/networks.py", line 436, in forward
    out = x + self.conv_block(x)  # add skip connections
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/container.py", line 219, in forward
    input = module(input)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1575, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/modules/padding.py", line 359, in forward
    return F.pad(input, self.padding, 'reflect')
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/functional.py", line 4552, in pad
    return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: "reflection_pad2d" not implemented for 'Half'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4177, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 380, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2296, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed

eager_fail_to_run

Versions

torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/31c400195d63064940242220dc9100322d36bac4
pytorch: 0f81473d7b4a1bf09246410712df22541be7caf3 + PRs: 127277,129120
device: PVC 1100, 803.61, 0.5.1

weishi-deng commented 1 month ago

Root cause: the CPU backend does not support Half (fp16) for "reflection_pad2d". Reproducer:

import torch
import torch.nn as nn

m = nn.ReflectionPad2d(2)
input = torch.arange(9, dtype=torch.float16).reshape(1, 1, 3, 3).cpu()
m(input)  # RuntimeError: "reflection_pad2d" not implemented for 'Half'

# using different paddings for different sides
m = nn.ReflectionPad2d((1, 1, 2, 0))
m(input)
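
A possible short-term workaround in eager mode (a minimal sketch, assuming the padding call can be wrapped in the model code) is to run the reflection pad in float32 on CPU and cast the result back, since only the Half CPU kernel is missing:

import torch
import torch.nn as nn

pad = nn.ReflectionPad2d(2)
x = torch.arange(9, dtype=torch.float16).reshape(1, 1, 3, 3).cpu()

# reflection_pad2d has no Half kernel on CPU, so compute the pad in float32
# and cast the result back to the original dtype.
out = pad(x.float()).to(x.dtype)
print(out.dtype, out.shape)  # torch.float16 torch.Size([1, 1, 7, 7])
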
chuanqi129 commented 1 month ago

@weishi-deng So we need to add an XPU kernel for this op, right?

weishi-deng commented 1 month ago

@chuanqi129 Actually, this issue is in the CPU implementation. Our implementation in IPEX has already enabled fp16 support.

chuanqi129 commented 1 month ago

> @chuanqi129 Actually, this issue is in the CPU implementation. Our implementation in IPEX has already enabled fp16 support.

Hi @weishi-deng, this issue is for stock PyTorch; we need to make sure torch-xpu-ops has such an implementation as well.

weishi-deng commented 1 month ago

@chuanqi129 The point is that this op is currently on the CPU fallback list, so XPU tensors are routed to the CPU implementation, which lacks the Half kernel.
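
To illustrate the fallback behavior, here is a small check (a sketch, assuming an XPU device is available and reflection_pad2d is still on the fallback list): running the reproducer on an XPU tensor is expected to surface the same CPU error, because the input falls back to the CPU kernel, which has no Half support.

import torch
import torch.nn as nn

m = nn.ReflectionPad2d(2)
x = torch.arange(9, dtype=torch.float16, device="xpu").reshape(1, 1, 3, 3)

try:
    # With the op on the CPU fallback list, this should raise:
    # RuntimeError: "reflection_pad2d" not implemented for 'Half'
    m(x)
except RuntimeError as e:
    print(e)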