Closed by fengyuan14 1 month ago
See https://github.com/intel/torch-xpu-ops/issues/977. The autocast difference between IPEX and torch-xpu-ops leads to the additional copy. Under the current requirements, this is not a defect.
🐛 Describe the bug
In 2.5, aten::linear also introduces an additional aten::copy_, which increases aten::linear latency from 308us to 426us.
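For scale, the two latencies quoted in this report work out to roughly a 38% increase. A quick check, using only those two numbers (attributing the whole difference to the extra aten::copy_ is this report's reading, not something the arithmetic shows):

```python
# Latencies quoted in the report, in microseconds.
before_us = 308
after_us = 426

# Absolute and relative change in aten::linear latency.
added_us = after_us - before_us
slowdown_pct = 100 * added_us / before_us

print(f"added latency: {added_us} us")
print(f"relative increase: {slowdown_pct:.1f}%")
```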
Versions
Latest torch-xpu-ops