intel / torch-xpu-ops

Apache License 2.0

Performance: Linear: Worse host overhead due to an extra copy introduced #978

Closed fengyuan14 closed 1 month ago

fengyuan14 commented 1 month ago

🐛 Describe the bug

In 2.5, aten::linear also introduces an additional aten::copy_, which increases aten::linear latency from 308 us to 426 us (about 38% higher).

Versions

Latest torch-xpu-ops

fengyuan14 commented 1 month ago

See https://github.com/intel/torch-xpu-ops/issues/977. A difference in autocast behavior between IPEX and torch-xpu-ops leads to the additional copy. Under the current requirements, this is not a defect.
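
For anyone who wants to see how an autocast cast can surface as extra `aten::copy_` calls around `aten::linear`, here is a minimal sketch. It is only an illustration under stated assumptions: it runs on CPU with `torch.bfloat16` autocast rather than on XPU, the helper name `op_counts` and the tensor shapes are hypothetical, and it does not reproduce the exact IPEX vs. torch-xpu-ops difference or the latencies reported above. It counts dispatcher ops recorded by `torch.profiler` with and without autocast.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def op_counts(use_autocast: bool) -> dict:
    """Return {op name: call count} for one linear call (hypothetical helper)."""
    # float32 inputs, so autocast has to cast them to the lower-precision dtype.
    x = torch.randn(8, 16)
    w = torch.randn(4, 16)
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        # Assumption: CPU autocast stands in for the XPU autocast path.
        with torch.autocast(device_type="cpu", dtype=torch.bfloat16,
                            enabled=use_autocast):
            torch.nn.functional.linear(x, w)
    # key_averages() groups recorded events by op name.
    return {evt.key: evt.count for evt in prof.key_averages()}


if __name__ == "__main__":
    plain = op_counts(use_autocast=False)
    casted = op_counts(use_autocast=True)
    print("aten::copy_ without autocast:", plain.get("aten::copy_", 0))
    print("aten::copy_ with autocast:   ", casted.get("aten::copy_", 0))
```

If the autocast run reports more `aten::copy_` calls than the plain run, the extra host-side work comes from the dtype casts inserted before the linear op, which is the kind of overhead described in this issue.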