Performance: Linear: Worse host overhead due to an extra copy introduced

intel / torch-xpu-ops


Closed: fengyuan14 closed this issue 1 month ago

fengyuan14 commented 1 month ago

In 2.5, aten::linear also introduces an additional aten::copy_, which increases aten::linear latency from 308us to 426us.

Latest torch-xpu-ops
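
To put the two latency figures reported above in perspective, here is a minimal sketch that computes the absolute and relative regression. The helper name `copy_overhead` is hypothetical, and attributing the whole difference to the extra aten::copy_ is an assumption based on the description, not a measured breakdown.

```python
def copy_overhead(before_us, after_us):
    """Return (absolute increase in microseconds, relative increase in percent)."""
    delta_us = after_us - before_us
    return delta_us, 100.0 * delta_us / before_us


# Figures taken from the report: 308us before, 426us after.
delta_us, pct = copy_overhead(308.0, 426.0)
print(f"extra latency: {delta_us:.0f} us ({pct:.1f}% slower)")
# prints "extra latency: 118 us (38.3% slower)"
```

So the reported regression is roughly 118us per aten::linear call, or about 38% over the previous latency.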

fengyuan14 commented 1 month ago

See https://github.com/intel/torch-xpu-ops/issues/977. The autocast difference between IPEX and torch-xpu-ops leads to the additional copy. Under the current requirements, this is not a defect.