intel / torch-xpu-ops

Reduction: Enhance reduction kernel with supporting data type dynamic cast #685

Open fengyuan14 opened 3 months ago

fengyuan14 commented 3 months ago

🚀 The feature, motivation and pitch

This is a performance requirement. The existing CUDA reduction implementation in PyTorch supports dynamic data type casting inside the reduction kernel, so no extra kernel is needed to align the input and output data types.
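To make the request concrete, here is a minimal, CPU-only sketch of the difference, for example when half-precision input is summed into a wider result. It is not the torch-xpu-ops or PyTorch CUDA code; the function names and the "temporary element" counter are hypothetical and exist only for illustration. The input is modeled as a packed buffer of half-precision values using the standard `struct` module.

```python
import struct

def pack_half(values):
    """Pack Python floats into a little-endian half-precision byte buffer."""
    return struct.pack("<%de" % len(values), *values)

def sum_with_separate_cast(buf):
    """Two passes: a cast pass writes a temporary buffer, then a reduce pass."""
    n = len(buf) // 2
    # Pass 1 (the "extra kernel"): convert every element into a temporary buffer.
    converted = list(struct.unpack("<%de" % n, buf))
    # Pass 2: reduce over the converted copy.
    total = 0.0
    for v in converted:
        total += v
    return total, len(converted)  # result, temporary elements materialized

def sum_with_fused_cast(buf):
    """One pass: each element is cast as it is loaded, then accumulated."""
    total = 0.0
    for (v,) in struct.iter_unpack("<e", buf):
        total += v  # v is already in the wider accumulation type
    return total, 0  # no temporary buffer is materialized

# All of these values are exactly representable in half precision.
data = pack_half([0.5, 1.5, 2.0, 4.0])
separate_total, separate_temps = sum_with_separate_cast(data)
fused_total, fused_temps = sum_with_fused_cast(data)
```

Both variants produce the same sum; the fused variant avoids the intermediate copy and the extra pass over memory, which is the saving this issue asks for on XPU.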

Alternatives

No response

Additional context

No response

fengyuan14 commented 1 month ago

Not urgent, since this usage is rare. Lowering the priority.