intel / torch-xpu-ops

Apache License 2.0

Performance: Improve BatchNormalization forward/backward to align with oneDNN implementation. #937

Closed fengyuan14 closed 4 days ago

fengyuan14 commented 1 month ago

🚀 The feature, motivation and pitch

For accuracy reasons, we followed the PyTorch CUDA implementation, using the Welford algorithm and a similar kernel template. We will improve the kernel template with vectorized load/store.
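
For readers unfamiliar with the Welford algorithm mentioned above, here is a minimal pure-Python sketch of the statistics step for a single channel. It is an illustration only, not the XPU kernel: the names `welford_update`, `welford_merge`, and `channel_stats` are hypothetical, and the `chunk` parameter only loosely stands in for the slice of elements one work-item might process before partial results are combined.

```python
from typing import Sequence, Tuple

# A partial result is (count, mean, m2), where m2 is the running sum of
# squared deviations from the current mean.
Partial = Tuple[int, float, float]


def welford_update(count: int, mean: float, m2: float, x: float) -> Partial:
    """Fold one value into a running (count, mean, m2) triple."""
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)  # uses the already-updated mean
    return count, mean, m2


def welford_merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial results (parallel form of the same update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def channel_stats(values: Sequence[float], chunk: int = 4) -> Tuple[float, float]:
    """Return (mean, biased variance) for one channel.

    Each chunk is reduced independently and then merged, mimicking how
    per-thread partial results would be combined in a reduction.
    """
    if len(values) == 0:
        raise ValueError("need at least one value")
    total: Partial = (0, 0.0, 0.0)
    for start in range(0, len(values), chunk):
        part: Partial = (0, 0.0, 0.0)
        for x in values[start:start + chunk]:
            part = welford_update(*part, x)
        total = welford_merge(total, part)
    n, mean, m2 = total
    return mean, m2 / n  # biased variance, as used to normalize the batch
```

The point of this formulation is that it never subtracts two large, nearly equal sums (as the naive `E[x^2] - E[x]^2` form does), which is why it is attractive when accuracy is the priority.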

Alternatives

No response

Additional context

No response

fengyuan14 commented 1 month ago

https://github.com/intel/torch-xpu-ops/pull/933

xytintel commented 4 days ago

Merged