For the staging goal of PyTorch 2.5, we collected 484 operators that are required to work with the XPU backend. Some of them require an XPU-specific implementation.
When we provide an XPU implementation for an ATen operator, we need to register all variants of the operator, such as xxx.out, xxx.Tensor, xxx.Scalar, xxx_, and so on (a registration sketch follows the list below).
By following this rule:
- We won't have to spend extra effort later to come back and complement missing registrations; adding the variants now is cheap.
- When we align with the CUDA registration, moving in-tree will be seamless.
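
As an illustration of what "register all variants" means in practice, below is a minimal sketch of registering the aten::add family for the XPU dispatch key via TORCH_LIBRARY_IMPL. The *_xpu kernels are placeholder stubs with assumed names, not the actual torch-xpu-ops implementations.

```cpp
// Minimal sketch: registering every variant of aten::add for the XPU dispatch key.
// The *_xpu kernels below are placeholder stubs standing in for real XPU implementations.
#include <ATen/ATen.h>
#include <torch/library.h>

namespace {

at::Tensor add_xpu(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  // A real XPU kernel would go here.
  return at::empty_like(self);
}

at::Tensor& add_out_xpu(const at::Tensor& self, const at::Tensor& other,
                        const at::Scalar& alpha, at::Tensor& out) {
  // A real XPU kernel would go here.
  return out;
}

at::Tensor& add__xpu(at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  // A real XPU kernel would go here.
  return self;
}

} // namespace

// Register all variants of the operator (functional, out, and in-place), not just one of them.
TORCH_LIBRARY_IMPL(aten, XPU, m) {
  m.impl("add.Tensor", add_xpu);
  m.impl("add.out", add_out_xpu);
  m.impl("add_.Tensor", add__xpu);
}
```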