For the staging goal of PyTorch 2.5, we collected 484 operators that are required to work with the XPU backend. Some of them require an XPU-specific implementation.
When we provide an XPU implementation for an ATen operator, we need to register all variants of the operator, such as xxx.out, xxx.Tensor, xxx.Scalar, xxx_, and so on (a registration sketch follows the list below).
By following this rule:
- We won't have to spend extra effort later to come back and complement missing registrations; adding the variants now is cheap.
- When we align with the CUDA registration, moving in-tree will be seamless.
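
As an illustration of what "register all variants" means in practice, below is a minimal sketch of registering the aten::add family for the XPU dispatch key via TORCH_LIBRARY_IMPL. The *_xpu kernels are placeholder stubs with assumed names, not the actual torch-xpu-ops implementations.

```cpp
// Minimal sketch: registering every variant of aten::add for the XPU dispatch key.
// The *_xpu kernels below are placeholder stubs standing in for real XPU implementations.
#include <ATen/ATen.h>
#include <torch/library.h>

namespace {

at::Tensor add_xpu(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  // A real XPU kernel would go here.
  return at::empty_like(self);
}

at::Tensor& add_out_xpu(const at::Tensor& self, const at::Tensor& other,
                        const at::Scalar& alpha, at::Tensor& out) {
  // A real XPU kernel would go here.
  return out;
}

at::Tensor& add__xpu(at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  // A real XPU kernel would go here.
  return self;
}

} // namespace

// Register all variants of the operator (functional, out, and in-place), not just one of them.
TORCH_LIBRARY_IMPL(aten, XPU, m) {
  m.impl("add.Tensor", add_xpu);
  m.impl("add.out", add_out_xpu);
  m.impl("add_.Tensor", add__xpu);
}
```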