intel / torch-xpu-ops

Apache License 2.0

foreach_add behavior is not aligned with cuda #784

Open daisyden opened 1 month ago

daisyden commented 1 month ago

🐛 Describe the bug

test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_exception_cuda expects the error "RuntimeError: scalar tensor expected to be on cuda:0 but is on cpu", but XPU does not raise such an error message.

```python
tensors = [
    make_tensor((2, 2), dtype=torch.float, device="cuda") for _ in range(2)
]
with self.assertRaisesRegex(RuntimeError, "scalar tensor expected to be on"):
    torch._foreach_add(tensors, torch.tensor(1.0, device="cpu"), alpha=1.0)
```

Please note that CUDA only reports this error when `alpha` is specified.
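For reference, the behavior the test expects can be sketched in plain Python. This is a hypothetical helper, not the actual PyTorch source: when a 0-dim scalar tensor is passed together with `alpha`, CUDA's foreach path requires the scalar tensor to live on the same device as the input tensors, and an aligned XPU implementation would raise the same error.

```python
# Minimal sketch of the device check the test expects (hypothetical helper,
# not the actual PyTorch implementation).

def check_scalar_tensor_device(scalar_device: str, expected_device: str) -> None:
    # Reject a 0-dim scalar tensor that lives on a different device
    # than the input tensors (the case CUDA guards when `alpha` is given).
    if scalar_device != expected_device:
        raise RuntimeError(
            f"scalar tensor expected to be on {expected_device} "
            f"but is on {scalar_device}"
        )

# An aligned XPU implementation would raise the same message for xpu:0:
try:
    check_scalar_tensor_device("cpu", "xpu:0")
except RuntimeError as e:
    print(e)  # scalar tensor expected to be on xpu:0 but is on cpu
```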

Versions

latest version

fengyuan14 commented 2 days ago

@chunhuanMeng Please take a look at the issue.