Fix https://github.com/pytorch/pytorch/issues/140877
Some PyTorch C++ users may call the empty op directly. In that case, lazy initialization is never triggered before the op runs. This PR therefore adds a lazyInitDevice call here, which also aligns with the CUDA convention.
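For reviewers unfamiliar with the pattern, here is a minimal, self-contained sketch of why the entry point itself has to trigger lazy initialization. It does not use PyTorch: `FakeDeviceRuntime`, `runtime`, `lazy_init_device`, and `empty_on_device` are hypothetical names made up for illustration, and `std::call_once` is only assumed as one reasonable way to make the init idempotent. It is not the actual change in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a device runtime; not a PyTorch type.
struct FakeDeviceRuntime {
  bool initialized = false;
  int init_count = 0;
};

// Single process-wide runtime instance.
inline FakeDeviceRuntime& runtime() {
  static FakeDeviceRuntime rt;
  return rt;
}

// Illustrative analogue of a lazy-init hook: safe to call many times,
// but the setup body runs only on the first call.
inline void lazy_init_device() {
  static std::once_flag flag;
  std::call_once(flag, [] {
    runtime().initialized = true;
    ++runtime().init_count;
  });
}

// Illustrative analogue of an `empty` op. A C++ caller may reach this
// function before anything else has touched the device, so it triggers
// the lazy init itself instead of assuming an earlier call did it.
// (std::vector zero-fills here; a real `empty` leaves memory uninitialized.)
inline std::vector<float> empty_on_device(std::size_t n) {
  lazy_init_device();
  return std::vector<float>(n);
}
```

In this sketch, calling `empty_on_device` as the very first operation still leaves the runtime initialized, and repeated calls do not re-run the setup.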