Hello, could you please help me understand this: when I follow the approach in this issue, multi-GPU training runs correctly on a V100 machine, but when I run the same code on a machine with four 3090 GPUs, I encounter the error: `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!` #31
Oops, sorry for that. According to the experience of other researchers who have used our code, you may also need to set `device_map='auto'` when loading the model at https://github.com/rui-ye/OpenFedLLM/blob/427aec52f068860a835244563dd4f9b48bf06f00/main_sft.py#L34
Originally posted by @rui-ye in https://github.com/rui-ye/OpenFedLLM/issues/21#issuecomment-2176527114
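For reference, here is a minimal sketch of what that change could look like. It assumes the model is loaded with `transformers`' `AutoModelForCausalLM.from_pretrained`; the model name and dtype below are placeholders rather than the exact values used in `main_sft.py`, and only the `device_map='auto'` argument is the suggested fix.

```python
# Sketch: load the base model with device_map='auto' so that HuggingFace
# Accelerate places the layers across all visible GPUs, avoiding the
# "tensors on cuda:0 and cuda:3" mismatch during multi-GPU training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; use your own base model

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # let accelerate shard the model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```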