princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0! #19

Open Haruka1307 opened 2 months ago

Haruka1307 commented 2 months ago

Hi!

I'm trying to run step 2 on device cuda:6, since cuda:0 is in use, so I move the batches and the model to cuda:6. I printed the device of the batch and the model inside the obtain_gradients_with_adam function to confirm they match.

But err occurs as below:

```
Traceback (most recent call last):
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/u2019000171/cjy/LESS/less/data_selection/get_info.py", line 156, in <module>
    collect_grads(dataloader,
  File "/home/u2019000171/cjy/LESS/less/data_selection/collect_grad_reps.py", line 263, in collect_grads
    vectorized_grads = obtain_gradients_with_adam(model, batch, m, v)
  File "/home/u2019000171/cjy/LESS/less/data_selection/collect_grad_reps.py", line 121, in obtain_gradients_with_adam
    loss = model(**batch).loss
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/peft/peft_model.py", line 1081, in forward
    return self.base_model(
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 103, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
    outputs = self.model(
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1026, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Using Adam gradients
cuda:6 cuda:6
```

I can't figure out what's going wrong...
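One way to see where the mismatch comes from: if the model was loaded with `device_map="auto"`, accelerate may have sharded its submodules across GPUs, and moving the batch (or calling `.to()` on the wrapper) won't change that. A quick diagnostic sketch, not LESS code; `model` here is assumed to be the PEFT model loaded in step 2:

```python
import torch

# transformers records the accelerate placement here when the model was
# loaded with a device_map; more than one device in this dict means the
# forward pass crosses GPUs no matter where the inputs live.
print(getattr(model, "hf_device_map", None))
# e.g. {'model.embed_tokens': 0, 'model.layers.0': 0, ..., 'lm_head': 6}

# List any parameters that did not end up on cuda:6.
for name, param in model.named_parameters():
    if param.device != torch.device("cuda:6"):
        print(name, param.device)
```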

xiamengzhou commented 2 months ago

Could you try `export CUDA_VISIBLE_DEVICES=6` instead?
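Concretely, that hides every GPU except physical device 6 from the process, which then addresses it as cuda:0, so the model and the batches land on the same (and only) visible card. A sketch, with script arguments omitted; the module path comes from the traceback above:

```bash
# Expose only physical GPU 6; inside the process it becomes cuda:0.
export CUDA_VISIBLE_DEVICES=6
python -m less.data_selection.get_info ...  # step-2 gradient collection, arguments unchanged
```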

DDrShieh commented 2 months ago

> Could you try `export CUDA_VISIBLE_DEVICES=6` instead?

It works on 80GB-memory devices but not on devices with less memory. Could you please share any advice for smaller devices, e.g. 32GB, other than a CPU-based method?

Haruka1307 commented 1 month ago

> Could you try `export CUDA_VISIBLE_DEVICES=6` instead?

> It works on 80GB-memory devices but not on devices with less memory. Could you please share any advice for smaller devices, e.g. 32GB, other than a CPU-based method?

You may try fixing device_map to `device_map = {'': 'cuda:0'}`? I think the auto device map may put (part of) the model on cuda:1.
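A minimal sketch of that fix, assuming the model is loaded via transformers' `from_pretrained` (the checkpoint name here is a placeholder, not what LESS itself uses):

```python
import torch
from transformers import AutoModelForCausalLM

# Pin the whole model to one GPU instead of device_map="auto", so no
# submodule can land on a different card than the inputs.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint; substitute your own
    torch_dtype=torch.bfloat16,
    device_map={"": "cuda:0"},    # '' maps the entire module tree to cuda:0
)
```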