Hello, I tried to fine-tune codet5p-2b. I loaded the model from Hugging Face and got a CUDA out-of-memory error, so I tried to spread the model across multiple GPUs by adding device_map = 'auto' when loading the model. But then I got another error:
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
==> Loaded model from Salesforce/codet5p-2b, model size 3112427008
Starting main loop
/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
0%| | 0/1760 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/zhuxunyu/.cache/huggingface/modules/transformers_modules/codet5p-2b/modeling_codet5p.py", line 936, in forward
    loss = loss_fct(logits.reshape(-1, self.decoder.config.vocab_size), labels.view(-1))
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward)
python-BaseException
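Reading the traceback, the failure appears to come from the loss call in modeling_codet5p.py receiving logits and labels that sit on different GPUs once the model is sharded with device_map = 'auto'. Below is a minimal sketch of that situation, assuming the mismatch really is between the logits and the labels. The helper name loss_on_logits_device and the tiny tensor shapes are hypothetical and only for illustration; this is not the model's actual code, and it is not confirmed to be the right fix for the remote model file.

```python
import torch
import torch.nn.functional as F


def loss_on_logits_device(logits, labels, vocab_size):
    """Cross-entropy with the labels moved to the logits' device first.

    Hypothetical helper: it mirrors the shape handling of the failing line
    in the traceback, plus an explicit device alignment.
    """
    # With a sharded model the decoder output may live on a different GPU
    # than the input batch, so align the devices before computing the loss.
    labels = labels.to(logits.device)
    return F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))


# Tiny CPU example with made-up shapes: batch=2, seq_len=3, vocab=5.
logits = torch.zeros(2, 3, 5)
labels = torch.tensor([[0, 1, 2], [3, 4, 0]])
loss = loss_on_logits_device(logits, labels, vocab_size=5)
print(loss.item())
```

On a single device the .to() call is a no-op, so the sketch runs on CPU; on a multi-GPU setup it would copy the labels to whichever GPU holds the logits before the loss is computed.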