Failed to train CodeT5p-2b on multiple GPUs

Hello, I am trying to fine-tune codet5p-2b. I loaded the model from Hugging Face and got a CUDA out-of-memory error, so I tried to spread the model across multiple GPUs by passing device_map = 'auto' when loading it. That produced a different error:

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
  ==> Loaded model from Salesforce/codet5p-2b, model size 3112427008
Starting main loop
/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                  | 0/1760 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/zhuxunyu/.cache/huggingface/modules/transformers_modules/codet5p-2b/modeling_codet5p.py", line 936, in forward
    loss = loss_fct(logits.reshape(-1, self.decoder.config.vocab_size), labels.view(-1))
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/zhuxunyu/miniconda3/envs/openai/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward)
python-BaseException
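
For reference, the traceback fails inside the cross-entropy call: the logits are on one GPU and the labels on another (cuda:7 and cuda:0). Below is a minimal sketch of one possible workaround, not a confirmed fix for this model: move the labels to the device of the logits before computing the loss. The helper name `device_safe_cross_entropy` is hypothetical, and wiring it in (for example by computing the loss outside the model's forward) is an assumption I have not verified on codet5p-2b.

```python
import torch
import torch.nn.functional as F


def device_safe_cross_entropy(logits, labels, ignore_index=-100):
    """Token-level cross-entropy that tolerates logits and labels on different devices.

    Hypothetical helper, not part of transformers or the CodeT5+ code.
    """
    vocab_size = logits.size(-1)
    # With device_map='auto' the last layer can sit on a different GPU than
    # the input batch, so align the labels with the logits first.
    labels = labels.to(logits.device)
    return F.cross_entropy(
        logits.reshape(-1, vocab_size),
        labels.reshape(-1),
        ignore_index=ignore_index,
    )


if __name__ == "__main__":
    # Small CPU-only demo with made-up shapes: batch=2, seq_len=5, vocab=11.
    logits = torch.randn(2, 5, 11)
    labels = torch.randint(0, 11, (2, 5))
    labels[0, 0] = -100  # padded position, ignored by the loss
    print(device_safe_cross_entropy(logits, labels).item())
```

On a single device this behaves like a plain `F.cross_entropy` call; the only difference is the `labels.to(logits.device)` step, which is a no-op when both tensors already share a device.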
