salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

'T5Stack' object has no attribute 'first_device' #155

Open ChangXiaoning opened 8 months ago

ChangXiaoning commented 8 months ago

I have started fine-tuning, but it fails at the very first training step:

***** Running training *****
  Num examples = 2830
  Num Epochs = 6
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 8
  Total optimization steps = 264
  0%|          | 0/264 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "workspace/train_code5.py", line 268, in <module>
    train()
  File "workspace/train_code5.py", line 262, in train
    trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1768, in forward
    loss = self.module(*inputs, **kwargs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1621, in forward
    torch.cuda.set_device(self.decoder.first_device)
  File "/root/miniconda3/envs/seedpicker/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
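The last frames point at the mechanism. In transformers' T5 code, `T5ForConditionalGeneration.forward` calls `torch.cuda.set_device(self.decoder.first_device)` only when `self.model_parallel` is True, and `first_device` is set on the decoder (a `T5Stack`) only by the legacy `parallelize()` method. So the flag is on even though `parallelize()` was never called. One hypothesis worth checking (not confirmed in this thread) is that `train_code5.py` sets `model.model_parallel = True` itself. Below is a minimal sketch of a guard for exactly that inconsistent state; `reset_stale_model_parallel` is a hypothetical helper, not a transformers API. It uses only `getattr`/`hasattr`, so the demo runs with stand-in objects and needs no GPU.

```python
"""Hypothetical guard for a stale `model_parallel` flag on a T5-style model.

Assumption: the model exposes a `model_parallel` attribute and a `decoder`
submodule, as transformers' T5ForConditionalGeneration does.
This helper is NOT part of transformers.
"""
from typing import Any


def reset_stale_model_parallel(model: Any) -> bool:
    """Turn off `model.model_parallel` if parallelize() never set `decoder.first_device`.

    Returns True when the flag was reset, False when nothing was changed.
    """
    flag_on = bool(getattr(model, "model_parallel", False))
    decoder = getattr(model, "decoder", None)
    # hasattr() returns False when torch's nn.Module.__getattr__ raises AttributeError.
    has_first_device = decoder is not None and hasattr(decoder, "first_device")
    if flag_on and not has_first_device:
        # Flag says "model parallel" but parallelize() never ran: switch it off.
        model.model_parallel = False
        return True
    return False


if __name__ == "__main__":
    # Stand-in objects mimicking the failing state (flag on, no first_device).
    class FakeDecoder:
        pass

    class FakeModel:
        def __init__(self) -> None:
            self.decoder = FakeDecoder()
            self.model_parallel = True

    model = FakeModel()
    print(reset_stale_model_parallel(model), model.model_parallel)  # True False
```

In a real script this would be called on the loaded model before the `Trainer` is built. If the flag is being set on purpose somewhere in the training script, removing that assignment is the cleaner fix; the guard only removes the symptom.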

I checked the existing issues in this repo and installed transformers 4.21.3, as suggested in https://github.com/salesforce/CodeT5/issues/113.
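Because the fix suggested in #113 depends on a specific transformers version, it is worth confirming that the interpreter running `train_code5.py` actually imports 4.21.3. The traceback runs inside a conda env named `seedpicker`, and a package installed into a different environment would not be used. A minimal stdlib-only check, assuming it is run with the same Python as the training script (`installed_version` and `check_pin` are illustrative helper names):

```python
"""Report which transformers version this interpreter would import.

Assumption: run with the same Python / conda env that runs train_code5.py.
"""
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED = "4.21.3"  # version the reporter installed, per issue #113


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package: str, expected: str) -> bool:
    """Print what is installed for `package` and return whether it equals `expected`."""
    found = installed_version(package)
    print(f"{package}: found {found!r}, expected {expected!r} (python: {sys.executable})")
    return found == expected


if __name__ == "__main__":
    check_pin("transformers", EXPECTED)
```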

What should I do? Thanks.

xxxVincent-L commented 7 months ago

Facing the same problem here :(