salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Scalar issue: Data Parallel with 2 core GPU #91

Open eswarthammana opened 1 year ago

eswarthammana commented 1 year ago

Dear Team,

I tried to train the model on two GPUs (devices 0,1) and hit the following problem, which I did not face with a single GPU. Could you please help me solve the issue?

Environment: Kaggle; Accelerator: GPU T4 x 2

/opt/conda/lib/python3.7/site-packages/transformers/optimization.py:395: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  FutureWarning,
Training:   0%|          | 0/3125 [00:00<?, ?it/s]
/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
[0] Train loss 0.258: 100%|██████████| 3125/3125 [29:17<00:00, 1.78it/s]
100%|██████████| 2000/2000 [00:07<00:00, 273.69it/s]
Eval ppl:   0%|          | 0/63 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/kaggle/working/CodeT5/run_gen.py", line 387, in <module>
    main()
  File "/kaggle/working/CodeT5/run_gen.py", line 265, in main
    eval_ppl = eval_ppl_epoch(args, eval_data, eval_examples, model, tokenizer)
  File "/kaggle/working/CodeT5/run_gen.py", line 75, in eval_ppl_epoch
    eval_loss += loss.item()
ValueError: only one element tensors can be converted to Python scalars

alibrahimzada commented 1 year ago

I faced a similar issue. I added a check like the one below in run_gen.py (around line 75):

outputs = model(input_ids=source_ids, attention_mask=source_mask,
                labels=target_ids, decoder_attention_mask=target_mask)
loss = outputs.loss
# DataParallel gathers one loss per GPU into a vector; reduce it to a scalar
if args.n_gpu > 1:
    loss = loss.mean()

It now works for me.
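The UserWarning in your log is the tell: DataParallel gathers the per-replica scalar losses along dimension 0 and returns a vector with one entry per GPU, so loss.item() fails on it. A minimal sketch that reproduces this outside the CodeT5 scripts (assumes at least two visible GPUs; the toy module is hypothetical):

import torch
import torch.nn as nn

# toy module: each replica returns a 0-dim (scalar) loss
class ScalarLoss(nn.Module):
    def forward(self, x):
        return x.sum()

model = nn.DataParallel(ScalarLoss().cuda(), device_ids=[0, 1])
loss = model(torch.randn(8, 4).cuda())
print(loss.shape)          # torch.Size([2]): one entry per GPU
# loss.item() here raises "only one element tensors can be converted ..."
print(loss.mean().item())  # reduce first, then convert to a Python float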

Sleepyhead01 commented 1 year ago

Hi, I'm unable to fine-tune with multiple GPUs. Could @eswarthammana or @alibrahimzada tell me what modifications the scripts need for this?

Tx

alibrahimzada commented 1 year ago

Make sure you execute your script with torchrun rather than python3/python. I don't think there are any other requirements for multi-GPU execution.
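For example, assuming two GPUs on a single node, the launch would look something like:

torchrun --nproc_per_node=2 run_gen.py <your usual arguments>

Only the launcher changes; the script arguments stay as they were.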

eswarthammana commented 1 year ago

Hi @Sleepyhead01,

What I changed is in exp_with_args.sh: at the end of the file, the training command is prefixed with CUDA_VISIBLE_DEVICES=${GPU}. Set the ${GPU} value to 0,1 directly in that script; when it is passed through the code it accepts only a single integer, so more than one device cannot be specified that way.

As @alibrahimzada mentioned, also reduce the loss with loss.mean()
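If editing the shell script is inconvenient (e.g. inside a Kaggle notebook), the same effect can be achieved from Python, provided it runs before torch initializes CUDA. A sketch under that assumption, not part of the CodeT5 scripts:

import os

# must be set before the first CUDA call, ideally before importing torch
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
print(torch.cuda.device_count())  # should report 2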

Sleepyhead01 commented 1 year ago

Training with multiple GPUs starts with this modification. However, eval_bleu_epoch gives the following error:

Traceback (most recent call last):
  File "CodeT5/run_gen.py", line 392, in <module>
    main()
  File "CodeT5/run_gen.py", line 319, in main
    result = eval_bleu_epoch(args, eval_data, eval_examples, model, tokenizer, 'dev', 'e%d' % cur_epoch)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CodeT5/run_gen.py", line 109, in eval_bleu_epoch
    preds = model.generate(source_ids,
            ^^^^^^^^^^^^^^
  File "anaconda3/envs/Old_R/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'generate'

Any fix for this? Tx

alibrahimzada commented 1 year ago

@Sleepyhead01 you need to call model.module.generate(), because when n_gpu > 1 the model is wrapped in DataParallel, and the wrapper stores the underlying model in its .module attribute. To reach the model's own methods, go through .module.
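A guard that works in both the single- and multi-GPU paths, sketched against the call in your traceback (the remaining generate() keyword arguments from run_gen.py are elided here):

# DataParallel stores the wrapped model in .module; a bare model has no such attribute
gen_model = model.module if hasattr(model, "module") else model
preds = gen_model.generate(source_ids)  # plus the kwargs run_gen.py already passes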

Unfortunately the authors have not kept these scripts up to date with newer versions of torch.