Thank you for the report, @sajastu
Could you please adjust the command line in your report so that it uses some small public dataset and not custom files which we don't have?
Then I will sort it out.
Thank you.
Sure thing, @stas00!
Let me modify the script and test it so that it runs flawlessly. I'll give you an update shortly!
I was able to reproduce the problem with:
export BS=16; PYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0,1 deepspeed --num_gpus=2 \
examples/pytorch/summarization/run_summarization.py --model_name_or_path \
google/pegasus-cnn_dailymail --output_dir output_dir --adam_eps 1e-06 --do_train --label_smoothing \
0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 500 --max_source_length 128 \
--max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_train_batch_size \
$BS --predict_with_generate --sortish_sampler --dataset_name cnn_dailymail --dataset_config "3.0.0" \
--val_max_target_length 128 --warmup_steps 50 --max_train_samples 50 --max_eval_samples 50 \
--deepspeed tests/deepspeed/ds_config_zero3.json
So nothing else needs to be done on your side.
The root cause: under ZeRO-3, deepspeed.zero.Init partitions each parameter as the module is constructed, so by the time PegasusSinusoidalPositionalEmbedding runs its sinusoidal init, self.weight is only a small local placeholder rather than the full [num_positions, embedding_dim] tensor. The weight has to be gathered before it can be initialized, so the quick fix is:
--- a/src/transformers/models/pegasus/modeling_pegasus.py
+++ b/src/transformers/models/pegasus/modeling_pegasus.py
@@ -26,6 +26,7 @@ from torch import nn
 from torch.nn import CrossEntropyLoss
 from ...activations import ACT2FN
+from ...deepspeed import is_deepspeed_zero3_enabled
 from ...file_utils import (
     add_end_docstrings,
     add_start_docstrings,
@@ -109,7 +110,13 @@ class PegasusSinusoidalPositionalEmbedding(nn.Embedding):
     def __init__(self, num_positions: int, embedding_dim: int, padding_idx: Optional[int] = None):
         super().__init__(num_positions, embedding_dim)
-        self.weight = self._init_weight(self.weight)
+        if is_deepspeed_zero3_enabled():
+            import deepspeed
+            with deepspeed.zero.GatheredParameters(self.weight, modifier_rank=0):
+                self.weight = self._init_weight(self.weight)
+        else:
+            self.weight = self._init_weight(self.weight)
+
     @staticmethod
     def _init_weight(out: nn.Parameter):
Let me know if you can handle the diff.
I will work on a proper PR and test. Ideally we should come up with something that requires fewer code changes, but this will do the right thing for now.
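For anyone reading along, here is a minimal standalone sketch of the gather-then-initialize pattern the diff uses, outside of any transformers code (the helper name and the normal_ initialization are illustrative assumptions, not the actual Pegasus init):

import deepspeed
import torch
from torch import nn

def init_partitioned_weight_(weight: nn.Parameter) -> None:
    # Illustrative helper, not a transformers or deepspeed API. Under
    # ZeRO-3 (deepspeed.zero.Init) the local tensor may be a tiny
    # placeholder; GatheredParameters materializes the full tensor for
    # the duration of the context.
    with deepspeed.zero.GatheredParameters(weight, modifier_rank=0):
        # Only rank 0's modifications are kept; on exit DeepSpeed
        # re-partitions the updated values back across the ranks.
        with torch.no_grad():
            weight.normal_(mean=0.0, std=0.02)  # stand-in for the real init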
@stas00 Thanks. It works perfectly now!
Thank you for validating that it works for you.
I'm trying to have this solved on the deepspeed side, so that all our models will work without needing to change each one of them separately. I will keep you posted on the progress.
If you want to try the fix on the deepspeed side, instead of the workaround on the transformers side, you can try this branch: https://github.com/microsoft/DeepSpeed/pull/1202
https://github.com/microsoft/DeepSpeed/pull/1202 has been merged, so if you use the master version of deepspeed, you no longer need the workaround I shared with you.
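If you want master before the next release, installing from the git repo is one way (standard pip-from-git, nothing project-specific):

pip install git+https://github.com/microsoft/DeepSpeed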
I will close this, but if you still encounter any problems please feel free to re-open.
Environment info
- transformers version: 4.9.0.dev0
- Using distributed or parallel set-up in script?: Y
- Deepspeed version: deepspeed 0.4.1 (installed with pip)
@stas00,
Information
I'm trying to fine-tune the pegasus-large model using deepspeed with multiple GPUs. It seems that deepspeed is unable to initialize the weights at the beginning. When I removed deepspeed, the weights seemed to be properly initialized. I'm unsure whether this is a bug in the deepspeed library. Details are given below.
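(As an aside, a quick way to check whether deepspeed has replaced a weight with a placeholder is to compare local shapes against the shape DeepSpeed records for the full parameter. A debugging sketch, assuming `model` is the loaded Pegasus model and noting that `ds_shape` is a DeepSpeed ZeRO-3 internal attribute that may change between versions:

for name, param in model.named_parameters():
    # `ds_shape` holds the full shape of a ZeRO-3-partitioned parameter;
    # it is absent (None here) on ordinary, non-partitioned parameters.
    full_shape = getattr(param, "ds_shape", None)
    print(name, tuple(param.shape), full_shape)
)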
The command:
Error message:
ds_config.json is the ZeRO-3 config copied from the repository.
self.out: with deepspeed its shape is [1] and it only contains a 1-d tensor with value 1. However, in a single-GPU environment, the shape is [1024, 1024] and it contains floating-point numbers (i.e., much like embeddings).
The problem arises when using:
The tasks I am working on is:
To reproduce
Steps to reproduce the behavior: