ojipadeson opened 2 months ago
@ojipadeson I can help you here, since I faced pretty much the same problem recently. This behavior is normal when using DeepSpeed ZeRO-3, because it partitions parameters and offloads them to different devices (GPU or CPU). As described in DeepSpeed's documentation, if you try to access model parameters (in this case the embedding layer) outside the model's `forward()` method, there is a chance they will appear empty (as in your case) because they have been offloaded (i.e. moved) to another device.

When you access the same parameters inside the model's `forward()` method, DeepSpeed automatically fetches them from whatever device they were offloaded to. If you want to access them outside `forward()`, you have to manually gather them (that is the technical term) using `deepspeed.zero.GatheredParameters`. Try the following:
```python
import deepspeed
import torch
import transformers
from transformers import AutoModelForCausalLM
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrainingArguments(transformers.TrainingArguments):
    cache_dir: Optional[str] = field(default=None)

parser = transformers.HfArgumentParser(TrainingArguments)
# parse_args_into_dataclasses() returns a tuple, one item per dataclass
training_args, = parser.parse_args_into_dataclasses()

# Load model
config = transformers.AutoConfig.from_pretrained("path/to/Qwen2")
llm_model = AutoModelForCausalLM.from_pretrained(
    "path/to/Qwen2",
    config=config,
)

pretrained_embed = llm_model.get_input_embeddings()
# Outside the context, the partitioned weight shows up as torch.Size([0]);
# inside, it is gathered back to its full size.
with deepspeed.zero.GatheredParameters(pretrained_embed.weight, modifier_rank=0):
    print(pretrained_embed)               # Embedding(152064, 3584)
    print(pretrained_embed.weight.shape)  # torch.Size([152064, 3584])
    out = pretrained_embed(torch.ones((1, 1024), dtype=torch.int))
    print(out.shape)                      # torch.Size([1, 1024, 3584])
```
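If the same script also has to run without ZeRO-3 (e.g. plain single-GPU debugging, where `deepspeed` may not even be installed), the gather can be made conditional. This is a minimal sketch, not part of the original answer; the `maybe_gathered` helper name is hypothetical, and it assumes ZeRO-3-partitioned parameters can be detected by the `ds_id` attribute DeepSpeed attaches to them:

```python
import contextlib

import torch
import torch.nn as nn

try:
    import deepspeed
    _HAS_DEEPSPEED = True
except ImportError:
    _HAS_DEEPSPEED = False

def maybe_gathered(params, modifier_rank=0):
    """Gather ZeRO-3-partitioned params if DeepSpeed is in use, else no-op."""
    params = list(params) if isinstance(params, (list, tuple)) else [params]
    # Assumption: ZeRO-3 attaches a `ds_id` attribute to partitioned
    # parameters; plain torch parameters have no such attribute, so we
    # fall back to a null context manager for them.
    if _HAS_DEEPSPEED and any(hasattr(p, "ds_id") for p in params):
        return deepspeed.zero.GatheredParameters(params, modifier_rank=modifier_rank)
    return contextlib.nullcontext()

# Tiny stand-in for the real input embeddings, just to show the pattern.
embed = nn.Embedding(10, 4)
with maybe_gathered(embed.weight):
    out = embed(torch.ones((1, 8), dtype=torch.long))
print(out.shape)  # torch.Size([1, 8, 4])
```

The helper is a no-op outside ZeRO-3, so the same access code works in both settings.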
System Info
transformers version: 4.42.4

Who can help?
@muellerzr @SunMarc

Information
Tasks
- examples folder (such as GLUE/SQuAD, ...)

Reproduction
My python script `train_temp.py`:
My running script:
My `ds_config_zero3.json`:
Error:
When I delete the `TrainingArguments` part, the embedding size returns to normal.

Expected behavior