Closed: Gaiejj closed this issue 1 month ago.
Hey! I think #32192 should have fixed it!
It seems the issue is still not fixed. You can check the progress in #32192.
Thank you very much for your prompt response and continuous follow-up. I will closely monitor the latest updates. Thanks again for your hard work! ❤️
This issue is resolved by #32214! Thanks to @zucchini-nlp.
On my way to do a patch then! Thanks all for reporting this quickly, and thanks @zucchini-nlp for your quick fixes!
Congratulations! ❤️ We have successfully run full-parameter PPO fine-tuning on Llama 3.1. Thanks again to @ArthurZucker @iamseokhyun and @zucchini-nlp for their super quick fixes and follow-up!!!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing as completed!
System Info
transformers version: 4.43.1

Who can help?
@ArthurZucker

When training the llama2-7b-hf model with DeepSpeed ZeRO-3, I hit an error during embedding resizing that I could not resolve. The llama2-7b-hf tokenizer has no pad_token, so I set a default one, which requires resizing the embedding. The same code runs correctly on transformers 4.41.2 but fails on 4.43.0.
I identified the following two anomalies:
I've spent a lot of time pinpointing this issue, but I genuinely don't know how to resolve it. Any assistance would be greatly appreciated, and I express my heartfelt gratitude to you.
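The resize step that fails here typically initializes each newly added token embedding to the mean of the existing rows (this is the pattern in the Stanford Alpaca script referenced in the reproduction below). A minimal plain-Python sketch of that mean-initialization step, using a hypothetical helper name and lists instead of torch tensors:

```python
# Illustrative sketch (hypothetical helper, plain Python instead of torch)
# of the mean-initialization used when resizing token embeddings:
# each newly added token row starts as the column-wise mean of the
# pre-existing rows.

def mean_init_new_rows(embeddings, num_new_tokens):
    """Append `num_new_tokens` rows, each set to the column-wise mean
    of the existing rows."""
    dim = len(embeddings[0])
    mean_row = [
        sum(row[d] for row in embeddings) / len(embeddings) for d in range(dim)
    ]
    return embeddings + [list(mean_row) for _ in range(num_new_tokens)]

# A 2-token vocabulary with 2-dim embeddings gains one new (e.g. pad) row:
emb = [[1.0, 2.0], [3.0, 4.0]]
print(mean_init_new_rows(emb, 1))  # → [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
```

Under ZeRO-3 the embedding weights are partitioned across ranks, which is why a resize that works in eager single-process code can behave differently once DeepSpeed is involved.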
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)

Reproduction
```python
import json

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.integrations.deepspeed import HfDeepSpeedConfig

# Default special tokens, following the referenced Stanford Alpaca script;
# the original values were stripped when this report was rendered.
DEFAULT_PAD_TOKEN: str = '[PAD]'
DEFAULT_EOS_TOKEN: str = '</s>'
DEFAULT_BOS_TOKEN: str = '<s>'
DEFAULT_UNK_TOKEN: str = '<unk>'

model_name_or_path = 'PATHTO/Llama-2-7b-hf'
ds_cfgs_path = 'PATH'

deepspeed.init_distributed()

with open(ds_cfgs_path) as f:
    ds_cfgs = json.load(f)
ds_cfgs['bf16']['enabled'] = True

# Keep a live reference to HfDeepSpeedConfig so that from_pretrained
# loads the model directly under ZeRO-3 partitioning.
dstchf = HfDeepSpeedConfig(ds_cfgs)

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=2048,
    padding_side='right',
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)


# Reference: https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py
def resize_tokenizer_embedding(tokenizer, model) -> None:
    """Resize tokenizer and embedding.

    The body was truncated in this report; reconstructed here following
    the referenced Stanford Alpaca train.py.
    """
    num_new_tokens = tokenizer.add_special_tokens(
        {
            'pad_token': DEFAULT_PAD_TOKEN,
            'eos_token': DEFAULT_EOS_TOKEN,
            'bos_token': DEFAULT_BOS_TOKEN,
            'unk_token': DEFAULT_UNK_TOKEN,
        }
    )
    model.resize_token_embeddings(len(tokenizer))
    if num_new_tokens > 0:
        # Initialize the new rows to the mean of the existing embeddings.
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data
        input_embeddings[-num_new_tokens:] = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings[-num_new_tokens:] = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)


resize_tokenizer_embedding(tokenizer=tokenizer, model=model)
```