lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

support finetune on Qwen14B #2476

Open lucasjinreal opened 11 months ago

lucasjinreal commented 11 months ago

Please add finetuning support for Qwen. Currently, SFT on this model gives a loss of 0 with the default training script, and it is hard to tell why.

kyriekevin commented 11 months ago

You can refer to the finetuning script officially provided by Qwen. I also tried to train qwen-7b with FastChat's default code before, but the loss was always 0. After switching to Qwen's official finetuning script, the loss dropped normally.

lucasjinreal commented 11 months ago

@kyriekevin That works, but I would still like to know why the FastChat format doesn't work. Do you have any idea? I previously hit the same error when doing SFT on XVERSE and fixed it by adjusting the offset when preparing the targets, but I can't fix it for this one.

kyriekevin commented 11 months ago
  1. Qwen is pre-trained with the ChatML format, so the role settings are different. The role segmentation in the FastChat code is strict, and problems arise when it splits on different roles; I kept hitting mismatch warnings before.
  2. Both FastChat's default code and Qwen's finetune code hard-code some offsets, which may be why the loss calculation is wrong. I changed the offsets before, and the loss was no longer 0, but it decreased abnormally. So I think it comes down to these two aspects (a rough sketch of how this can zero out the loss is below).
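For readers unfamiliar with the failure mode: below is a minimal, simplified sketch (not FastChat's actual preprocessing code; `mask_targets` is a made-up helper) of how conversation-style target masking works, and how offsets that assume a different chat template can end up ignoring every token so the loss reports 0.

```python
import torch

IGNORE_TOKEN_ID = -100  # labels with this value are excluded from the loss


def mask_targets(input_ids, turn_lengths, instruction_lengths):
    """Toy illustration of supervised target masking (hypothetical helper).

    turn_lengths[i]        -- token count of the i-th (user + assistant) turn
    instruction_lengths[i] -- leading tokens of that turn to exclude from the loss
    """
    labels = input_ids.clone()
    cur = 0
    for turn_len, instr_len in zip(turn_lengths, instruction_lengths):
        # mask the user/instruction part; the assistant reply stays as the target
        labels[cur : cur + instr_len] = IGNORE_TOKEN_ID
        cur += turn_len
    # anything beyond the reconstructed conversation length is also ignored
    labels[cur:] = IGNORE_TOKEN_ID
    return labels


# If instruction_lengths are derived from offsets that assume another template
# (e.g. Vicuna-style separators instead of Qwen's ChatML special tokens),
# instr_len can cover the whole turn, every label becomes -100, and the
# cross-entropy loss over an all-ignored batch is reported as 0.
```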
lucasjinreal commented 11 months ago

I'm using the base model for SFT, not finetuning the chat model. I'd assume the base model shouldn't be affected by any particular chat format? These offsets in FastChat are really weird.

kyriekevin commented 11 months ago

My understanding is that it affects the mask and related information. Since I am doing full fine-tuning, the effect is very obvious. My modification plan is described in the Qwen issues, and the Qwen maintainers have confirmed it; you can try it. The exact reason the loss is 0 may need more in-depth research. Right now I'm busy improving performance for our business vertical, and I just need fine-tuning to work normally.

https://github.com/QwenLM/Qwen/issues/310
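A quick generic sanity check (not part of either repo's scripts; `check_masking` is a hypothetical helper) is to decode what actually survives the masking for one training example; if nothing does, a zero loss is expected:

```python
import torch

IGNORE_TOKEN_ID = -100


def check_masking(tokenizer, input_ids, labels):
    """Print the tokens that remain as supervision targets for one example."""
    input_ids = torch.as_tensor(input_ids)
    labels = torch.as_tensor(labels)
    kept = input_ids[labels != IGNORE_TOKEN_ID]
    print(f"{kept.numel()} supervised tokens")
    # If this prints an empty string, all labels are -100 and the loss will be 0.
    print(tokenizer.decode(kept))
```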

lucasjinreal commented 11 months ago

I'm using Qwen's finetuning script for SFT now and it works normally. But I hope FastChat officially adds support for Qwen finetuning, since it is the best and largest open-source Chinese base model so far.

ye7love7 commented 11 months ago

May I ask whether Qwen-14B can be loaded with 8-bit quantization? The FastChat code requires the .bin format, but Qwen-14B only provides safetensors weights.
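As a generic alternative (not suggested in this thread, and assuming enough memory to load the full model once), the safetensors checkpoint can be re-saved as .bin shards with plain transformers, then pointed at from FastChat:

```python
# One-off conversion sketch: re-save a safetensors checkpoint as pytorch_model-*.bin
# so code paths that only look for .bin weights can load it. The output path is an
# arbitrary example.
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "Qwen/Qwen-14B-Chat"      # or a local weight directory
dst = "./Qwen-14B-Chat-bin"     # converted output directory

model = AutoModelForCausalLM.from_pretrained(src, trust_remote_code=True, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(src, trust_remote_code=True)

model.save_pretrained(dst, safe_serialization=False)  # write .bin instead of .safetensors
tokenizer.save_pretrained(dst)
```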

yhfgyyf commented 11 months ago

> May I ask whether Qwen-14B can be loaded with 8-bit quantization? The FastChat code requires the .bin format, but Qwen-14B only provides safetensors weights.

In FastChat/fastchat/model/compression.py, the existing branch is:

    if "T5Config" in str(type(config)):
        model = AutoModelForSeq2SeqLM.from_config(
            config, trust_remote_code=True
        )

After line 136, add:

    elif "QWenConfig" in str(type(config)):
        from transformers.generation import GenerationConfig

        model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            resume_download=True,
            load_in_8bit=True,
        ).eval()
        config = GenerationConfig.from_pretrained(
            model_path, trust_remote_code=True, resume_download=True,
        )
        tokenizer = AutoTokenizer.from_pretrained(
            model_path, trust_remote_code=True, resume_download=True,
        )
        return model, tokenizer

In the Qwen-14B-Chat weight directory, edit config.json and set: "fp16": true
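With that patch in place, the model can then be served with FastChat's 8-bit flag, e.g. `python3 -m fastchat.serve.cli --model-path /path/to/Qwen-14B-Chat --load-8bit` (flag name as in FastChat's standard CLI; double-check it against your installed version).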

ye7love7 commented 11 months ago

Thanks a lot!


blueruler commented 8 months ago
> 1. Qwen is pre-trained with the ChatML format, so the role settings are different. The role segmentation in the FastChat code is strict, and problems arise when it splits on different roles; I kept hitting mismatch warnings before.
> 2. Both FastChat's default code and Qwen's finetune code hard-code some offsets, which may be why the loss calculation is wrong. I changed the offsets before, and the loss was no longer 0, but it decreased abnormally. So I think it comes down to these two aspects.

Can you provide the training script and examples of the hard-coded offsets?