hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

gemma-7b-it chat fails #2540

Closed MrFengJian closed 9 months ago

MrFengJian commented 9 months ago

Running the gemma-7b-it chat example with the latest code, the output ends with repeated <eos> tokens. Command line: python cli_demo.py --model_name_or_path google/gemma-7b-it --template gemma

[screenshot: chat output ending with repeated <eos> tokens]

hiyouga commented 9 months ago

Please update the code and try again.

kostum123 commented 9 months ago

The Gemma series models currently have severe issues. I don't recommend anyone waste their time or resources until Hugging Face and Google fix the implementation and the models can be trained properly, especially the 7B one. https://github.com/huggingface/transformers/issues/29250

hiyouga commented 9 months ago

@kostum123 Gemma can currently be trained (in bf16 mode) and run for inference with LLaMA-Factory:

python src/cli_demo.py --model_name_or_path google/gemma-7b-it --template gemma

[screenshot: working chat session]
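For comparison, a minimal sketch of bf16 inference on the same checkpoint with plain transformers (outside LLaMA-Factory); the prompt is illustrative and a CUDA GPU with enough memory is assumed:

```python
# Minimal sketch: bf16 inference with google/gemma-7b-it via transformers.
# Assumes a CUDA GPU with enough memory; the prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    torch_dtype=torch.bfloat16,  # the released weights are bfloat16
    device_map="auto",
)

# Use the Gemma chat template so the turn markers match what the model saw in training.
messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```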

kostum123 commented 9 months ago

> @kostum123 Gemma can currently be trained (in bf16 mode) and run for inference with LLaMA-Factory:
>
> python src/cli_demo.py --model_name_or_path google/gemma-7b-it --template gemma

Yes, it can be trained, but that doesn't change the result. Compared to the original Keras implementation, using the same dataset and settings, Hugging Face's training results in worse loss and outputs. Also, the perplexity of gemma models before training is very high. There might be a bug in their implementation or in the model weight conversion script. Currently, I don't recommend anyone waste computing resources until they fix it. I love Llama Factory, and this issue is not because of Llama Factory; it arises from Google and Hugging Face.

ArthurZucker commented 9 months ago

Thanks for the feedback. As mentioned here, perplexity depends a lot on how you compute it. The model is very sensitive to whether or not you pass a bos_token. Also, the model was trained in bfloat16, and the weights shared are in bfloat16, so they should not be used for float16 training. There is a revision="float16" that was converted from the original float32 and should be used in that case.
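A minimal sketch of what this means in transformers code, assuming a recent transformers release; the example sentence is illustrative:

```python
# Sketch: load the default bfloat16 weights (or the dedicated float16 revision) and
# compute a loss/perplexity while keeping the BOS token. The sentence is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

# Default weights are bfloat16 -- load them as such:
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# For float16, use the revision converted from the original float32 checkpoint instead:
# model = AutoModelForCausalLM.from_pretrained(
#     "google/gemma-7b-it", revision="float16", torch_dtype=torch.float16, device_map="auto"
# )

# The Gemma tokenizer prepends <bos> by default; dropping it changes perplexity a lot.
enc = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())
```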

kostum123 commented 9 months ago

> Thanks for the feedback. As mentioned here, perplexity depends a lot on how you compute it. The model is very sensitive to whether or not you pass a bos_token. Also, the model was trained in bfloat16, and the weights shared are in bfloat16, so they should not be used for float16 training. There is a revision="float16" that was converted from the original float32 and should be used in that case.

Thank you for addressing the perplexity issue. The real issue lies with instruction tuning using Hugging Face's bf16 weights and trainer. I recommend that you train the bf16 weights (in .h5 format) using Keras and JAX on Kaggle, without quantizing the model or using the LoRA method. Afterward, train the same model with Hugging Face's TRL on the same instruction dataset with identical settings, and compare the results. You will get a model with low loss, stable training, and better outputs as training progresses, but only when trained on Kaggle with Keras. Even though I can't pinpoint exactly where the problem originates, there is definitely a significant difference in quality compared to the Hugging Face trainer. I say this as someone who has successfully trained Mistral, Llama, and many other models with Hugging Face's trainer and weights. I just want this to be fixed.
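For the Hugging Face side of such a comparison, a minimal TRL sketch, assuming an early-2024 trl release; the dataset id, text column, output directory, and hyperparameters are placeholders:

```python
# Sketch of the Hugging Face side of the comparison: supervised fine-tuning of
# google/gemma-7b-it with trl's SFTTrainer. The dataset id, text column, output
# directory, and hyperparameters are placeholders; API as of early-2024 trl releases.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

dataset = load_dataset("your/instruction-dataset", split="train")  # placeholder dataset id

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # placeholder column with pre-formatted prompt/response text
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="gemma-7b-it-sft-trl",  # placeholder path
        bf16=True,                         # train natively in bfloat16
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
    ),
)
trainer.train()
```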

You also need to adjust the notebook for instruction tuning. The first cell must be in this order:

!pip install -q -U keras-nlp
# Work around an import error with tensorflow-hub. The library is not used.
!pip install -q -U tensorflow-hub
# Install tensorflow-cpu so tensorflow does not attempt to access the TPU.
!pip install -q -U tensorflow-cpu
# Install keras 3 last. See https://keras.io/getting_started for details.
!pip install -q -U keras

For more details, see this Kaggle notebook: Keras Gemma: Distributed Finetuning and Inference.
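A minimal sketch of the Keras/JAX side, assuming keras_nlp with the Gemma presets installed and enough accelerator memory; the preset name and two-example toy dataset are illustrative, and the TPU distribution setup from the notebook is omitted:

```python
# Sketch of the Keras/JAX side: a small full-precision fine-tune of Gemma with keras_nlp,
# along the lines of the notebook above. The preset name and two-example toy dataset are
# illustrative, and the TPU model-parallel setup from the notebook is omitted here.
import os
os.environ["KERAS_BACKEND"] = "jax"  # run Keras 3 on the JAX backend

import keras
import keras_nlp

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_instruct_7b_en")

# Tiny instruction-style dataset in a prompt/response text format.
data = [
    "Instruction:\nTranslate 'good morning' to French.\n\nResponse:\nBonjour.",
    "Instruction:\nName the largest planet in the Solar System.\n\nResponse:\nJupiter.",
]

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.01),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=1, batch_size=1)
```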

ArthurZucker commented 9 months ago

I think the training issues could be related to https://github.com/huggingface/transformers/pull/29285. I'm not 100% certain, but if autocast was used, then this will fix it.
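As an illustration of the autocast distinction (not the content of that PR), a sketch contrasting a float16 autocast forward pass with a native bfloat16 one; assumes a CUDA GPU and an illustrative prompt:

```python
# Illustration of the autocast distinction: a float16 autocast forward pass over a
# bfloat16 checkpoint vs. a native bfloat16 forward pass. Assumes a CUDA GPU; the
# prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.bfloat16
).cuda()
inputs = tokenizer("Hello", return_tensors="pt").to("cuda")

# The pattern under suspicion: float16 autocast wrapped around bfloat16 weights.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    logits_fp16_autocast = model(**inputs).logits

# Native bfloat16 forward pass, matching the precision the model was trained in.
logits_bf16 = model(**inputs).logits
```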