KillerShoaib opened 6 months ago
Did you consider using the llama3 chat template instead of the default one (check this notebook)?
Alternatively, you could use tools like guidance, which offers a lot of options to stop generation (for example regex or substrings). However, you will need to convert your model to llama.cpp to use it with guidance. You lose unsloth's inference speed-up, but you can run on CPU.
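For reference, a rough sketch of what the guidance route could look like once the model has been converted to a GGUF file for llama.cpp (the file path, stop string, and token budget below are placeholders, not from this thread):

from guidance import models, gen

# Load the converted model through llama.cpp via guidance (path is a placeholder).
lm = models.LlamaCpp("llama-3-8b-finetune.Q4_K_M.gguf")

# guidance can stop on substrings or regexes instead of relying on an eos token.
lm += "### Response:\n"
lm += gen("answer", max_tokens=256, stop="### Instruction")
print(lm["answer"])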
I encountered the same problem. I added EOS to the training data, but during prediction, the output always continues to the maximum number of tokens.
I'm facing the same problem here.
I've figured out the solution. Below is the code for those who just want the solution, not the details:
# change the padding tokenizer value
tokenizer.add_special_tokens({"pad_token": "<|reserved_special_token_0|>"})
model.config.pad_token_id = tokenizer.pad_token_id # updating model config
tokenizer.padding_side = 'right' # padding to right (otherwise SFTTrainer shows warning)
Now, pass the model and tokenizer to SFTTrainer.
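For completeness, a minimal sketch of that step, roughly following the usual TRL/Unsloth notebook pattern (all hyperparameter values below are placeholders, not from the original post):

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                  # model with the updated pad_token_id from above
    tokenizer=tokenizer,          # tokenizer with the new pad_token
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,          # placeholder
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()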
The reason for the bug: the pad_token_id and eos_token_id are the same. Therefore, during fine-tuning the loss function ignores both the pad_token and the eos_token, so the model never learns to predict the eos_token.
First I checked the pad_token_id and eos_token_id for the unsloth-llama3 tokenizer, and I found that both the pad_token_id and the eos_token_id are the same.
print(f"Pad Token id: {tokenizer.pad_token_id} and Pad Token: {tokenizer.pad_token}")
print(f"EOS Token id: {tokenizer.eos_token_id} and EOS Token: {tokenizer.eos_token}")
>>> Pad Token id: 128001 and Pad Token: <|end_of_text|>
>>> EOS Token id: 128001 and EOS Token: <|end_of_text|>
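A minimal illustration of why a shared pad/eos id silences the EOS loss during training, assuming SFTTrainer falls back to the default DataCollatorForLanguageModeling (this snippet is mine, not from the original post):

from transformers import DataCollatorForLanguageModeling

# With the buggy setup (pad_token_id == eos_token_id == 128001), every label equal
# to pad_token_id is replaced with -100, so the EOS position never contributes to
# the loss and the model never learns to emit it.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
example = {"input_ids": [128000, 12345, 128001]}  # bos, an arbitrary text token, eos
batch = collator([example])
print(batch["labels"])  # the last position becomes -100 because it matches pad_token_id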
The solution is to change the pad_token_id. I've found this Stack Overflow question where it shows how to change the pad_token_id for the Falcon model. But for the llama3 pad_token_id you can not add just any random value; it'll throw a CUDA error. (I'm not sure, but I'm assuming the reason for that error is a mismatch between the tokenizer vocab size and the model vocab size.) The llama3 tokenizer has reserved special tokens from <|reserved_special_token_0|> to <|reserved_special_token_250|>. You can use any of the reserved special token values as the pad_token value. I've used the first one, <|reserved_special_token_0|>.
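A quick sanity check (mine, not from the original post) that the chosen reserved token is already in the vocabulary, so no embedding resize is needed:

pad_candidate = "<|reserved_special_token_0|>"
pad_id = tokenizer.convert_tokens_to_ids(pad_candidate)
assert pad_id is not None and pad_id < model.config.vocab_size
print(pad_candidate, pad_id)  # 128002 for llama3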
After changing the pad_token, I printed the pad_token_id and eos_token_id values again:
print(f"Pad Token id: {tokenizer.pad_token_id} and Pad Token: {tokenizer.pad_token}")
print(f"EOS Token id: {tokenizer.eos_token_id} and EOS Token: {tokenizer.eos_token}")
>>> Pad Token id: 128002 and Pad Token: <|reserved_special_token_0|>
>>> EOS Token id: 128001 and EOS Token: <|end_of_text|>
Then I fine-tuned the model again with the new pad_token_id. This time I asked the model the same question as before, and the model was able to generate the eos_token and stopped before hitting the max_new_tokens length. Below I've shown 2 pictures showcasing the model's response with the same and with different eos_token and pad_token values.
I'm hoping UnslothAI is going to see this bug and fix it in their colab notebook. Lots of people are facing this issue.
@KillerShoaib WHOOPS you are entirely correct!!!! I immediately updated all pad_tokens Unsloth has to <|reserved_special_token_250|>. Thanks for the keen eye!!
OMG, thank you for the solution here; it was driving me nuts trying to figure out why llama3 was rambling more the more I trained it.
I suggest using <|end_of_text|> for pad token and <|eot_id|> for eos token.
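For anyone who wants to try that suggestion, a hedged sketch (not from the comment itself) of setting those tokens on a standard Transformers tokenizer and model:

# Token ids taken from this thread: <|end_of_text|> = 128001, <|eot_id|> = 128009.
tokenizer.pad_token = "<|end_of_text|>"
tokenizer.eos_token = "<|eot_id|>"
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id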
This issue still seems to be ongoing when using the default meta-llama models (e.g., meta-llama/Meta-Llama-3-8B-Instruct) following the Colab notebook example here: https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing
I can confirm the issue is not present however using the exact same code with the unsloth version of the model (e.g., unsloth/llama-3-8b-Instruct-bnb-4bit).
Long story short, it seems like there is still some issue with eos for the stock llama 3 models.
@davedgd Oh so Unsloth is fine (the models or just finetuning with Unsloth?) but the Meta ones still don't work as expected?
Correct, but to clarify, no issues when tuning the Unsloth provided model with Unsloth, but I have issues with the exact same code when using the meta-llama repository version of Llama 3 8B Instruct.
With the meta-llama fine-tune only (using the Llama 3 8B Instruct Colab notebook and swapping out for my fine tuning data), the fine tuning goes great but the inferencing will hang for a while and run until max length, producing results like this:
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have significantly reduced over time
Vehicle emissions have
PS. Thanks for the quick reply! :)
@davedgd Oh that's a shame for Meta's official repo - well glad Unsloth works fine :)
Hi, I am seeing some odd differences in unsloth/llama-3-8b-Instruct vs Meta-Llama-3-8B-Instruct (official hf one) with respect to the tokenizer and other .json files.
I guess I'm still confused ...
Does anyone know why unsloth renamed (and apparently renumbered) the eos_token from 128001 to 128009, changed "eos_token": "<|end_of_text|>" to "eos_token": "<|eot_id|>", and then changed the pad_id from -1 to 128255?
So is the unsloth update to the hf model for llama3 some kind of a bug fix for the original llama3 hf model?
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/config.json Thu May 9 18:46:47 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/config.json Sat Jun 1 13:49:31 2024
***************
*** 1,11 ****
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
! "eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
--- 1,12 ----
{
+ "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
! "eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
***************
*** 21,27 ****
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
! "transformers_version": "4.40.0.dev0",
"use_cache": true,
"vocab_size": 128256
}
--- 22,28 ----
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
! "transformers_version": "4.38.2",
"use_cache": true,
"vocab_size": 128256
}
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/generation_config.json Thu May 9 18:46:47 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/generation_config.json Sat Jun 1 13:49:31 2024
***************
*** 1,9 ****
{
"bos_token_id": 128000,
"eos_token_id": [128001, 128009],
! "do_sample": true,
! "temperature": 0.6,
! "max_length": 4096,
! "top_p": 0.9,
! "transformers_version": "4.40.0.dev0"
}
--- 1,6 ----
{
+ "_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": [128001, 128009],
! "transformers_version": "4.38.2"
}
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/special_tokens_map.json Thu May 9 18:46:48 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/special_tokens_map.json Sat Jun 1 13:49:31 2024
***************
*** 1,4 ****
{
! "bos_token": "<|begin_of_text|>",
! "eos_token": "<|end_of_text|>"
}
--- 1,23 ----
{
! "bos_token": {
! "content": "<|begin_of_text|>",
! "lstrip": false,
! "normalized": false,
! "rstrip": false,
! "single_word": false
! },
! "eos_token": {
! "content": "<|eot_id|>",
! "lstrip": false,
! "normalized": false,
! "rstrip": false,
! "single_word": false
! },
! "pad_token": {
! "content": "<|reserved_special_token_250|>",
! "lstrip": false,
! "normalized": false,
! "rstrip": false,
! "single_word": false
! }
}
*** /home/ai/LLM/models/Meta-Llama-3-8B-Instruct/tokenizer_config.json Thu May 9 18:46:48 2024
--- /data/hf/cache/hub/models--unsloth--llama-3-8b-Instruct/snapshots/f77838872cca586fcbafa67efc77fb7d3afe775d/tokenizer_config.json Sat Jun 1 13:49:31 2024
***************
*** 2052,2062 ****
"bos_token": "<|begin_of_text|>",
"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
"clean_up_tokenization_spaces": true,
! "eos_token": "<|end_of_text|>",
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 1000000000000000019884624838656,
"tokenizer_class": "PreTrainedTokenizerFast"
}
--- 2052,2064 ----
"bos_token": "<|begin_of_text|>",
"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
"clean_up_tokenization_spaces": true,
! "eos_token": "<|eot_id|>",
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|reserved_special_token_250|>",
+ "padding_side": "left",
"tokenizer_class": "PreTrainedTokenizerFast"
}
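For anyone who wants to confirm which variant their local snapshot resolves to, a small check (mine, not from this comment) that should reflect the values in the diff above:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct")
print(tok.eos_token, tok.eos_token_id)  # expect <|eot_id|> / 128009 after the update
print(tok.pad_token, tok.pad_token_id)  # expect <|reserved_special_token_250|> / 128255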
Yes the pad token is in fact a bug fix
Indeed. My pull of the official Llama3 hf models occurred more than 20 days ago :-) Thank you.
Oh cool! Ye it got updated
Same issue while I am testing unsloth llama3.1.
@Dineshkumar-Anandan-ZS0367 Oh, many apologies for the horribly late reply - hopefully it works now?
I've fine-tuned the llama 3 8 billion model. I followed the notebook and only changed the dataset. The dataset is similar to the alpaca dataset but for the Bangla language. I've trained the model for 1 epoch (36hrs) on a single T4 GPU. But when I'm trying to generate a response, it is not generating any eos token. It will go on till hitting the max_new_token length and then stop.
Here is a sample of the code that is creating the dataset. (It is the same as the colab notebook; I just changed the dataset name and system prompt.)
code:
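(The original snippet isn't reproduced in this thread; below is a hedged reconstruction of the notebook-style formatting it describes, assuming alpaca-style columns. The dataset name is a placeholder, and the prompt template matches the example shown next.)

from datasets import load_dataset

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # appended so the model can learn when to stop

def formatting_prompts_func(examples):
    # assumes alpaca-style columns: instruction / input / output
    texts = []
    for instruction, inp, out in zip(examples["instruction"], examples["input"], examples["output"]):
        texts.append(alpaca_prompt.format(instruction, inp, out) + EOS_TOKEN)
    return {"text": texts}

dataset = load_dataset("bangla-alpaca-dataset", split="train")  # placeholder dataset name
dataset = dataset.map(formatting_prompts_func, batched=True)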
One single example of the dataset['text']
looks like this:'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nপদার্থের পরিবর্তনশীলতা এর বৈজ্ঞানিক সংজ্ঞা কি?\n\n### Input:\n\n\n### Response:\nবিপাক একটি জীবের মধ্যে ঘটে যাওয়া সমস্ত জৈব রাসায়নিক বিক্রিয়াকে বোঝায়, যার মধ্যে এমন প্রতিক্রিয়া রয়েছে যা শক্তি উত্পাদন করতে অণু ভাঙ্গতে পারে (ক্যাটাবলিজম) এবং নতুন অণু তৈরি করে (অ্যানাবলিজম) । এই প্রতিক্রিয়াগুলি এনজাইম দ্বারা সহজতর হয় এবং বৃদ্ধি, প্রজনন এবং পরিবেশের প্রতিক্রিয়া হিসাবে প্রয়োজনীয় প্রক্রিয়াগুলির মাধ্যমে জীবন বজায় রাখার জন্য প্রয়োজনীয়। বিপাক বিশেষত খাদ্যের ভাঙ্গন এবং এটি শক্তিতে রূপান্তরিত হতে পারে।<|end_of_text|>'
The EOS token has been added at the end of the text.
Here is the generation code (same as the notebook):
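(That snippet isn't shown here either; a hedged reconstruction along the lines of the notebook, reusing the alpaca_prompt defined in the sketch above and the question visible in the output below. max_new_tokens is a placeholder.)

from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode
inputs = tokenizer(
    [alpaca_prompt.format("সুস্থ থাকার তিনটি উপায় বলুন", "", "")],  # "Tell me three ways to stay healthy"
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)
print(tokenizer.batch_decode(outputs))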
Here is the response output:
['<|begin_of_text|>Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.\n\n### Instruction:\nসুস্থ থাকার তিনটি উপায় বলুন\n\n### Input:\n\n\n### Response:\n১. নিয়মিত ব্যায়াম করুন: নিয়মিত শারীরিক ক্রিয়াকলাপ করা আপনার শরীরের স্বাস্থ্য এবং সুস্থতা বজায় রাখতে সহায়তা করতে পারে। এটি হার্ট রোগ, ডায়াবেটিস এবং স্থূলতার মতো দীর্ঘস্থায়ী রোগের ঝুঁকি হ্রাস করতে পারে। ২. স্বাস্থ্যকর খাদ্য খানঃ একটি সুষম এবং পুষ্টিকর ডায়েট খাওয়া আপনার শরীরের স্বাস্থ্য এবং সুস্থতা বজায় রাখতে সহায়তা করতে পারে। ফল, সবজি, পূর্ণ শস্য, চর্বিযুক্ত প্রোটিন এবং স্বাস্থ্যকর ফ্যাট সহ একটি ভারসাম্যপূর্ণ ডায়েট খাওয়া আপনার শরীরকে সঠিকভাবে কাজ করতে সহায়তা করতে পারে। ৩. পর্যাপ্ত ঘুম পানঃ পর্যাপ্ত ঘুম পাওয়া আপনার শরীরের স্বাস্থ্য এবং সুস্থতা বজায় রাখতে গুরুত্বপূর্ণ। প্রতি রাতে কমপক্ষে 7-8 ঘন্টা ঘুম পাওয়া আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। ঘুমের অভাব আপনার ইমিউন সিস্টেমকে দুর্বল করতে পারে, রোগের ঝুঁকি বাড়িয়ে তুলতে পারে এবং আপনার মানসিক স্বাস্থ্যের উপর নেতিবাচক প্রভাব ফেলতে পারে। সুতরাং পর্যাপ্ত ঘুম পাওয়া আপনার সামগ্রিক স্বাস্থ্য এবং সুস্থতা বজায় রাখতে গুরুত্বপূর্ণ। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য। এটি আপনার শরী']
I asked the model in Bangla "Tell me 3 ways I can be healthy" and the model generated a coherent response. But after finishing the response it starts spamming "এটি আপনার শরীরের স্বাস্থ্যের জন্য অপরিহার্য" (English translation: "It is necessary for your body"), and it goes on till it hits the max_new_token length. I've tried different questions, but the result is always the same. I couldn't find a single time where the model generated the eos token.
The EOS token has been added to the data['text']. So in theory, if I fine-tune the model, it should learn to predict the EOS token. I have a total of 51k samples and fine-tuned the model for 1 epoch.
One thing I've noticed is that in the original colab notebook, when the model was trained for 60 iterations and used to generate a response, none of the responses generated an EOS token.
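One way to sanity-check that the EOS token really made it into the tokenized training data (a debugging sketch of mine, not from the post):

sample_ids = tokenizer(dataset["text"][0])["input_ids"]
print(sample_ids[-1] == tokenizer.eos_token_id)  # True only if the appended EOS survived tokenization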